CHAPTER SEVEN
MEASUREMENT AND SCALING

LEARNING OBJECTIVES (PPT slide 7-2)
1. Understand the role of measurement in marketing research.
2. Explain the four basic levels of scales.
3. Describe scale development and its importance in gathering primary data.
4. Discuss comparative and noncomparative scales.

KEY TERMS AND CONCEPTS
1. Behavioral intention scale
2. Comparative rating scale
3. Constant-sum scales
4. Construct
5. Construct development
6. Discriminatory power
7. Graphic rating scales
8. Interval scale
9. Likert scale
10. Measurement
11. Multiple-item scale
12. Nominal scale
13. Noncomparative rating scale
14. Ordinal scale
15. Rank-order scales
16. Ratio scale
17. Scale measurement
18. Scale points
19. Semantic differential scale
20. Single-item scale

CHAPTER SUMMARY BY LEARNING OBJECTIVES

Understand the role of measurement in marketing research.
Measurement is the process of developing methods to systematically characterize or quantify information about persons, events, ideas, or objects of interest. As part of the measurement process, researchers assign either numbers or labels to phenomena they measure. The measurement process consists of two tasks: construct selection/development and scale measurement. A construct is an unobservable concept that is measured indirectly by a group of related variables. Thus, constructs are made up of a combination of several related indicator variables that together define the concept being measured. Construct development is the process in which researchers identify characteristics that define the concept being studied. When developing constructs, researchers must consider the abstractness of the construct and its dimensionality, as well as reliability and validity. Once the characteristics are identified, the researcher must then develop a method of indirectly measuring the concept.
Scale measurement is the process of assigning a set of descriptors to represent the range of possible responses a person may give in answering a question about a particular object or construct. Explain the four basic levels of scales. The four basic levels of scales are nominal, ordinal, interval, and ratio. Nominal scales are the most basic and provide the least amount of data. They assign labels to objects and respondents but do not show relative magnitudes between them. Nominal scales ask respondents about their religious affiliation, gender, type of dwelling, occupation, last brand of cereal purchased, and so on. To analyze nominal data researchers use modes and frequency distributions. Ordinal scales require respondents to express relative magnitude about a topic. Ordinal scales enable researchers to create a hierarchical pattern among the responses (or scale points) that indicate “greater than/less than” relationships. Data derived from ordinal scale measurements include medians and ranges as well as modes and frequency distributions. An example of an ordinal scale would be “complete knowledge,” “good knowledge,” “basic knowledge,” “little knowledge,” and “no knowledge.” Ordinal scales determine relative position, but they cannot determine how much more or how much less since they do not measure absolute differences. Interval scales enable researchers to show absolute differences between scale points. With interval data, means and standard deviations can be calculated, as well as the mode, median, frequency distribution, and range. Ratio scales enable researchers to identify absolute differences between each scale point and to make absolute comparisons between the respondents’ responses. Ratio questions are designed to allow “true natural zero” or “true state of nothing” responses. Ratio scales also can develop means, standard deviations, and other measures of central tendency and variation. 
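The relationship between scale level and usable statistics summarized above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `summarize`, the level labels, and the sample data are hypothetical, and ordinal, interval, and ratio responses are assumed to be stored as numeric codes.

```python
from collections import Counter
from statistics import mean, median, stdev

LEVELS = ("nominal", "ordinal", "interval", "ratio")

def summarize(values, level):
    """Return only the statistics the chapter lists for the given scale level."""
    if level not in LEVELS:
        raise ValueError(f"unknown scale level: {level}")
    counts = Counter(values)
    # Every level supports a frequency distribution and a mode.
    stats = {"frequencies": dict(counts), "mode": counts.most_common(1)[0][0]}
    if level != "nominal":
        # Ordinal and higher: responses can be ordered, so median and range apply.
        stats["median"] = median(values)
        stats["range"] = max(values) - min(values)
    if level in ("interval", "ratio"):
        # Interval and ratio: absolute differences are meaningful.
        stats["mean"] = mean(values)
        stats["std_dev"] = stdev(values)
    return stats

# Hypothetical data: last cereal brand purchased (nominal) and
# knowledge ratings coded 5 = complete knowledge ... 1 = no knowledge (ordinal).
print(summarize(["Brand A", "Brand B", "Brand A"], "nominal"))
print(summarize([5, 4, 4, 3, 1], "ordinal"))
```

Keeping the level as an explicit argument makes the analyst state the scale level up front, so a mean is never reported for nominal or ordinal responses by accident.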
Describe scale development and its importance in gathering primary data. There are three important components to scale measurement: (1) question/setup; (2) dimensions of the object, construct, or behavior; and (3) the scale point descriptors. Some of the criteria for scale development are the intelligibility of the questions, the appropriateness of the primary descriptors, and the discriminatory power of the scale descriptors. Likert scales use agree/disagree scale descriptors to obtain a person’s attitude toward a given object or behavior. Semantic differential scale formats are used to obtain perceptual image profiles of an object or behavior. This scale format is unique in that it uses a set of bipolar scales to measure several different attributes of a given object or behavior. Behavioral intention scales measure the likelihood that people will purchase an object or service, or visit a store. Scale point descriptors such as “definitely would,” “probably would,” “probably would not,” and “definitely would not” are often used with intentions scales. Discuss comparative and noncomparative scales. Comparative scales require the respondent to make a direct comparison between two products or services, whereas noncomparative scales rate products or services independently. Data from comparative scales is interpreted in relative terms. Both types of scales are generally considered interval or ratio and more advanced statistical procedures can be used with them. One benefit of comparative scales is they enable researchers to identify small differences between attributes, constructs, or objects. In addition, comparative scales require fewer theoretical assumptions and are easier for respondents to understand and respond to than are many noncomparative scales. 
CHAPTER OUTLINE

Opening Vignette: Santa Fe Grill Mexican Restaurant: Predicting Customer Loyalty
The opening vignette in this chapter describes the problems facing Santa Fe Grill Mexican Restaurant as the owners seek to better understand the factors leading to customer loyalty. That is, what would motivate customers to return to their restaurant more often? To gain a better understanding of customer loyalty, the Santa Fe Grill owners contacted Burke’s (www.burke.com) Customer Satisfaction Division. They evaluated several alternatives including measuring customer loyalty, intention to recommend and return to the restaurant, and sales. Several insights about the importance of construct and measurement developments can be gained from the Santa Fe Grill experience. First, not knowing the critical elements that influence customers’ restaurant loyalty can lead to intuitive guesswork and unreliable sales predictions. Second, developing loyal customers requires identifying and precisely defining constructs that predict loyalty (i.e., customer attitudes, emotions, behavioral factors).

I. Value of Measurement in Information Research (PPT slide 7-3)
Measurement is an integral part of the modern world, yet the beginnings of measurement lie in the distant past. Because accurate measurement is essential to effective decision making, it is important to understand the process of measuring consumers’ attitudes, behaviors, and other marketplace phenomena.

II. Overview of the Measurement Process (PPT slide 7-4)
Measurement is the integrative process of determining the intensity (or amount) of information about constructs, concepts, or objects. As part of the measurement process, researchers assign either numbers or labels to phenomena they measure. The measurement process consists of two tasks:
Construct selection/development: the goal is to precisely identify and define what is to be measured.
Scale measurement: determines how to precisely measure each construct.

III. What is a Construct? (PPT slide 7-5)
A construct is an abstract idea or concept formed in a person’s mind. This idea is a combination of a number of similar characteristics of the construct. The characteristics are the variables that collectively define the concept and make measurement of the concept possible.

A. Construct Development (PPT slide 7-6)
Marketing constructs must be clearly defined. A construct is an unobservable concept that is measured indirectly by a group of related variables. Thus, constructs are made up of a combination of several related indicator variables that together define the concept being measured. Each individual indicator has a scale measurement. The construct being studied is indirectly measured by obtaining scale measurements on each of the indicators and adding them together to get an overall score for the construct.
Construct development is an integrative process in which researchers determine what specific data should be collected for solving the defined research problem. The process begins with an accurate definition of the purpose of the study and the research problem. Without a clear initial understanding of the research problem, the researcher is likely to collect irrelevant or inaccurate data, thereby wasting a great deal of time, effort, and money. Once the characteristics are identified, the researcher must then develop a method of indirectly measuring the concept.
At the heart of construct development is the need to determine exactly what is to be measured. Objects that are relevant are identified first. Then the objective and subjective properties of each object are specified. When data are needed only about a concrete issue, the research focus is limited to measuring the object’s objective properties. But when data are needed to understand an object’s subjective (abstract) properties, the researcher must identify measurable subcomponents that can be used as indicators of the object’s subjective properties.
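The indirect measurement described above, in which scale measurements on each indicator are added together to give an overall construct score, can be illustrated with a minimal sketch. The indicator names, the 1 to 7 rating range, and the function `construct_score` are hypothetical choices made for the example, not part of the chapter.

```python
def construct_score(ratings, indicators, low=1, high=7):
    """Sum one respondent's ratings over the indicators that define a construct."""
    missing = [name for name in indicators if name not in ratings]
    if missing:
        raise ValueError(f"missing indicator ratings: {missing}")
    for name in indicators:
        if not low <= ratings[name] <= high:
            raise ValueError(f"{name} is outside the {low}-{high} scale")
    return sum(ratings[name] for name in indicators)

# Hypothetical indicators of customer loyalty, each rated from 1 to 7.
LOYALTY_INDICATORS = ["intends_to_return", "would_recommend", "prefers_over_rivals"]
respondent = {"intends_to_return": 6, "would_recommend": 7, "prefers_over_rivals": 5}

print(construct_score(respondent, LOYALTY_INDICATORS))  # 18
```

Rejecting missing or out-of-range ratings before summing keeps an incomplete response from silently lowering the overall score.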
Exhibit 7.1 shows examples of objects and their concrete and abstract properties (PPT slides 7-7 and 7-8). A rule of thumb is that if an object’s features can be directly measured using physical characteristics, then that feature is a concrete variable and not an abstract construct. Abstract constructs are not physical characteristics and are measured indirectly.

IV. Scale Measurement (PPT slide 7-9)
The quality of responses associated with any question or observation technique depends directly on the scale measurements used by the researcher. Scale measurement involves the process of assigning descriptors to represent the range of possible responses to a question about a particular object or construct. The scale descriptors are a combination of labels, such as “Strongly Agree” or “Strongly Disagree,” and numbers, such as 1–7, that are assigned using a set of rules. Scale points reflect designated degrees of intensity assigned to the responses in a given questioning or observation method. All scale measurements can be classified as one of four basic scale levels (PPT slide 7-10): nominal, ordinal, interval, and ratio.

A. Nominal Scales (PPT slides 7-10 and 7-11)
A nominal scale is a type of scale in which the questions require respondents to provide only some type of descriptor as the raw response. It is the most basic and least powerful scale design. Responses do not contain a level of intensity. Thus, a ranking of the set of responses is not possible. Nominal scales allow the researcher only to categorize the responses into mutually exclusive subsets that do not have distances between them. Thus, the only possible mathematical calculation is to count the number of responses in each category and to report the mode. Exhibit 7.2 gives some examples of nominal scales (PPT slide 7-11).

B. Ordinal Scales (PPT slides 7-10 and 7-12)
An ordinal scale is a scale that allows a respondent to express relative magnitude between the answers to a question.
Ordinal scales are more powerful than nominal scales. The responses obtained can be rank-ordered in a hierarchical pattern. Thus, relationships between responses can be determined such as “greater than/less than,” “higher than/lower than,” “more often/less often,” “more important/less important,” or “more favorable/less favorable.” The mathematical calculations that can be applied with ordinal scales include: Mode Median Frequency distributions Ranges Ordinal scales cannot be used to determine the absolute difference between rankings. Exhibit 7.3 provides several examples of ordinal scales (PPT slide 7-12). C. Interval Scales (PPT slides 7-10 and 7-13) An interval scale is a scale that demonstrates absolute differences between each scale point. The intervals between the scale numbers tell the researchers how far apart the measured objects are on a particular attribute. In addition to the mode and median, the mean and standard deviation of the respondents’ answers can be calculated for interval scales. This means that researchers can report findings not only about hierarchical differences (better than or worse than), but also the absolute differences between the data. Exhibit 7.4 gives several examples of interval scales (PPT slide 7-13). D. Ratio Scales (PPT slides 7-10 and 7-14) A ratio scale is a scale that allows the researcher not only to identify the absolute differences between each scale point but also to make comparisons between the responses. Ratio scales are designed to enable a “true natural zero” or “true state of nothing” response to be a valid response to a question. Generally, ratio scales ask respondents to provide a specific numerical value as their response, regardless of whether or not a set of scale points is used. In addition to the mode, median, mean, and standard deviation, one can make comparisons between levels. Exhibit 7.5 provides examples of ratio scales (PPT slide 7-14). V. 
Evaluating Measurement Scales (PPT slides 7-15 and 7-16) All measurement scales should be evaluated for reliability and validity. A. Scale Reliability (PPT slides 7-15) Scale reliability refers to the extent to which a scale can reproduce the same or similar measurement results in repeated trials. Thus, reliability is a measure of consistency in measurement. Random error produces inconsistency in scale measurements that leads to lower scale reliability. But researchers can improve reliability by carefully designing scaled questions. Two of the techniques that help researchers assess the reliability of scales are: Test-retest Equivalent form Test-retest technique involves repeating the scale measurement with either the same sample of respondents at two different times or two different samples of respondents from the same defined target population under as nearly the same conditions as possible. The idea behind this approach is that if random variations are present, they will be revealed by variations in the scores between the two sampled measurements. If there are very few differences between the first and second administrations of the scale, the measuring scale is viewed as being stable and therefore reliable. Some of the potential problems associated with the test-retest approach are listed below: Respondents who completed the scale the first time might be absent for the second administration of the scale. Respondents might become sensitive to the scale measurement and therefore alter their responses in the second measurement. Environmental or personal factors may change between the two administrations, thus causing changes in the responses in the second measurement. Some researchers believe the problems associated with test-retest reliability technique can be avoided by using the equivalent form technique. 
In this technique, researchers create two similar yet different scale measurements for the given construct and administer both forms to either the same sample of respondents or to two samples of respondents from the same defined target population. There are two potential drawbacks with the equivalent form reliability technique: Even if equivalent versions of the scale can be developed, it might not be worth the time, effort, and expense of determining that two similar yet different scales can be used to measure the same construct. It is difficult and perhaps impossible to create two totally equivalent scales. Thus, questions may be raised as to which scale is the most appropriate to use in measuring teaching effectiveness. The previous approaches to examining reliability are often difficult to complete in a timely and accurate manner. As a result, marketing researchers most often use internal consistency reliability. Internal consistency is the degree to which the individual questions of a construct are correlated. Two popular techniques are used to assess internal consistency: Split-half test—the scale questions are divided into two halves and the resulting halves’ scores are correlated against one another. High correlations between the resulting halves indicate good (or acceptable) internal consistency. Coefficient alpha—calculates the average of all possible split-half measures that result from different ways of dividing the scale questions. The coefficient value can range from 0 to 1, and, in most cases, a value of less than 0.7 would typically indicate marginal to low (unsatisfactory) internal consistency. B. Validity (PPT slides 7-16) Since reliable scales are not necessarily valid, researchers also need to be concerned about validity. Scale validity assesses whether a scale measures what it is supposed to measure. It is a measure of accuracy in measurement. A construct with perfect validity contains no measurement error. 
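The two internal consistency techniques described above, the split-half test and coefficient alpha, can be sketched as follows. This is a minimal illustration under stated assumptions: the response matrix is invented, the split-half version uses an odd/even division of the items (one of many possible splits), and coefficient alpha is computed with the common variance-based formula rather than by enumerating every possible split.

```python
from statistics import mean, variance

def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def split_half(responses):
    """Correlate summed scores on odd-numbered items with even-numbered items."""
    first = [sum(row[0::2]) for row in responses]
    second = [sum(row[1::2]) for row in responses]
    return pearson(first, second)

def coefficient_alpha(responses):
    """Variance-based coefficient alpha for a respondents-by-items matrix."""
    k = len(responses[0])
    item_vars = [variance(item) for item in zip(*responses)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: five respondents rating four statements on a 5-point scale.
responses = [
    [5, 4, 5, 4],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
    [2, 2, 3, 2],
]
print(round(split_half(responses), 2))
print(round(coefficient_alpha(responses), 2))
```

With real survey data, an alpha below the 0.7 guideline mentioned above would suggest revisiting the items before summing them into a construct score.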
An easy measure of validity would be to compare observed measurements with the true measurement. The problem is that we very seldom know the true measure. Validation, in general, involves determining the suitability of the questions (statements) chosen to represent the construct. Several approaches to assess scale validity are listed below: Face validity—is based on the researcher’s intuitive evaluation of whether the statements look like they measure what they are supposed to measure. Establishing the face validity of a scale involves a systematic but subjective assessment of a scale’s ability to measure what it is supposed to measure. Thus, researchers use their expert judgment to determine face validity. Content validity—is a measure of the extent to which a construct represents all the relevant dimensions. Content validity requires more rigorous statistical assessment than face validity, which only requires intuitive judgments. Content validity is assessed before data is collected in an effort to ensure the construct (scale) includes items to represent all relevant areas. It is generally carried out in the process of developing or revising scales. In contrast, face validity is a post hoc claim about existing scales that the items represent the construct being measured. Several other types of validity typically are examined after data is collected, particularly when multi-item scales are being used: Convergent validity—is evaluated with multi-item scales and represents a situation in which the multiple items measuring the same construct share a high proportion of variance, typically more than 50 percent. Discriminant validity—is the extent to which a single construct differs from other constructs and represents a unique construct. Two approaches typically are used to obtain data to assess validity: If sufficient resources are available, a pilot study is conducted with 100 to 200 respondents believed to be representative of the defined target population. 
When fewer resources are available, researchers assess only content validity using a panel of experts. VI. Developing Scale Measurements (PPT slides 7-17 and 18) Designing measurement scales requires: Understanding the defined research problem Establishing detailed data requirements Identifying and developing constructs Selecting the appropriate measurement scale A. Criteria for Scale Development (PPT slide 7-17) Questions must be phrased carefully to produce accurate data. To do so, the researcher must develop appropriate scale descriptors to be used as the scale points: Understanding of the questions: The researcher must consider the intellectual capacity and language ability of individuals who will be asked to respond to the scales. Researchers should not automatically assume that respondents understand the questions and response choices. Appropriate language must be used in both the questions and the answers. Simplicity in word choice and straightforward, simple sentence construction improve understanding. All scaled questions should be pretested to evaluate their level of understanding. Discriminatory power of scale descriptors: The discriminatory power of scale descriptors is the scale’s ability to differentiate between the scale responses. Researchers must decide how many scale points are necessary to represent the relative magnitudes of a response scale. The more scale points, the greater the discriminatory power of the scale. There is no absolute rule about the number of scale points that should be used in creating a scale. Balanced versus unbalanced scales: Researchers must consider whether to use a balanced or unbalanced scale. A balanced scale has an equal number of positive (favorable) and negative (unfavorable) response alternatives. An unbalanced scale has a larger number of response options on one side, either positive or negative. For most research situations a balanced scale is recommended because unbalanced scales often introduce bias. 
One exception is when the attitudes of respondents are likely to be predominantly one-sided, either positive or negative. When this situation is expected, researchers typically use an unbalanced scale. Forced or nonforced choice scales: A scale that does not have a neutral descriptor to divide the positive and negative answers is referred to as a forced-choice scale. It is forced because the respondent can only select either a positive or a negative answer, and not a neutral one. In contrast, a scale that includes a center neutral response is referred to as a nonforced or free-choice scale. Exhibit 7.6 presents several different examples of both “even-point, forced-choice” and “odd-point, nonforced” scales. Some researchers believe scales should be designed as “odd-point, nonforced” scales since not all respondents will have enough knowledge or experience with the topic to be able to accurately assess their thoughts or feelings. If respondents are forced to choose, the scale may produce lower-quality data. With nonforced choice scales, however, the so-called neutral scale point provides respondents an easy way to express their feelings. Many researchers believe that there is no such thing as a neutral attitude or feeling, that these mental aspects almost always have some degree of a positive or negative orientation attached to them. A person either has an attitude or does not have an attitude about a given object. Likewise, a person will either have a feeling or not have a feeling. An alternative approach to handling situations in which respondents may feel uncomfortable about expressing their thoughts or feelings because they have no knowledge of or experience with it would be to incorporate a “Not Applicable” response choice. Negatively worded statements: Scale development guidelines traditionally suggested that negatively worded statements should be included to verify that respondents are reading the questions. 
However, negatively worded statements almost always create problems for respondents in data collection. As a result, inclusion of negatively worded statements should be minimized and even then approached with caution.

Desired measures of central tendency and dispersion (PPT slide 7-18): The type of statistical analyses that can be performed on data depends on the level of the data collected, whether nominal, ordinal, interval, or ratio.
Measures of central tendency locate the center of a distribution of responses and are basic summary statistics. The mean, median, and mode measure central tendency using different criteria.
Mean: the arithmetic average of all the raw data responses.
Median: the sample statistic that divides the data so that half the data are above the statistic value and half are below.
Mode: the value most frequently given among all of the respondents.
Measures of dispersion describe how the data are dispersed around a central value. These statistics enable the researcher to report the variability of responses on a particular scale. Measures of dispersion include:
Frequency distribution: a summary of how many times each possible response to a scale question/setup was recorded by the total group of respondents. This distribution can be easily converted into percentages or histograms.
Range: the distance between the largest and smallest response.
Standard deviation: a statistical value that specifies the degree of variation in the responses.
Given the important role these statistics play in data analysis, an understanding of how different levels of scales influence the use of a particular statistic is critical in scale design. Exhibit 7.7 displays these relationships (PPT slide 7-18):
Nominal scales can only be analyzed using frequency distributions and the mode.
Ordinal scales can be analyzed using medians and ranges as well as modes and frequency distributions.
For interval or ratio scales, the most appropriate statistics to use are means and standard deviations. In addition, interval and ratio data can be analyzed using modes, medians, frequency distributions, and ranges. B. Adapting Established Scales There are literally hundreds of previously published scales in marketing. The most relevant sources of these scales are: William Bearden, Richard Netemeyer, and Kelly Haws, Handbook of Marketing Scales, 3rd ed., Sage Publications, 2011 Gordon Bruner, Marketing Scales Handbook, 3rd ed., Chicago, IL, American Marketing Association, 2006 The online Measures Toolchest by the Academy of Management, available at http://measures.kammeyer-uf.com/wiki/Main_Page. Some of the scales described in these sources can be used in their published form to collect data. But most scales need to be adapted to meet current psychometric standards. VII. Scales to Measure Attitudes and Behaviors (PPT slides 7-19 to 7-22) Scales are the “rulers” that measure customer attitudes, behaviors, and intentions. Well-designed scales result in better measurement of marketplace phenomena, and thus provide more accurate information to marketing decision makers. Several types of scales have proven useful in many different situations. This section discusses three scale formats: Likert scales Semantic differential scales Behavioral intention scales Exhibit 7.8 shows the general construct development/scale measurement process (PPT slide 7-22). A. Likert Scale (PPT slides 7-19) A Likert scale is an ordinal scale format that asks respondents to indicate the extent to which they agree or disagree with a series of mental belief or behavioral belief statements about a given object. 
Named after its original developer, Rensis Likert, this scale initially had five scale descriptors:
Strongly agree
Agree
Neither agree nor disagree
Disagree
Strongly disagree
The Likert scale is often expanded beyond the original 5-point format to a 7-point scale, and most researchers treat the scale format as an interval scale. Likert scales are best for research designs that use self-administered surveys, personal interviews, or online surveys. Exhibit 7.9 provides an example of a 6-point Likert scale in a self-administered survey.

B. Semantic Differential Scale (PPT slide 7-20)
A semantic differential scale is a unique bipolar ordinal scale format that captures a person’s attitudes or feelings about a given object. Only the endpoints of the scale are labeled. In most cases, semantic differential scales use either 5 or 7 scale points. Means for each attribute can be calculated and mapped on a diagram with the various attributes listed, creating a “perceptual image profile” of the object. Semantic differential scales can be used to develop and compare profiles of different companies, brands, or products. Respondents can also be asked to indicate how an ideal product would rate, and then researchers can compare ideal and actual products.
A problem encountered in designing semantic differential scales is the inappropriate narrative expressions of the scale descriptors. In a well-designed semantic differential scale, the individual scales should be truly bipolar. Sometimes researchers use a negative pole descriptor that is not truly an opposite of the positive descriptor. This creates a scale that is difficult for the respondent to interpret correctly. Researchers must be careful when selecting bipolar descriptors to make sure the words or phrases are truly extreme bipolar in nature and allow for creating symmetrical scales. Exhibits 7.10 and 7.11 provide examples of semantic differential scales.

C. Behavioral Intention Scale (PPT slide 7-21)
A behavioral intention scale is a special type of rating scale designed to capture the likelihood that people will demonstrate some type of predictable behavior intent toward purchasing an object or service in a future time frame. Behavioral intention scales are easy to construct. Consumers are asked to make a subjective judgment of their likelihood of buying a product or service, or taking a specific action. When designing a behavioral intention scale, a specific time frame should be included in the instructions to the respondent. Without an expressed time frame, it is likely respondents will bias their response toward the “definitely would” or “probably would” scale categories.
Behavioral intentions are often a key variable of interest in marketing research studies. To make scale points more specific, researchers can use descriptors that indicate the percentage chance they will buy a product, or engage in a behavior of interest. The following set of scale points could be used:
Definitely will (90 to 100 percent chance)
Probably will (50 to 89 percent chance)
Probably will not (10 to 49 percent chance)
Definitely will not (less than 10 percent chance)
Exhibit 7.12 shows what a shopping intention scale might look like.
No matter what kind of scale is used to capture people’s attitudes and behaviors, there often is no one best or guaranteed approach. While there are established scale measures for obtaining the components that make up respondents’ attitudes and behavioral intentions, the data provided from these scale measurements should not be interpreted as being completely predictive of behavior. Unfortunately, knowledge of an individual’s attitudes may not predict actual behavior. Intentions are better than attitudes at predicting behavior, but the strongest predictor of future behavior is past behavior.

VIII.
Comparative and Noncomparative Rating Scales (PPT slide 7-23 to 7-25) A noncomparative rating scale is used when the objective is to have a respondent express his or her attitudes, behavior, or intentions about a specific object. In contrast, a comparative rating scale is used when the objective is to have a respondent express his or her attitudes, feelings, or behaviors about an object or its attributes on the basis of some other object or its attributes. Exhibit 7.13 gives several examples of graphic rating scale formats, which are among the most widely used noncomparative scales. Graphic rating scales use a scaling descriptor format that presents a respondent with a continuous line as the set of possible responses to a question. For example, the first graphic rating scale displayed in Exhibit 7.13 is used in situations where the researcher wants to collect “usage behavior” data about an object. Another popular type of graphic rating scale descriptor design utilizes smiling faces. The smiling faces are arranged in order and depict a continuous range from “very happy” to “very sad” without providing narrative descriptors of the two extreme positions. This visual graphic rating design can be used to collect a variety of attitudinal and emotional data. Exhibit 7.14 illustrates rank-order and constant-sums scale formats. A common characteristic of comparative scales is that they can be used to identify and directly compare similarities and differences between products or services, brands, or product attributes. Rank-order and constant-sums scale formats are among the most commonly used comparative scales. Rank-order scales use a format that enables respondents to compare objects by indicating their order of preference or choice from first to last. Rank-order scales are easy to use as long as respondents are not asked to rank too many items. 
Use of rank-order scales in traditional or computer-assisted telephone interviews may be difficult, but it is possible as long as the number of items being compared is kept to four or five. When respondents are asked to rank objects or attributes of objects, problems can occur if the respondent’s preferred objects or attributes are not listed. Another limitation is that only ordinal data can be obtained using rank-order scales.
Constant-sum scales require the respondent to allocate a given number of points, usually 100, among each separate attribute or feature relative to all the other listed ones. The resulting values indicate the relative magnitude of importance each feature has to the respondent. This scaling format usually requires that the individual values add up to 100. Exhibit 7.14 displays a constant-sum scale.
IX. Other Scale Measurement Issues (PPT slide 7-26)
Attention to scale measurement issues will increase the usefulness of research results.
A. Single-Item and Multiple-Item (PPT slide 7-26)
A single-item scale involves collecting data about only one attribute of the object or construct being investigated. One example of a single-item scale would be age. The respondent is asked a single question about his or her age and supplies only one possible response to the question.
In contrast, many marketing research projects that involve collecting attitudinal, emotional, and behavioral data use some type of multiple-item scale. A multiple-item scale simultaneously collects data on several attributes of an object or construct. It includes several statements relating to the object or construct being examined. Each statement has a rating scale attached to it, and the researcher often will sum the ratings on the individual statements to obtain a summated or overall rating for the object or construct.
The decision to use a single-item versus a multiple-item scale is made when the construct is being developed.
Two factors play a significant role in the process:
The number of dimensions of the construct: the researcher must assess the various factors or dimensions that make up the construct under investigation.
The reliability and validity: researchers must consider reliability and validity. In general, multiple-item scales are more reliable and more valid.
Thus, multiple-item scales generally are preferred over single-item scales.
B. Clear Wording (PPT slides 7-26)
When phrasing the question setup element of the scale, use clear wording and avoid ambiguity. Also avoid using “leading” words or phrases in any scale measurement’s question. Regardless of the data collection method (personal, telephone, computer-assisted interviews, or online surveys), all necessary instructions for both respondent and interviewer are part of the scale measurement’s setup. All instructions should be kept simple and clear.
When determining the appropriate set of scale point descriptors, make sure the descriptors are relevant to the type of data being sought. Scale descriptors should have adequate discriminatory power, be mutually exclusive, and make sense to the respondent. Use only scale descriptors and formats that have been pretested and evaluated for scale reliability and validity.
Exhibit 7.15 provides a summary checklist for evaluating the appropriateness of scale designs. The guidelines are also useful in developing and evaluating questions to be used on questionnaires.
MARKETING RESEARCH IN ACTION
WHAT CAN YOU LEARN FROM A CUSTOMER LOYALTY INDEX?
(PPT slides 7-27 and 7-30)
The Marketing Research in Action highlights the point that customer loyalty is a composite of a number of qualities:
The intention to buy again and/or buy additional products or services from the same company
A willingness to recommend the company to others
A commitment to the company demonstrated by a resistance to switching to a competitor
Customer behaviors that reflect loyalty include:
Repeat purchasing of products or services
Purchasing more and different products or services from the same company
Recommending the company to others
Burke, Inc. (burke.com) developed a Secure Customer Index® (SCI®) using the combined scores on three components of customer loyalty. They ask, for example: “Overall, how satisfied were you with your visit to this restaurant?” To examine their likelihood to recommend: “How likely would you be to recommend this restaurant to a friend or associate?” And finally, to examine likelihood of repeat purchases, they ask “How likely are you to choose to visit this restaurant again?” With these three components and the appropriate scales for each, “secure customers” are defined as those giving the most positive responses across all three components.
Companies are increasingly able to link customer satisfaction and customer loyalty to bottom-line benefits. By examining customer behaviors over time and comparing them to SCI® scores, a strong connection can be shown between secure customers and repeat purchasing of products or services. Using a customer loyalty index helps companies better understand their customers.
Instructor Manual for Essentials of Marketing Research Joseph F. Hair, Mary Celsi, Robert P. Bush, David J. Ortinau 9780078028816, 9780078112119
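For classroom illustration, the "secure customer" definition in the Marketing Research in Action (the most positive response on all three loyalty components) can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the 5-point scale, the class name LoyaltyResponse, and the functions is_secure and secure_share are hypothetical and are not taken from Burke's actual SCI® methodology, which is not specified here.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumption: every component is rated on a 5-point scale where 5 is the
# most positive response. The scales Burke actually uses are not given.
TOP_SCALE_POINT = 5


@dataclass(frozen=True)
class LoyaltyResponse:
    """One respondent's ratings on the three loyalty components."""
    satisfaction: int  # overall satisfaction with the visit
    recommend: int     # likelihood to recommend to a friend or associate
    repurchase: int    # likelihood to choose to visit again


def is_secure(response: LoyaltyResponse, top: int = TOP_SCALE_POINT) -> bool:
    """A respondent is 'secure' only if all three ratings are the top point."""
    return (
        response.satisfaction == top
        and response.recommend == top
        and response.repurchase == top
    )


def secure_share(responses: Sequence[LoyaltyResponse],
                 top: int = TOP_SCALE_POINT) -> float:
    """Proportion of respondents classified as secure (0.0 if no responses)."""
    if not responses:
        return 0.0
    return sum(is_secure(r, top) for r in responses) / len(responses)


# Made-up ratings for illustration only, not real survey data.
sample = [
    LoyaltyResponse(5, 5, 5),
    LoyaltyResponse(5, 4, 5),
    LoyaltyResponse(3, 2, 2),
    LoyaltyResponse(5, 5, 5),
]
print(secure_share(sample))  # 0.5
```

In the made-up sample, two of the four respondents give the top rating on every component, so half of the sample is classified as secure. A respondent who is very satisfied but only "probably" willing to recommend is not counted, which reflects the idea that loyalty is a composite rather than a single rating.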