## Preview Extract

### CHAPTER TWELVE: EXAMINING RELATIONSHIPS IN QUANTITATIVE RESEARCH

### LEARNING OBJECTIVES (PPT slides 12-2 and 12-3)

1. Understand and evaluate the types of relationships between variables.
2. Explain the concepts of association and covariation.
3. Discuss the differences between Pearson correlation and Spearman correlation.
4. Explain the concept of statistical significance versus practical significance.
5. Understand when and how to use regression analysis.

### KEY TERMS AND CONCEPTS

1. Beta coefficient
2. Bivariate regression analysis
3. Coefficient of determination (r²)
4. Covariation
5. Curvilinear relationship
6. Homoskedasticity
7. Heteroskedasticity
8. Least squares procedure
9. Linear relationship
10. Model F statistic
11. Multicollinearity
12. Multiple regression analysis
13. Normal curve
14. Ordinary least squares
15. Pearson correlation coefficient
16. Regression coefficient
17. Scatter diagram
18. Spearman rank order correlation coefficient
19. Unexplained variance

### CHAPTER SUMMARY BY LEARNING OBJECTIVES

**Understand and evaluate the types of relationships between variables.** Relationships between variables can be described in several ways, including presence, direction, strength of association, and type. Presence tells us whether a consistent and systematic relationship exists. Direction tells us whether the relationship is positive or negative. Strength of association tells us whether we have a weak or strong relationship, and the type of relationship is usually described as either linear or nonlinear. Two variables may share a linear relationship, in which changes in one variable are accompanied by some change (not necessarily the same amount of change) in the other variable. As long as the amount of change stays constant over the range of both variables, the relationship is termed linear. Relationships between two variables that change in strength and/or direction as the values of the variables change are referred to as curvilinear.
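The difference between a linear and a curvilinear relationship can be sketched numerically. The short Python example below is only an illustration under stated assumptions: the X values, the straight line Y = 2 + 3X, and the inverted-U curve Y = 12X - 2X² are all made up for the purpose and do not come from the textbook.

```python
# Hypothetical X values chosen only to show the two patterns.
xs = [1, 2, 3, 4, 5, 6]

# Linear: Y = a + bX with assumed a = 2 and b = 3.
linear_y = [2 + 3 * x for x in xs]

# Curvilinear: an assumed inverted U, Y = 12X - 2X^2.
curved_y = [12 * x - 2 * x * x for x in xs]


def unit_changes(ys):
    """Change in Y for each one-unit step in X."""
    return [later - earlier for earlier, later in zip(ys, ys[1:])]


linear_steps = unit_changes(linear_y)  # [3, 3, 3, 3, 3]
curved_steps = unit_changes(curved_y)  # [6, 2, -2, -6, -10]

print("linear steps:", linear_steps)
print("curvilinear steps:", curved_steps)
```

In the linear case the change in Y per unit of X stays constant, while in the curvilinear case it shrinks and then reverses direction.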
**Explain the concepts of association and covariation.** The terms covariation and association refer to the attempt to quantify the strength of the relationship between two variables. Covariation is the amount of change in one variable of interest that is consistently related to change in another variable under study. The degree of association is a numerical measure of the strength of the relationship between two variables. Both these terms refer to linear relationships.

**Discuss the differences between Pearson correlation and Spearman correlation.** Pearson correlation coefficients are a measure of linear association between two variables of interest. The Pearson correlation coefficient is used when both variables are measured on an interval or ratio scale. When one or more variables of interest are measured on an ordinal scale, the Spearman rank order correlation coefficient should be used.

**Explain the concept of statistical significance versus practical significance.** Because tests of statistical significance take the sample size into account, a very low degree of association between two variables can still show up as statistically significant (i.e., the population parameter is not equal to zero) when the sample is large. However, by considering the absolute strength of the relationship in addition to its statistical significance, the researcher is better able to draw the appropriate conclusion about the data and the population from which they were selected.

**Understand when and how to use regression analysis.** Regression analysis is useful in answering questions about the strength of a linear relationship between a dependent variable and one or more independent variables. The results of a regression analysis indicate the amount of change in the dependent variable that is associated with a one-unit change in the independent variables.
In addition, the accuracy of the regression equation can be evaluated by comparing the predicted values of the dependent variable to the actual values of the dependent variable drawn from the sample. When using regression, the assumptions should be checked to ensure the results are accurate and not distorted by deviations from them.

### CHAPTER OUTLINE

**Opening Vignette: Data Mining Helps Rebuild Procter & Gamble as a Global Powerhouse**

The opening vignette in this chapter describes how Procter & Gamble (P&G), a global player in consumer household products, used information technology and customer relationship management to design its brand-building strategy. From internal employee surveys, the company recognized a need to recommit to a customer-centric approach. P&G mined the information in its data warehouse to retool customer models for its global brand and distributor markets.

**I. Examining Relationships between Variables (PPT slides 12-4 to 12-6)**

Relationships between variables can be described in several ways, including (PPT slides 12-4 to 12-6):

- Presence: if a systematic relationship exists between two or more variables, then a relationship is present. To measure whether a relationship exists, researchers rely on the concept of statistical significance.
- Direction: can be either positive or negative.
- Strength of association: researchers generally categorize the strength of association as no relationship, weak relationship, moderate relationship, or strong relationship. If a consistent and systematic relationship is not present, then there is no relationship. A weak association means the variables may have some variance in common, but not much. A moderate or strong association means there is a consistent and systematic relationship, and the relationship is much more evident when it is strong. The strength of association is determined by the size of the correlation coefficient, with larger coefficients (in absolute value) indicating a stronger association.
- Type: if the researchers say two variables are related, then they pose this question: “What is the nature of the relationship?” There are a number of different ways in which two variables can share a relationship:
  - A linear relationship is an association between two variables whereby the strength and nature of the relationship remains the same over the range of both variables.
  - A curvilinear relationship is a relationship between two variables whereby the strength and/or direction of their relationship changes over the range of both variables.

A linear relationship is much simpler to work with than a curvilinear relationship. If the researchers know the value of variable X, then they can apply the formula for a straight line (Y = a + bX) to determine the value of Y. But when two variables have a curvilinear relationship, the formula that best describes the linkage is more complex. Therefore, most marketing researchers work with relationships they believe are linear.

Marketers are often interested in describing the relationship between variables they think influence purchases of their product(s). There are four questions to ask about a possible relationship between two variables:

1. “Is there a relationship between the two variables of interest?”
2. If there is a relationship, “How strong is that relationship?”
3. “What is the direction of the relationship?”
4. “Is the relationship linear or nonlinear?”

Once these questions have been answered, the researcher can interpret results, make conclusions, and recommend managerial actions.

**II. Covariation and Variable Relationships (PPT slide 12-7)**

Covariation is the amount of change in one variable that is consistently related to the change in another variable of interest (PPT slide 12-7). Another way of stating the concept of covariation is that it is the degree of association between two variables.
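One common way to put a number on covariation is the sample covariance. The sketch below is an illustration rather than the textbook's own procedure: the function name `sample_covariance` and the three small data series are hypothetical, and the statistic is a standard one swapped in here to make the idea of "changing together" concrete.

```python
def sample_covariance(xs, ys):
    """Average product of paired deviations from the means (n - 1 divisor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)


# Hypothetical data for five sales periods.
ad_spend = [1, 2, 3, 4, 5]
price = [5, 4, 3, 2, 1]
sales = [2, 4, 5, 4, 5]

cov_ad_sales = sample_covariance(ad_spend, sales)    # 1.5: move together
cov_price_sales = sample_covariance(price, sales)    # -1.5: move oppositely

print(cov_ad_sales, cov_price_sales)
```

A positive value indicates the two variables tend to rise and fall together, and a negative value indicates they tend to move in opposite directions. Because the size of the covariance depends on the units of measurement, it says little about strength on its own.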
If two variables are found to change together on a reliable or consistent basis, then we can use that information to make predictions that will improve decision making about advertising and marketing strategies.

One way of visually describing the covariation between two variables is with the use of a scatter diagram (PPT slide 12-7). A scatter diagram is a graphic plot of the relative position of two variables using a horizontal and a vertical axis to represent the values of the respective variables. Exhibits 12.1 through 12.4 show some examples of possible relationships between two variables that might show up on a scatter diagram:

- Exhibit 12.1 is a scatter diagram illustrating no relationship between the two variables, X and Y (PPT slide 12-8).
- Exhibit 12.2 is a scatter diagram illustrating a positive relationship between X and Y; if researchers know the relationship between Y and X is a positive linear relationship, they would know the values of Y and X change in the same direction (PPT slide 12-9).
- Exhibit 12.3 is a scatter diagram illustrating a negative relationship between X and Y; increases in the value of Y are associated with decreases in the values of X (PPT slide 12-10).
- Exhibit 12.4 is a scatter diagram illustrating a curvilinear relationship between X and Y; the way Y changes with X differs across the range of the variables (PPT slide 12-11).

**III. Correlation Analysis (PPT slide 12-12)**

Scatter diagrams are a visual way to describe the relationship between two variables and the covariation they share. But even though a picture is worth a thousand words, it is often more convenient to use a quantitative measure of the covariation between two items. The Pearson correlation coefficient is a statistical measure of the strength of a linear relationship between two metric variables (PPT slide 12-12).
It varies between -1.00 and 1.00, with 0 representing absolutely no association between two variables, and -1.00 or 1.00 representing a perfect link between two variables. The correlation coefficient can be either positive or negative depending upon the direction of the relationship between two variables. In either case, the larger the absolute value of the correlation coefficient, the stronger the association between two variables.

The null hypothesis for the Pearson correlation coefficient states that there is no association between the two variables and the correlation coefficient is zero. If the correlation coefficient is statistically significant, the null hypothesis is rejected, and the researchers can conclude with some confidence that the two variables they are examining do share some association in the population.

The size of the correlation coefficient can be used to quantitatively describe the strength of the association between two variables. Exhibit 12.5 suggests some rules of thumb for characterizing the strength of the association between two variables based on the size of the correlation coefficient (PPT slide 12-13).

**A. Pearson Correlation Coefficient (PPT slide 12-14)**

In calculating the Pearson correlation coefficient, researchers make several assumptions (PPT slide 12-14). They assume that:

- the two variables have been measured using interval- or ratio-scaled measures.
- the relationship they are trying to measure is linear.
- the variables they want to analyze have a normally distributed population.

**B. SPSS Application: Pearson Correlation (PPT slide 12-15)**

The text uses the Santa Fe Grill restaurant database to examine the Pearson correlation. Exhibit 12.6 shows the SPSS Pearson correlation example for Santa Fe Grill customers (PPT slide 12-15).

**C. Substantive Significance of the Correlation Coefficient (PPT slide 12-16)**

When the correlation coefficient is strong and significant, the researchers can be confident that the two variables are associated in a linear fashion.
When the correlation coefficient is weak, two possibilities must be considered:

- There is not a consistent, systematic relationship between the two variables.
- The association exists, but it is not linear, and other types of relationships must be investigated further.

When researchers square the correlation coefficient, they arrive at the coefficient of determination (r²). It is a number measuring the proportion of variation in one variable accounted for by another. The r² measure can be thought of as a percentage and varies from 0.00 to 1.00 (PPT slide 12-16). The larger the size of the coefficient of determination, the stronger the linear relationship between the two variables being examined.

Statistical significance and substantive significance are not the same thing, so researchers need to assess substantive significance as well.

**D. Influence of Measurement Scales on Correlation Analysis (PPT slide 12-17)**

The Spearman rank order correlation coefficient is a statistical measure of the linear association between two variables where both have been measured using ordinal (rank order) scales (PPT slide 12-17). If either one of the variables is represented by rank order data, the best approach is to use the Spearman rank order correlation coefficient, rather than the Pearson correlation.

**E. SPSS Application: Spearman Rank Order Correlation (PPT slides 12-18 and 12-19)**

This section takes students through the steps necessary to conduct the Spearman rank order correlation using SPSS. The SPSS results for the Spearman correlation are shown in Exhibit 12.7 (PPT slide 12-18).

The “SPSS Application: Calculating Median Rankings” section takes students through the steps necessary to calculate the median ranking using SPSS. The SPSS results for median rankings are shown in the Statistics table in Exhibit 12.8 (PPT slide 12-19).

**IV. What is Regression Analysis? (PPT slides 12-20 to 12-22)**

Correlation can determine if a relationship exists between two variables.
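As a rough illustration of the correlation measures outlined in Section III, the Python sketch below computes a Pearson correlation coefficient, the corresponding coefficient of determination, and a Spearman rank order correlation. It is not the SPSS procedure from the text. The function name `pearson_r` and all data values are hypothetical, and the Spearman value is obtained by applying the Pearson formula to data that are already ranks with no ties, which is an assumption of this example.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical metric (interval/ratio) data, not from the Santa Fe Grill file.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]

r = pearson_r(ad_spend, sales)   # about 0.775
r_squared = r ** 2               # about 0.60: share of variation in common

# Hypothetical ordinal data: two groups rank the same five attributes.
rank_a = [1, 2, 3, 4, 5]
rank_b = [2, 1, 4, 3, 5]

# With untied ranks, Spearman's coefficient is Pearson's r on the ranks.
rho = pearson_r(rank_a, rank_b)  # 0.8

print(round(r, 3), round(r_squared, 3), round(rho, 3))
```

Here r is about 0.775, so r² is about 0.60, meaning roughly 60 percent of the variation in one variable is accounted for by the other in this made-up sample.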
The correlation coefficient also tells the researcher the overall strength of the association and direction of the relationship between the variables. However, managers sometimes still need to know how to describe the relationship between variables in greater detail. For example, a marketing manager may want to predict future sales or how a price increase will affect the profits or market share of the company. There are a number of ways to make such predictions:

- Extrapolation from past behavior of the variable
- Simple guesses
- Use of a regression equation that includes information about related variables to assist in the prediction

Extrapolation and guesses (educated or otherwise) usually assume that past conditions and behaviors will continue into the future. They do not examine the influences behind the behavior of interest. Consequently, when sales levels, profits, or other variables of interest to a manager differ from those in the past, extrapolation and guessing do not explain why.

Bivariate regression analysis is a statistical technique that analyzes the linear relationship between two variables by estimating coefficients for an equation for a straight line (PPT slide 12-21). One variable is designated as a dependent variable and the other is called an independent or predictor variable.

A couple of points should be made about the assumptions behind regression analysis:

- As with correlation, regression analysis assumes a linear relationship is a good description of the relationship between two variables.
- Even though the terminology of regression analysis commonly uses the labels dependent and independent for the variables, these labels do not mean we can say one variable causes the behavior of the other. Regression analysis uses knowledge about the level and type of association between two variables to make predictions.
Statements about the ability of one variable to cause changes in another must be based on conceptual logic or preexisting knowledge rather than on statistical calculations alone.

The use of a simple regression model assumes that the:

- variables of interest are measured on interval or ratio scales (except in the case of dummy variables).
- variables come from a normal population.
- error terms associated with making predictions are normally and independently distributed.

**A. Fundamentals of Regression Analysis (PPT slides 12-23 to 12-26)**

A fundamental basis of regression analysis is the assumption of a straight line relationship between the independent and dependent variables. This relationship is illustrated in Exhibit 12.9 (PPT slide 12-24). The general formula for a straight line is:

$$Y = a + bX + e_i$$

Where:

- $Y$ = the dependent variable
- $a$ = the intercept (the point where the straight line intersects the Y-axis when $X = 0$)
- $b$ = the slope (the change in $Y$ for every 1 unit change in $X$)
- $X$ = the independent variable used to predict $Y$
- $e_i$ = the error for the prediction

In regression analysis, the researchers examine the relationship between the independent variable X and the dependent variable Y. To do so, they use the actual values of X and Y in their data set and the computed values of a and b. The calculations are based on the least squares procedure (PPT slide 12-25). The least squares procedure determines the best-fitting line by minimizing the sum of the squared vertical distances of all the data points from the line, as shown in Exhibit 12.10. The best-fitting line is the regression line (PPT slide 12-26).

Any point that does not fall on the line is the result of the unexplained variance, or the variance in Y that is not explained by X (PPT slide 12-25). This unexplained variance is called error and is represented by the vertical distance between the estimated straight regression line and the actual data points.
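The least squares procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SPSS routine used in the text: `fit_line` is a hypothetical helper, the data are made up, and the slope and intercept use the standard closed-form least squares formulas for one independent variable.

```python
def fit_line(xs, ys):
    """Return intercept a and slope b of the least squares line Y = a + bX."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # slope: change in Y per one-unit change in X
    a = mean_y - b * mean_x  # intercept: predicted Y when X = 0
    return a, b


# Hypothetical data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)                          # a = 2.2, b = 0.6
predicted = [a + b * x for x in xs]
errors = [y - p for y, p in zip(ys, predicted)]  # vertical distances e_i
sse = sum(e ** 2 for e in errors)                # total squared error, 2.4

print(round(a, 3), round(b, 3), round(sse, 3))
```

The `errors` list holds the vertical distance of each data point from the fitted line; some are positive and some negative, and for a least squares line with an intercept they sum to zero.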
The distances of all the points not on the line are squared and added together to determine the sum of the squared errors, which is a measure of total error in the regression.

In the case of bivariate regression analysis, researchers look at one independent variable and one dependent variable. However, managers frequently want to look at the combined influence of several independent variables on one dependent variable. Multiple regression is the appropriate technique to measure multivariate relationships.

**B. Developing and Estimating the Regression Coefficients (PPT slide 12-27)**

Regression uses an estimation procedure called ordinary least squares (OLS) that guarantees the line it estimates will be the best-fitting line. Ordinary least squares is a statistical procedure that results in equation parameters (a and b) that produce predictions with the lowest sum of squared differences between actual and predicted values (PPT slide 12-27). The betas (b) are the regression coefficients. A regression coefficient is an indicator of the importance of an independent variable in predicting a dependent variable (PPT slide 12-27). Variables with large coefficients are stronger predictors, and variables with small coefficients are weaker predictors. The differences between actual and predicted values of Y are represented by $e_i$ (the error term of the regression equation).

**C. SPSS Application: Bivariate Regression (PPT slide 12-28)**

This section illustrates bivariate regression analysis using the Santa Fe Grill database. Exhibit 12.11 contains the results of the bivariate regression analysis (PPT slide 12-28).

**D. Significance (PPT slide 12-29)**

Once the statistical significance of the regression coefficients is determined, the first question about the relationship is answered: “Is there a relationship between the dependent and independent variable?” A second question to ask is: “How strong is that relationship?” The output of regression analysis includes the coefficient of determination, or r², which describes the amount of variation in the dependent variable associated with the variation in the independent variable. The regression r² also tells the researcher what percentage of the total variation in the dependent variable can be explained by using the independent variable. The r² measure varies between 0.00 and 1.00, and is calculated by dividing the amount of variation the researcher has been able to explain with the regression equation by the total variation in the dependent variable.

When examining the substantive significance of a regression equation, the researcher should look at the size of the r² for the regression equation and the size of the regression coefficient. The regression coefficient may be statistically significant, but still relatively small, meaning that the researcher’s dependent measure won’t change very much for a given unit change in the independent measure.

**E. Multiple Regression Analysis (PPT slides 12-30 and 12-31)**

In most problems faced by managers, there are several independent variables that need to be examined for their influence on a dependent variable. Multiple regression analysis is a statistical technique which analyzes the linear relationship between a dependent variable and multiple independent variables by estimating coefficients for the equation for a straight line (PPT slide 12-30). Multiple independent variables are entered into the regression equation, and for each variable a separate regression coefficient is calculated that describes its relationship with the dependent variable.
The relationship between each independent variable and the dependent measure is still linear. The easiest way to analyze the relationships is to examine the regression coefficient for each independent variable, which represents the average amount of change expected in Y given a unit change in the value of the independent variable the researcher is examining.

With the addition of more than one independent variable, the researchers have some new issues to consider. One is the possibility that each independent variable is measured using a different scale. For example, years of age and annual income are measured on different scales. To solve this problem, the researchers calculate the standardized regression coefficient, called a beta coefficient (PPT slide 12-31). It is an estimated regression coefficient that has been recalculated as if every variable had been rescaled to have a mean of 0 and a standard deviation of 1. Such a change enables independent variables with different units of measurement to be directly compared on their association with the dependent variable. Standardization removes the effects of using different scales of measurement.

In absolute value, beta coefficients typically range from 0.00 to 1.00, and they can be either positive or negative. A positive beta means that as the size of an independent variable increases, the size of the dependent variable increases. A negative beta means that as the size of the independent variable increases, the size of the dependent variable gets smaller.

**F. Statistical Significance (PPT slides 12-32 and 12-33)**

After the regression coefficients have been estimated, the researcher must examine the statistical significance of each coefficient. Each regression coefficient is divided by its standard error to produce a t statistic, which is compared against the critical value to determine whether the null hypothesis can be rejected. Many times not all the independent variables in a regression equation will be statistically significant.
If a regression coefficient is not statistically significant, the data do not provide evidence that the independent variable has a relationship with the dependent variable. The slope describing that relationship is relatively flat: the value of the dependent variable changes little, if at all, as the value of the statistically insignificant independent variable changes.

When using multiple regression analysis, it is important to examine the overall statistical significance of the regression model. The amount of variation in the dependent variable the researchers have been able to explain with the independent measures is compared with the total variation in the dependent measure. This comparison results in a statistic called a model F statistic, which is compared against a critical value to determine whether or not to reject the null hypothesis (PPT slide 12-33). If the F statistic is statistically significant, it means that the chances of the regression model for the researcher’s sample producing a large r² when the population r² is actually 0 are acceptably small.

**G. Substantive Significance (PPT slide 12-34)**

Once the researchers have estimated the regression equation, they need to assess the strength of the association. The multiple r², or multiple coefficient of determination, describes the strength of the relationship between all the independent variables in the equation and the dependent variable. The larger the r² measure, the more of the behavior of the dependent measure is associated with the independent measures the researchers are using to predict it.
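To make the multiple r² and the model F statistic more concrete, the sketch below fits a two-predictor regression by ordinary least squares in plain Python. It is an illustration under stated assumptions, not the SPSS output discussed in the text: the helpers `solve` and `fit_multiple` and the six data rows are hypothetical, the coefficients come from solving the normal equations, and the F statistic uses the usual ratio of explained to unexplained variation, each divided by its degrees of freedom.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_multiple(rows, ys):
    """OLS fit; returns (coefficients, r_squared, f_statistic)."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 gives the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    coefs = solve(xtx, xty)                   # normal equations: X'X b = X'y
    predicted = [sum(c * v for c, v in zip(coefs, d)) for d in design]
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_error = sum((y - p) ** 2 for y, p in zip(ys, predicted))
    ss_explained = ss_total - ss_error
    n, p = len(ys), k - 1                     # observations, predictors
    r_squared = ss_explained / ss_total
    f_stat = (ss_explained / p) / (ss_error / (n - p - 1))
    return coefs, r_squared, f_stat


# Hypothetical data: two independent variables and one dependent variable.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
ys = [9.5, 7.5, 18.5, 18.5, 29.0, 28.0]

coefs, r_squared, f_stat = fit_multiple(rows, ys)
print([round(c, 3) for c in coefs], round(r_squared, 4), round(f_stat, 2))
```

For these made-up values the fitted equation is approximately Y = 1 + 2X1 + 3X2, with an r² of about 0.9975 and an F statistic of about 602. Real survey data would rarely fit this closely; the numbers were chosen so the arithmetic is easy to check by hand.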
To summarize, the elements of a multiple regression model to examine in determining its significance include the:

- r²
- Model F statistic
- Individual regression coefficients for each independent variable
- Associated t statistics
- Individual beta coefficients

The appropriate procedure to follow in evaluating the results of a regression analysis is:

1. Assess the statistical significance of the overall regression model using the F statistic and its associated probability
2. Evaluate the obtained r² to see how large it is
3. Examine the individual regression coefficients and their t statistics to see which are statistically significant
4. Look at the beta coefficients to assess relative influence

**H. Multiple Regression Assumptions (PPT slides 12-35 to 12-38)**

The ordinary least squares approach to estimating a regression model requires that several assumptions be met. Among the more important assumptions are:

- Linear relationship
- Homoskedasticity: the pattern of the covariation is constant (the same) around the regression line, whether the values are small, medium, or large (PPT slide 12-35).
- Normal distribution

The linearity assumption was illustrated in Exhibits 12.9 and 12.10. Exhibit 12.12 illustrates heteroskedasticity: the pattern of covariation around the regression line is not constant, and varies in some way when the values change from small to medium and large (PPT slide 12-37). A normal curve is a symmetric curve indicating that the shape of the distribution of a variable is the same above and below the mean (PPT slide 12-36). A normal curve is shown in Exhibit 12.13 (PPT slide 12-38).

**I. SPSS Application: Multiple Regression (PPT slides 12-39 and 12-40)**

Regression can be used to examine the relationship between a single metric dependent variable and one or more metric independent variables. This section illustrates multiple regression analysis using the Santa Fe Grill database.
It takes students through the steps necessary to complete the required multiple regression analysis. The SPSS output for the multiple regression is shown in Exhibit 12.14 (PPT slide 12-39).

Multicollinearity is a situation in which several independent variables are highly correlated with each other (PPT slide 12-40). This characteristic can result in difficulty in estimating separate or independent regression coefficients for the correlated variables. Exhibit 12.15 illustrates a correlation matrix of regression model variables.

### MARKETING RESEARCH IN ACTION: THE ROLE OF EMPLOYEES IN DEVELOPING A CUSTOMER SATISFACTION PROGRAM (PPT slide 12-41)

The Marketing Research in Action in this chapter explains an internal survey of plant workers and managers conducted by the plant manager of QualKote Manufacturing. In order to answer his questions about customer satisfaction, the plant manager conducted the survey using a 7-point scale. A multiple regression was run using SPSS with responses of the 57 employees as input to the model. The output is shown in Exhibits 12.16 and 12.17. The study focused on employees’ opinions of quality management and customer satisfaction. The r² for the relationship studied by the survey was 67.0.

Instructor Manual for Essentials of Marketing Research, Joseph F. Hair, Mary Celsi, Robert P. Bush, David J. Ortinau, 9780078028816, 9780078112119