Chapter 13
Estimation and Hypothesis Testing V: Chi-square and Correlation for Nominal and Ordinal Data

Learning Objectives:
1. Analyze data consisting of two nominal variables.
2. Measure the strength of an association between nominal variables using the Phi coefficient or Cramér's V coefficient.
3. Analyze data consisting of one dichotomous variable and one interval or ratio variable using the point-biserial correlation coefficient.
4. Understand the similarity between the point-biserial correlation coefficient and Pearson's correlation coefficient.
5. Analyze data consisting of two rank-ordered ordinal variables using Spearman's correlation coefficient.

Chapter Summary
In this final chapter, students learn about testing hypotheses in relation to nominal and ordinal data. Specifically, students learn about the statistical measures of association appropriate for situations where (a) both variables are nominal, (b) one variable is dichotomous nominal and the other is interval or ratio, and (c) both variables are rank-ordered ordinal.

Key Formulas
The following represent the key formulas for this chapter. PowerPoint slides are provided for each chapter. In addition to these slides, a PDF file containing only the formulas is also provided.
Pearson's Chi-Square Test of Independence
Expected Frequencies
Degrees of Freedom for the Chi-Square Test of Independence
Phi Coefficient
Cramér's V Coefficient
Pearson's Chi-Square Goodness of Fit
Degrees of Freedom for Pearson's Chi-Square Goodness of Fit
Point-Biserial Correlation Coefficient
t-value for the Point-Biserial Correlation Coefficient
Spearman's Correlation Coefficient

Interactive Figures:
The textbook contains interactive figures. You may wish to use these in a lecture. Students also have access to these. For this chapter, there is one interactive figure.
1. Table 13.3 is an interactive example based on Table 13.3 in the textbook.
These interactive figures can be found in the eBook and the Library under Chapter 13 Resources.

Typical Lecture Material
We have provided two sample lectures below. You may wish to add discipline-specific information to make these more relevant to your students.

Lecture 1:
Objective: To be able to draft a contingency table, distinguish between observed and expected frequencies, and understand the appropriateness of different tests of correlation.

Review the following concept table with your students and help them fill in the definition of each statistical concept. The definitions that the students should come up with appear in the Definition column.

Statistical Concept | Definition
------------------- | ----------
Pearson's Chi-Square Test of Independence | Used for two nominal variables when you want to see whether they are related.
Contingency Table | Compares two (or more) categorical variables by cross-tabulating the categories of each variable.
Observed Frequency | The values found in the sample data.
Expected Frequency | The values we would expect to occur by chance, or other expected values as set by the researcher.
Phi Coefficient | Appropriate for nominal variables in a two-by-two contingency table.
Cramér's V Coefficient | Used for nominal variables in a contingency table that is larger than two by two.
Pearson's Chi-Square Goodness of Fit | Tests the null hypothesis that there is no significant difference between the observed and expected frequencies.
Point-Biserial Correlation | Measures the association between a dichotomous nominal variable and an interval or ratio variable.
Spearman's Correlation Coefficient | Measures the association between two rank-ordered variables.

Example 1: Ask your students to answer the following questions:
1. What distinguishes a point-biserial correlation from an independent-samples t-test?
a. The point-biserial correlation is used when you want to understand the association between two variables.
One variable is dichotomous and the other is interval or ratio. The independent-samples t-test is used when you want to understand the difference between the means of two independent groups on an interval or ratio variable.
2. True or False: The Phi coefficient is never negative. (True. As computed from the chi-square statistic, the Phi coefficient ranges from 0 (no relationship) to +1 (perfect relationship).)
3. True or False: The point-biserial correlation coefficient is the same as Pearson's r. (True. Algebraically they are the same.)

Example 2: Read the following scenario to your students:
You are interested in studying the relationship between crime type (Auto Theft versus Physical Assault) and alcohol involvement (No Alcohol Consumed versus Alcohol Consumed by the offender). You hypothesize that crime type is dependent on alcohol involvement, meaning that there is a relationship between the two variables. You randomly select 100 court files, of which 50 are auto theft cases and 50 are physical assault cases. Among the auto theft cases, 33 did not involve alcohol. Among the physical assault cases, 29 involved alcohol.

1. Compile this information into a 2 × 2 contingency table.
Note: Draw the following contingency table on the board. The values students are given are the two column totals (50 and 50), the grand total (100), and the cell counts 33 and 29. The others are values students should calculate.

Alcohol Involvement | Crime Type: Auto Theft | Crime Type: Physical Assault | Total | Row Percent
------------------- | ---------------------- | ---------------------------- | ----- | -----------
No Alcohol | 33 | 21 | 54 | 54%
Alcohol | 17 | 29 | 46 | 46%
Total | 50 | 50 | 100 | 100%
Column Percent | 50% | 50% | 100% |

2. Calculate the expected frequencies for each cell.
For Auto Theft and No Alcohol: 50 × 54 ÷ 100 = 27
For Auto Theft and Alcohol: 50 × 46 ÷ 100 = 23
For Physical Assault and No Alcohol: 50 × 54 ÷ 100 = 27
For Physical Assault and Alcohol: 50 × 46 ÷ 100 = 23

Alcohol Involvement | Crime Type: Auto Theft | Crime Type: Physical Assault | Total | Row Percent
------------------- | ---------------------- | ---------------------------- | ----- | -----------
No Alcohol | 33 (27) | 21 (27) | 54 | 54%
Alcohol | 17 (23) | 29 (23) | 46 | 46%
Total | 50 | 50 | 100 | 100%
Column Percent | 50% | 50% | 100% |

Note: Expected values are in parentheses.
3.
Calculate the Pearson chi-square value and degrees of freedom.
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(33 - 27)^2}{27} + \frac{(21 - 27)^2}{27} + \frac{(17 - 23)^2}{23} + \frac{(29 - 23)^2}{23} = 1.33 + 1.33 + 1.57 + 1.57 = 5.80$$
$$df = (R - 1)(C - 1) = (2 - 1)(2 - 1) = 1$$
4. Calculate the critical value (95% confidence level) and determine whether to reject or fail to reject the null hypothesis.
The critical value for an alpha value of 0.05 and 1 degree of freedom is 3.841. Since the chi-square value of 5.80 is greater than the critical value of 3.841, we reject the null hypothesis of no relationship and conclude that, based on this sample, crime type and alcohol involvement are dependent on each other (meaning they are related).

Lecture 2:
Objective: To be able to estimate the Phi coefficient and the point-biserial correlation coefficient.

Example 1: Ask your students to do the following:
1. Using the data from the previous lecture and your corresponding calculations, calculate the strength of the association between crime type and alcohol involvement.
ANSWER: Since this involves a 2 × 2 contingency table, the appropriate measure of the strength of the association is the Phi coefficient.
$$\phi = \sqrt{\frac{\chi^2}{N}} = \sqrt{\frac{5.80}{100}} = \sqrt{0.058} = 0.24$$
Based on the Phi coefficient of 0.24, we would conclude that although the relationship is significant, it appears to be a weak one.

Example 2: You are interested in determining whether there is a relationship between Music Playing (No versus Yes) and recall of a memorized list of 10 words. You randomly select 10 individuals and then randomly assign 5 to the No Music condition and 5 to the Yes Music condition. You give each participant the same list of 10 words and allow them 30 seconds to memorize the words, either with music playing in the background (Yes Music condition) or in silence (No Music condition), depending on the condition to which they were assigned. After the 30 seconds have passed, you ask them to recite the list and count the number of words they recalled correctly. The data are as follows:

Participant | Music (0 = No, 1 = Yes) | Number Recalled Correctly (Y) | Y²
----------- | ----------------------- | ----------------------------- | ---
1 | 0 | 10 | 100
2 | 0 | 6 | 36
3 | 0 | 5 | 25
4 | 0 | 8 | 64
5 | 0 | 9 | 81
6 | 1 | 8 | 64
7 | 1 | 4 | 16
8 | 1 | 6 | 36
9 | 1 | 6 | 36
10 | 1 | 5 | 25

1.
Calculate the point-biserial correlation coefficient.
The following represent the answers to the individual parts of the equation, computed from the data table:
$$\bar{Y}_0 = 7.6, \quad \bar{Y}_1 = 5.8, \quad p = 0.5, \quad \sum Y = 67, \quad \sum Y^2 = 483, \quad N = 10$$
$$r_{pb} = \frac{(\bar{Y}_1 - \bar{Y}_0)\sqrt{p(1 - p)}}{\sqrt{\dfrac{\sum Y^2 - \frac{(\sum Y)^2}{N}}{N}}} = \frac{(5.8 - 7.6)\sqrt{0.5 \times 0.5}}{\sqrt{\dfrac{483 - \frac{67^2}{10}}{10}}} = \frac{-0.90}{1.85} = -0.49$$
2. Calculate the t-value and degrees of freedom for the point-biserial correlation.
$$t = \frac{r_{pb}\sqrt{N - 2}}{\sqrt{1 - r_{pb}^2}} = \frac{-0.49\sqrt{10 - 2}}{\sqrt{1 - (-0.49)^2}} = \frac{-0.49 \times 2.83}{\sqrt{1 - 0.24}} = \frac{-1.39}{\sqrt{0.76}} = \frac{-1.39}{0.87} \approx -1.59$$
$$df = N - 2 = 10 - 2 = 8$$
3. Determine the critical value for an alpha value of 0.05.
With 8 degrees of freedom and an alpha value of 0.05, the two-tailed critical value is ±2.306.
4. Do you reject or fail to reject the null hypothesis that there is no relationship between music condition and recall of the memorized words?
Given that the test statistic of −1.59 is not lower than the critical value of −2.306, we fail to reject the null hypothesis. Based on this sample, there does not appear to be a relationship between music condition and recall of memorized words.

Solutions to End-of-Chapter Problems

Problem 13-1
a) Computed statistic = 24.835; critical value of χ² with 6 degrees of freedom = 12.592; conclusion: the null hypothesis would be rejected, and the variables are concluded to be dependent. (LO1)
b) Phi = 0.293, a weak relationship; Cramér's V = 0.207, a poor fit (LO2)

Problem 13-2
a) rs = 0.643 (LO4)
b) Computed statistic = 2.057; critical value of t with 6 degrees of freedom = ±2.447; conclusion: the null hypothesis would not be rejected, and the result is not statistically different from 0. (LO4)

Problem 13-3
a) Computed statistic = 13.619; critical value of χ² with 6 degrees of freedom = 12.592; conclusion: the null hypothesis would be rejected, and the variables are concluded to be dependent. (LO1)
b) Phi = 0.051, a weak relationship; Cramér's V = 0.036, a poor fit (LO2)
c) If alpha = 0.01, then the critical value of χ² with 6 degrees of freedom = 16.812; conclusion: the null hypothesis would not be rejected, and the variables are treated as independent. (LO1)

Solutions to Interactive Exercises

Exercise 13-1
Match the following:

I | II
--- | ---
1. Chi-squared test of independence | A. Measure of the association between an interval/ratio variable and a categorical variable
2. Chi-squared test of goodness of fit | B. Tests whether there is any significant difference between the observed and expected distributions
3. Point-biserial correlation | C. Test of association between two nominal variables
4. Spearman's rho | D. Measure of the association between ordinal variables when rank order matters

Answer: 1. C; 2. B; 3. A; 4. D

Exercise 13-2
Use the table given below to test whether there is an association between gender and political affiliation.

Gender | Conservative | Liberal | NDP | Others | Total
------ | ------------ | ------- | --- | ------ | -----
Male | 125 | 100 | 85 | 35 | 345
Female | 200 | 75 | 60 | 20 | 355
Total | 325 | 175 | 145 | 55 | 700

Level of significance = 0.05
Test statistic = 29.73
DF = 3
Critical value = 7.815
Reject H0. There is evidence that the two variables are dependent.

Exercise 13-2
What is the strength of the association between the two variables? Interpret the value.

Gender | Conservative | Liberal | NDP | Others | Total
------ | ------------ | ------- | --- | ------ | -----
Male | 125 | 100 | 85 | 35 | 345
Female | 200 | 75 | 60 | 20 | 355
Total | 325 | 175 | 145 | 55 | 700

Level of significance = 0.05
Answer: Cramér's V = 0.2061. The relationship is weak, as V is less than 0.30.

Solutions to SPSS Exercises

Exercise 13-1
The following data were collected from 30 students at John Abbott for the fall 2003 semester. The information provided is for the incidence of being on probation (Y = yes; N = no) and average grade.
Y = was on probation in the fall 2003 semester; N = was not on probation in the fall 2003 semester; Grade = average grade received for the fall 2003 semester

probation | grade | probation | grade | probation | grade
--------- | ----- | --------- | ----- | --------- | -----
Y | D | Y | C | N | A
N | B | N | C | N | A
N | C | Y | F | N | B
N | A | N | B | N | B
N | C | Y | F | Y | C
Y | D | Y | C | Y | C
N | A | N | B | Y | C
Y | D | N | A | N | A
N | C | Y | F | N | A
N | B | N | C | N | C

a) Prepare a table of cross tabulations with grade in the rows, and add percentages in terms of the explanatory variable.
grade * probation Cross tabulation

grade | | probation: N | probation: Y | Total
----- | --- | ------------ | ------------ | -----
A | Count | 7 | 0 | 7
A | % within probation | 36.8% | .0% | 23.3%
B | Count | 6 | 0 | 6
B | % within probation | 31.6% | .0% | 20.0%
C | Count | 6 | 5 | 11
C | % within probation | 31.6% | 45.5% | 36.7%
D | Count | 0 | 3 | 3
D | % within probation | .0% | 27.3% | 10.0%
F | Count | 0 | 3 | 3
F | % within probation | .0% | 27.3% | 10.0%
Total | Count | 19 | 11 | 30
Total | % within probation | 100.0% | 100.0% | 100.0%

LO: 1 Page: 364-369

b) Test the null hypothesis that probation is independent of the grade received. Assume α = 0.05. Are your results statistically significant?

grade * probation Cross tabulation

grade | | probation: N | probation: Y | Total
----- | --- | ------------ | ------------ | -----
A | Count | 7 | 0 | 7
A | Expected Count | 4.4 | 2.6 | 7.0
A | % within probation | 36.8% | .0% | 23.3%
B | Count | 6 | 0 | 6
B | Expected Count | 3.8 | 2.2 | 6.0
B | % within probation | 31.6% | .0% | 20.0%
C | Count | 6 | 5 | 11
C | Expected Count | 7.0 | 4.0 | 11.0
C | % within probation | 31.6% | 45.5% | 36.7%
D | Count | 0 | 3 | 3
D | Expected Count | 1.9 | 1.1 | 3.0
D | % within probation | .0% | 27.3% | 10.0%
F | Count | 0 | 3 | 3
F | Expected Count | 1.9 | 1.1 | 3.0
F | % within probation | .0% | 27.3% | 10.0%
Total | Count | 19 | 11 | 30
Total | Expected Count | 19.0 | 11.0 | 30.0
Total | % within probation | 100.0% | 100.0% | 100.0%

Chi-Square Tests

Test | Value | df | Asymp. Sig. (2-sided)
---- | ----- | --- | ---------------------
Pearson Chi-Square | 18.256a | 4 | .001
Likelihood Ratio | 24.271 | 4 | .000
N of Valid Cases | 30 | |

Since the p-value is less than 0.05, we conclude that probation and grade are dependent on one another.

LO: 2 Page: 369-371

Equations from Chapter 13
Scott R. Colwell and Edward M.
Carter © 2012

Equation 13.1: Pearson's Chi-Square Test of Independence
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
χ² = chi-square statistic
O = observed frequencies from the contingency table
E = expected frequencies based on the null hypothesis

Equation 13.2: Expected Frequencies
$$E_{ij} = \frac{\text{Row}_i \times \text{Column}_j}{N}$$
Where:
E_ij = expected value for the cell in row i, column j
Row_i = total value of row i
Column_j = total value of column j
N = grand total

Equation 13.5: Degrees of Freedom for Pearson's Chi-Square Test of Independence
$$df = (R - 1)(C - 1)$$
Where:
df = degrees of freedom
R = number of rows in the contingency table
C = number of columns in the contingency table

Equation 13.7: The Phi Coefficient
$$\phi = \sqrt{\frac{\chi^2}{N}}$$
Where:
χ² = chi-square statistic
N = sample size

Equation 13.8: Cramér's V Coefficient
$$V = \sqrt{\frac{\chi^2}{N(k - 1)}}$$
Where:
χ² = chi-square statistic
N = sample size
k = lesser value of the number of rows and columns

Equation 13.11: Pearson's Chi-Square Goodness of Fit
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
Where:
χ² = chi-square statistic
O = observed frequencies from the contingency table
E = expected frequencies based on the null hypothesis

Degrees of Freedom for Pearson's Chi-Square Goodness of Fit
$$df = k - 1$$
Where:
df = degrees of freedom
k = number of categories in the variable

Equation 13.12: Point-Biserial Correlation Coefficient
$$r_{pb} = \frac{(\bar{Y}_1 - \bar{Y}_0)\sqrt{p(1 - p)}}{\sqrt{\dfrac{\sum Y^2 - \frac{(\sum Y)^2}{N}}{N}}}$$
Where:
r_pb = point-biserial correlation coefficient
Ȳ0 = mean value of Y when X (nominal) = 0
Ȳ1 = mean value of Y when X (nominal) = 1
p = proportion of values where X (nominal) = 1
ΣY² = sum of all squared values of Y
(ΣY)² = square of the sum of all values of Y
N = total sample size

Point-Biserial Correlation Coefficient = Pearson's Correlation Coefficient
$$r_{pb} = r$$
Where:
r_pb = point-biserial correlation coefficient
r = Pearson's correlation coefficient

Equation 13.14: t-value for the Point-Biserial Correlation Coefficient
$$t = \frac{r_{pb}\sqrt{N - 2}}{\sqrt{1 - r_{pb}^2}}$$
Where:
r_pb = point-biserial correlation coefficient
N = sample size

Equation 13.16: Spearman's Correlation Coefficient
$$r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}$$
Where:
D = difference between the ranks of two variables for each respondent
N = sample size

Solution Manual for Introduction to Statistics for Social Sciences, Scott R. Colwell, Edward M. Carter, 9780071319126
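
The chapter formulas for the chi-square test of independence, the Phi and Cramér's V coefficients, and the point-biserial correlation can be checked numerically against the two lecture examples (the crime type by alcohol involvement table and the music recall data). The following is a minimal sketch in plain Python, offered only as an illustration and not as part of the textbook materials; the function names chi_square_independence, cramers_v, and point_biserial are hypothetical names chosen for this example.

```python
import math


def chi_square_independence(observed):
    """Return (chi2, df, expected) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequencies: E_ij = Row_i * Column_j / N
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    # Chi-square statistic: sum of (O - E)^2 / E over every cell
    chi2 = sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )
    # Degrees of freedom: (R - 1)(C - 1)
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramer's V; for a 2 x 2 table k - 1 = 1, so this equals the Phi coefficient."""
    k = min(n_rows, n_cols)
    return math.sqrt(chi2 / (n * (k - 1)))


def point_biserial(x, y):
    """Return (r_pb, t, df) for a 0/1 variable x and an interval/ratio variable y."""
    n = len(y)
    y1 = [yi for xi, yi in zip(x, y) if xi == 1]
    y0 = [yi for xi, yi in zip(x, y) if xi == 0]
    p = len(y1) / n  # proportion of cases with x = 1
    mean_diff = sum(y1) / len(y1) - sum(y0) / len(y0)
    # Denominator: sqrt((sum of Y^2 - (sum of Y)^2 / N) / N)
    sd = math.sqrt((sum(v * v for v in y) - sum(y) ** 2 / n) / n)
    r_pb = mean_diff * math.sqrt(p * (1 - p)) / sd
    t = r_pb * math.sqrt(n - 2) / math.sqrt(1 - r_pb ** 2)
    return r_pb, t, n - 2


# Lecture 1, Example 2: rows = No Alcohol / Alcohol, columns = Auto Theft / Physical Assault
crime_table = [[33, 21], [17, 29]]
chi2, df, expected = chi_square_independence(crime_table)
phi = cramers_v(chi2, 100, 2, 2)
print(round(chi2, 2), df, round(phi, 2))  # 5.8 1 0.24

# Lecture 2, Example 2: music condition (0 = No, 1 = Yes) and words recalled
music = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
recall = [10, 6, 5, 8, 9, 8, 4, 6, 6, 5]
r_pb, t, df_t = point_biserial(music, recall)
print(round(r_pb, 2), round(t, 2), df_t)  # -0.49 -1.58 8
```

The chi-square, Phi, and point-biserial values agree with the lecture answers (5.80, 0.24, and −0.49). The t-value comes out at about −1.58 rather than −1.59 because the sketch carries the unrounded correlation into the t formula, whereas the worked solution rounds it to −0.49 first; the conclusion (fail to reject at the ±2.306 critical value) is the same either way.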