This Document Contains Chapters 3 to 4

Chapter 3
A Statistics Refresher

SCALES OF MEASUREMENT
  Nominal Scales
  Ordinal Scales
  Interval Scales
  Ratio Scales
  Measurement Scales in Psychology
DESCRIBING DATA
  Frequency Distributions
  Measures of Central Tendency
    The arithmetic mean
    The median
    The mode
  Measures of Variability
    The range
    The interquartile and semi-interquartile ranges
    The average deviation
    The standard deviation
  Skewness
  Kurtosis
THE NORMAL CURVE
  The Area under the Normal Curve
STANDARD SCORES
  z Scores
  T Scores
  Other Standard Scores
    Normalized standard scores
CORRELATION AND INFERENCE
  The Concept of Correlation
  The Pearson r
  The Spearman Rho
  Graphic Representations of Correlation
  Meta-Analysis
Close-up: The Normal Curve and Psychological Tests
Everyday Psychometrics: Consumer (of Graphed Data), Beware!
Meet an Assessment Professional: Meet Dr. Benoit Verdon
Self-Assessment

TERM TO LEARN
evidence-based practice
Methods, protocols, techniques, and procedures used by professionals that have a basis in clinical and research findings.

Some relevant reference citations:
Briner, R. B., & Rousseau, D. M. (2011). Evidence based I-O psychology: Not there yet. Industrial and Organizational Psychology: Perspectives on Science and Practice, 4(1), 3-22.
Fulford, K. W. M. (2011). The value of evidence and evidence of values: Bringing together values-based and evidence-based practice in policy and service development in mental health. Journal of Evaluation in Clinical Practice, 17(5), 976-987.
Simon, R. I. (2011). Improving suicide risk assessment with evidence-based psychiatry. In M. Pompili & R. Tatarelli (Eds.), Evidence-based practice in suicidology: A source book (pp. 45-54). Cambridge, MA: Hogrefe Publishing.

For class consideration: The concept of evidence-based practice is raised in this chapter in the context of the discussion of meta-analysis. How does meta-analysis facilitate evidence-based practice?
What are the limitations of meta-analysis and other statistical tools in applied contexts?

CLASS DISCUSSION QUESTIONS
Here is a list of questions that may be used to stimulate class discussion, as well as critical and generative thinking, with regard to some of the material presented in this chapter of the text.

1. What is correlation? How does it differ from causation? What is the difference between the various correlation coefficients?
Drawing upon the treatment of correlation in the text, ask students to choose the most appropriate type of correlation coefficient (Pearson r, Spearman rho, etc.) for each of the following. As a follow-up, ask them what they believe the scatterplots might look like. Begin by writing on the board, displaying a slide, or passing out a sheet of paper with the following written:
For each of the following, how would you determine the correlation between the variables? Estimate in a drawing the type of scatterplot that you think would result:
a) chronological age and intelligence
b) height and weight
c) age and hat size
d) gender and GPA
e) gender and height

2. After briefly reviewing the benefits and limitations of meta-analysis, ask students what question or topic they would ideally like to see meta-analyzed. If you happen to be in a “smart classroom” (that is, one equipped with Internet access), see if any meta-analytic studies on these ideal topics have actually been done.

3. A provocative question to open discussion related to this chapter is presented in the textbook. It came in the form of a challenge issued to the student in the section of the chapter labeled "Describing Data." This challenge is reprinted here:
Suppose you have magically changed places with the professor teaching this course and you have just administered an examination that consists of 100 multiple-choice items, where one point is awarded for each correct answer.
The scores for the 25 students enrolled in your class could theoretically range from 0 (none correct) to 100 (all correct). Assume it is the day after your examination and you are sitting in your office with the data listed in Table 3-1. One task at hand is to communicate the test results to your class in a way that will best assist each individual student in understanding how he or she performed on the test in comparison to all of the other test-takers in the class. How do you accomplish this objective?

At the end of the chapter, students are again encouraged to give some thought to this task and come to class prepared with some answers:
It may be helpful at this time to review this “statistics refresher” to make certain that you indeed feel "refreshed". Apply what you have learned about frequency distributions, graphing frequency distributions, measures of central tendency, measures of variability, the normal curve, and standard scores to the question posed in the chapter. How would you communicate the data from Table 3-1 to the class? Which type of frequency distribution might you use? Which type of graph? Which measure of central tendency? Which measure of variability? Might reference to a normal curve or to standard scores be helpful? Why or why not? Come to the next class session prepared with your thoughts on the answers to these questions, as well as your own questions regarding any of the material that could still stand a bit more explanation. We will be building on your knowledge of basic statistical principles in the chapters to come, and it is important that such building be on a rock-solid foundation.

Your students are asked to get involved in the subject matter by doing more than merely reading. As a way to reinforce the concepts presented, have them calculate the various statistics presented in this chapter using the examination data presented in Table 3-1. Computations and various methods for class presentation are listed here.

a.
Calculate the mean, the arithmetic average of all of the scores.

Score (number correct)
78 67 69 63 85 72 92 67 94 62 61 44 66 87 76 83 42 82 84 51 69 61 96 73 79

Sum = 1803
Mean = 1803 / 25 = 72.12

b. Calculate the arithmetic mean using frequency distribution data.

Frequency Distribution of Scores from Your Test
____________________
Score (X)   f (frequency)   fX
____________________
96          1               96
94          1               94
92          1               92
87          1               87
85          1               85
84          1               84
83          1               83
82          1               82
79          1               79
78          1               78
76          1               76
73          1               73
72          1               72
69          2               138
67          2               134
66          1               66
63          1               63
62          1               62
61          2               122
51          1               51
44          1               44
42          1               42
____________________
ΣfX = 1803
Mean = ΣfX / N = 1803 / 25 = 72.12

c. Identify the median, the “middlemost” score of the distribution.
Arrange the scores in a frequency distribution, as in (b). With 25 scores, the median is the 13th score. Count up 12 and count down 12 using the frequencies to find that the middle score, or median, is 72.
Median = 72

d. Identify the mode, or the most frequently occurring score in the distribution.
Using the same frequency distribution, look for the score or scores with the highest frequency. As it turns out, this distribution has three “most frequently occurring” scores, each with a frequency of 2. This distribution therefore has three modes: 69, 67, and 61. It may be characterized as “trimodal.”

e. What is the range of this distribution?
The range of a distribution is equal to the difference between the highest and the lowest scores.

Frequency Distribution of Scores from Your Test
____________________
Score   f (frequency)   Quartile
____________________
96      1
94      1
92      1
87      1
85      1
84      1
83      1               Q3
82      1
79      1
78      1
76      1
73      1
72      1               Q2
69      2
67      2
66      1
63      1               Q1
62      1
61      2
51      1
44      1
42      1
____________________
Here, the range is equal to the difference between the highest score (96) and the lowest score (42). So, 96 - 42 = 54. The range of this distribution is 54.

f.
On average, how much does each score in this distribution vary from the mean score?
The average deviation, or AD, is calculated by (a) determining the deviation score for each score in the distribution (in other words, how much does each individual score vary from the mean?), (b) summing the absolute values of all of the deviation scores (the signed deviations would simply sum to zero), and (c) dividing the obtained sum by the number of scores to obtain the average.

Frequency Distribution of Scores from Your Test
____________________
Score   f (frequency)   |X - mean|
____________________
96      1               23.88
94      1               21.88
92      1               19.88
87      1               14.88
85      1               12.88
84      1               11.88
83      1               10.88
82      1               9.88
79      1               6.88
78      1               5.88
76      1               3.88
73      1               0.88
72      1               0.12
69      1               3.12
69      1               3.12
67      1               5.12
67      1               5.12
66      1               6.12
63      1               9.12
62      1               10.12
61      1               11.12
61      1               11.12
51      1               21.12
44      1               28.12
42      1               30.12
____________________
The sum of the absolute deviation scores = 287.12. We divide this total by 25 (the total number of scores) and find the average deviation to be 11.48.
Average Deviation (AD) = 11.48

g. What is the standard deviation of this distribution?
The standard deviation is equal to the square root of the average squared deviations about the mean. Stated another way, the standard deviation is equal to the square root of the variance. So one way to begin calculating the standard deviation is to calculate the variance.
That is done as follows:

____________________
X     f    X - mean    (X - mean)²
____________________
96    1    23.88       570.25
94    1    21.88       478.73
92    1    19.88       395.21
87    1    14.88       221.41
85    1    12.88       165.89
84    1    11.88       141.13
83    1    10.88       118.37
82    1    9.88        97.61
79    1    6.88        47.33
78    1    5.88        34.57
76    1    3.88        15.05
73    1    0.88        0.77
72    1    -0.12       0.01
69    1    -3.12       9.73
69    1    -3.12       9.73
67    1    -5.12       26.21
67    1    -5.12       26.21
66    1    -6.12       37.45
63    1    -9.12       83.17
62    1    -10.12      102.41
61    1    -11.12      123.65
61    1    -11.12      123.65
51    1    -21.12      446.05
44    1    -28.12      790.73
42    1    -30.12      907.21
____________________
Sum of squared deviations = 4972.53
Variance = 4972.53 / 25 (total scores) = 198.90
Standard Deviation (square root of variance) = 14.10

The article “Methods of Expressing Test Scores” (Test Service Notebook #148, published by the Psychological Corporation) is presented in the online Exercises in Psychological Testing and Assessment. The chart presented on page 2 of the article can be particularly helpful to students in perceiving the relationships between the various test score statistics presented not only in this chapter but also in the following chapter (4).

h. Returning now to the questions posed for class discussion, some acceptable answers are as follows:

Q: "How would you communicate the data from Table 3-1 to the class?"
A: The instructor could initially present a frequency distribution, graph the results, and provide meaningful statistics to help the students interpret their individual scores.

Q: "What type of frequency distribution might you use?"
A: A grouped frequency distribution (see Table 3-3) would facilitate interpretation by your students. This form of presentation will give your students a clearer picture of where they stand in comparison to their classmates. It will also provide you, as the instructor, with a better idea of trends in the data. However, if you (the instructor of the hypothetical class) intend to calculate other statistics (e.g., z scores, means, standard deviations, percentile ranks), a simple (ungrouped) frequency distribution is preferable, as it facilitates computation of these statistics.
The method of computation/presentation of these statistics, as presented above, utilizes a frequency distribution.

Q: "Which type of graph?"
A: A histogram would provide the most meaningful graphic representation of the data. It is also the simplest type of graph to understand.

Q: "Which measure of central tendency?"
A: Due to the lack of skewness, the mean would represent the most reasonable measure of central tendency.

Q: "Which measure of variability?"
A: The standard deviation is the most commonly used index of variability if the mean is the measure of central tendency chosen. If the median is used, the interquartile range is the most common.

Q: "Might reference to a normal curve or standard scores be helpful? Why or why not?"
A: Reference to either a normal curve or a standard score may be helpful, depending on the purpose of the presentation and the characteristics of the sample of scores. Standard scores (e.g., z scores, T scores, stanines) are easy to interpret and provide an index of the test-taker's performance relative to others.
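The statistics discussed in these answers can be reproduced with a few lines of code. The following is a minimal sketch in Python, using only the standard library; the function names (`modes`, `average_deviation`, `z_score`, `t_score`) are illustrative choices and do not come from the text. It assumes, as the worked computations do, that the variance and standard deviation are obtained by dividing by N (25) rather than N - 1.

```python
from collections import Counter
from statistics import mean, median, pstdev, pvariance

# Raw scores (number correct) for the 25 test-takers in the worked example.
scores = [78, 67, 69, 63, 85, 72, 92, 67, 94, 62, 61, 44, 66,
          87, 76, 83, 42, 82, 84, 51, 69, 61, 96, 73, 79]


def modes(values):
    """Return every score tied for the highest frequency, largest first."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted((v for v, c in counts.items() if c == top), reverse=True)


def average_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    m = mean(values)
    return sum(abs(x - m) for x in values) / len(values)


def z_score(x, m, sd):
    """Distance of a raw score from the mean, in standard deviation units."""
    return (x - m) / sd


def t_score(z):
    """Linear transformation of z to a scale with mean 50 and SD 10."""
    return 10 * z + 50


m = mean(scores)              # arithmetic mean
sd = pstdev(scores)           # standard deviation, dividing by N
variance = pvariance(scores)  # variance, dividing by N

print(f"mean = {m:.2f}, median = {median(scores)}, modes = {modes(scores)}")
print(f"range = {max(scores) - min(scores)}")
print(f"AD = {average_deviation(scores):.2f}, "
      f"variance = {variance:.2f}, SD = {sd:.2f}")
for x in (96, 72, 42):
    z = z_score(x, m, sd)
    print(f"X = {x}: z = {z:.2f}, T = {t_score(z):.1f}")
```

Because the code carries full precision, the variance comes out as 198.91 rather than the 198.90 obtained by hand from squared deviations that were each rounded to two decimal places. The standard deviation still rounds to 14.10.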
Computation of several standard scores follows:

z scores (Note: The standard deviation of this set of scores is 14.10):
____________________
X     f    X - mean    z = (X - mean) / 14.10
____________________
96    1    23.88       1.69
94    1    21.88       1.55
92    1    19.88       1.41
87    1    14.88       1.06
85    1    12.88       0.91
84    1    11.88       0.84
83    1    10.88       0.77
82    1    9.88        0.70
79    1    6.88        0.49
78    1    5.88        0.42
76    1    3.88        0.28
73    1    0.88        0.06
72    1    -0.12       -0.008
69    1    -3.12       -0.22
69    1    -3.12       -0.22
67    1    -5.12       -0.36
67    1    -5.12       -0.36
66    1    -6.12       -0.43
63    1    -9.12       -0.65
62    1    -10.12      -0.72
61    1    -11.12      -0.79
61    1    -11.12      -0.79
51    1    -21.12      -1.50
44    1    -28.12      -1.99
42    1    -30.12      -2.14
____________________

T scores = 10z + 50
____________________
Score   f (frequency)   T (unrounded)
____________________
96      1               67 (66.9)
94      1               66 (65.5)
92      1               64
87      1               61 (60.6)
85      1               59 (59.1)
84      1               58 (58.4)
83      1               58 (57.7)
82      1               57
79      1               55 (54.9)
78      1               54 (54.2)
76      1               53 (52.8)
73      1               51 (50.6)
72      1               50 (49.9)
69      2               48 (47.8)
67      2               46 (46.4)
66      1               46 (45.7)
63      1               44 (43.5)
62      1               43 (42.8)
61      2               42 (42.1)
51      1               35
44      1               30 (30.1)
42      1               29 (28.6)
____________________

Note on assigning grades to the student “instructor” (for this exercise): It may be helpful to refer to the normal curve in your grading if you want to "grade on the curve". For example, you may decide that you want to assign the grade of A to the top 15% of scores on the test (at or above the 85th percentile, which is roughly one standard deviation or more above the mean), a B to scores that fall between the 60th and 84th percentiles, a C to scores between the 20th and 59th percentiles, a D to scores between the 6th and 19th percentiles, and an F to those scores that fall at or below the 5th percentile. If it is assumed that the scores on the test are normally distributed, there is a precise mathematical relationship between z scores, T scores, other standard scores, and percentiles. If the scores are normally distributed, linear z and T scores can be interpreted as they relate to percentiles. You can convert raw scores (number correct) to z scores and then use a chart to convert the z scores to percentiles, which show where the examinee scored in relation to the normal curve. You can then use that percentile to assign grades.
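To illustrate the conversion just described, the sketch below (Python, standard library only) turns a z score into a percentile under the assumption that scores are normally distributed, then maps the percentile to a letter grade using the cutoffs in the example. The function names are hypothetical, and treating each cutoff as a lower bound on a continuous percentile (85, 60, 20, and 6) is an assumption made here for illustration.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, SD 1); stands in for the printed chart.
STANDARD_NORMAL = NormalDist()


def percentile_from_z(z):
    """Percentage of a normal distribution falling below the given z score."""
    return 100 * STANDARD_NORMAL.cdf(z)


def grade_from_percentile(p):
    """Map a percentile to a grade using the example cutoffs.

    Assumption: each cutoff is read as a lower bound, so anything
    below the 6th percentile receives an F.
    """
    if p >= 85:
        return "A"
    if p >= 60:
        return "B"
    if p >= 20:
        return "C"
    if p >= 6:
        return "D"
    return "F"


# z scores taken from the table above for raw scores 96, 79, 72, 51, and 42.
for raw, z in [(96, 1.69), (79, 0.49), (72, -0.008), (51, -1.50), (42, -2.14)]:
    p = percentile_from_z(z)
    print(f"raw = {raw}, z = {z:+.2f}, percentile = {p:.1f}, "
          f"grade = {grade_from_percentile(p)}")
```

The percentile step is only as good as the normality assumption behind it, which is why the sketch keeps it separate from the grading rule.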
Although normative samples for most standardized tests are very large and their distributions of scores approach normality, scores on most classroom tests given to groups of 25 students are not normally distributed, and it would not be appropriate to complete the transformations above for typical classroom tests. Of course, what is done for the purposes of learning about statistics is entirely another matter.

Answer: To effectively communicate the data from Table 3-1 to the class, a grouped frequency distribution with a histogram would aid understanding. The mean and standard deviation could offer insights into performance variation, while reference to a normal curve or standard scores might facilitate comparisons among students, depending on the distribution's characteristics and the objectives of the presentation.

4. Ask class members about the types of distributions of data they would expect for the following:
a) IQ test scores from all pupils enrolled in Illinois schools
b) IQ test scores from all pupils enrolled in Bayonne, New Jersey, schools
c) test scores from a very difficult test
d) a series of 1,000 rolls of a die
e) college professors' salaries in a university department in which all of the professors have tenure and have been teaching for more than 20 years
f) test scores from a very easy test
g) test scores from a very difficult test

Answer: For IQ test scores from all pupils enrolled in Illinois schools, I would expect a roughly normal distribution with a mean around 100, reflecting the typical distribution of intelligence in the population. In Bayonne, New Jersey, schools, the distribution may still be roughly normal but could potentially show slight variations due to local demographics or educational practices. Test scores from a very difficult test might show a positively skewed distribution, with a large number of low scores and a smaller number of high scores.
A series of 1,000 rolls of a die would likely approximate a uniform distribution, with each outcome having an equal probability. College professors' salaries in a department of tenured professors who have been teaching for over 20 years might exhibit a negatively skewed distribution, with most salaries clustered around the higher end due to experience and tenure benefits. Similarly, test scores from a very easy test may show a negatively skewed distribution, with a large number of high scores and fewer low scores. Finally, test scores from a very difficult test might exhibit a positively skewed distribution, with scores piled up at the low end and few high scores due to the challenging nature of the test.

5. Ask each student to pick a number from one to ten, and then to enter a zero if it was an even number and a one if it was an odd number. This data set can then be used throughout the discussion of statistics to illustrate concepts and provide students with data to use for practice exercises. See Jacobs (1980) for more detail on exercises like this one.

Answer: Instructing students to pick a number from one to ten and then enter a zero for even numbers and a one for odd numbers provides a simple data set for illustrating statistical concepts. This exercise introduces students to the basics of data collection and categorization, facilitating discussions of frequency distributions, central tendency, and variability. By collecting data hands-on, students gain practical experience in organizing and analyzing data, laying a foundation for more complex statistical techniques. Moreover, using a familiar task like picking numbers makes statistics more relatable and accessible to students. Encouraging students to analyze the data set can promote critical thinking and problem-solving skills as they interpret patterns and draw conclusions from the data. Overall, this exercise serves as a valuable starting point for exploring statistical concepts in a meaningful and engaging way.
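As a rough illustration of how the data set from item 5 might be summarized, the sketch below codes each pick as 0 (even) or 1 (odd) and computes a frequency distribution, the mean, and the variance. It is written in Python with the standard library only. The list of picks is made up for illustration, and the name `code_parity` is a hypothetical helper, not something from Jacobs (1980).

```python
from collections import Counter
from statistics import mean, pvariance


def code_parity(pick):
    """Return 0 for an even pick and 1 for an odd pick, as in the exercise."""
    if not 1 <= pick <= 10:
        raise ValueError("pick must be a whole number from 1 to 10")
    return pick % 2


# Hypothetical picks from a class of ten students (illustration only).
picks = [7, 3, 8, 5, 7, 2, 9, 4, 7, 1]
coded = [code_parity(p) for p in picks]

frequencies = Counter(coded)   # frequency distribution of the 0/1 codes
p_odd = mean(coded)            # mean of 0/1 data = proportion of odd picks
variance = pvariance(coded)    # for 0/1 data this equals p * (1 - p)

print(f"even (0): {frequencies[0]}, odd (1): {frequencies[1]}")
print(f"mean = {p_odd:.2f}, variance = {variance:.2f}")
```

With data coded this way, the mean is simply the proportion of students who picked an odd number, which gives a concrete starting point for discussing central tendency and variability.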
IN-CLASS DEMONSTRATIONS

1. Bring something to class
Bring to class:
(a) graphic data published in newspapers and magazines
Supplement the Everyday Psychometrics feature on graphing data by bringing in (or assigning students to bring to class) graphs published in newspapers or magazines. Lead a class discussion regarding each of the graphs: do they aid or hinder interpretation of the data they illustrate?
(b) the Attitudes Toward Statistics Scale for administration to the class (Wise, 1985)
Using the test score data obtained from an administration of this test, calculate the mean, median, mode, range, interquartile range, semi-interquartile range, variance, and standard deviation. Which measure of central tendency best represents the distribution of test scores? Discuss the results with reference to the statistics and statistical concepts presented in Chapter 3.
(c) the Student Information Questionnaire (Thompson, 1994) for class administration
After the test administration, use the data obtained to illustrate statistical concepts presented in Chapter 3.
(d) M&Ms (for strictly academic reasons, of course)
This will amount to a “sweet way to teach students about the sampling distribution of the mean” (see Dyck & Gee, 1998, for more details).
(e) gambling paraphernalia for “Las Vegas Night” (well, sort of)
Bring dice and decks of playing cards to class for a “Las Vegas Night” in the measurement classroom (again, for strictly academic reasons, of course). Divide students into teams. Have students generate data using the dice and the cards as tools. For example, one team of students could be assigned the task of throwing the dice 10 times and recording how many times 7 or 11 is thrown. Another team of students could be assigned the task of shuffling the deck and then dealing each member of the team two cards. This is repeated for 10 deals. The two cards dealt will be recorded, and it will be noted whether they are a pair.
All student teams then summarize the data they have generated using some of the tools discussed in Chapter 3 (such as a frequency distribution, grouped frequency distribution, histogram, and frequency polygon of the results). They may even want to go a little further and delve into the topic of probability: What is the probability of throwing 7 or 11? What is the probability of being dealt a pair?

2. Bring someone to class
Invite a guest speaker to class. The guest speaker could be:
(a) a faculty member
Invite a faculty member (from your university or a neighbouring one) in the Department of Mathematics (or a related department). Ask the speaker to elaborate on the statistics and statistical concepts (including correlation) discussed in Chapter 3.
(b) a local user of psychological tests
Invite a local user of psychological tests from any setting who can discuss how the statistics or statistical concepts discussed in Chapter 3 are used in “real life.”

IN-CLASS ROLE-PLAY AND DEBATE EXERCISES

1. Role Play: Labor versus Management Dispute Mediation
Identify a labor versus management dispute that is currently in progress, recently settled, or brewing. One-third of the class will role-play Labor in the dispute, one-third of the class will role-play Management, and the remaining one-third of the class will role-play the Mediator. All students will thoroughly research the issues (and, in particular, the relevant statistics) involved in the dispute. A debate between Labor and Management is held in class, focusing on the statistics involved. Students may elect to design their own graphic representations of the statistics. Sufficient time is allotted for both the debate and the “judgment” of the Mediators.

2. Debate: Meta-Analysis in Applied Clinical Work
Students prepare in advance for a debate on the pros and cons of using data from meta-analyses in applied clinical work by reviewing the research literature.
One group, the “Statisticians,” argue the Pro position while another group, the “Clinicians,” argue the Con position. OUT-OF-CLASS LEARNING EXPERIENCES 1. Take a field trip. Arrange a trip as a class to: (a) a corporate Human Resources department Arrange a visit to the human resources department of a local business or large corporation that employs statistics and/or statistical concepts such as those discussed in Chapter 3, and ask a representative to elaborate on these topics with “real life” applications. (b) an accounting office of a governmental agency Arrange a visit to a local governmental agency, such as a local or state police facility, that employs statistics and/or statistical concepts such as those discussed in Chapter 3, and ask a representative to elaborate on these topics with “real life” applications. (c) a local consumer research firm Arrange a visit to a local consumer research firm that employs statistics and/or statistical concepts such as those discussed in Chapter 3, and ask a representative to elaborate on these topics with “real life” applications. SUGGESTED ASSIGNMENTS 1. Critical Thinking Exercise: “Nominal data is data.” Some psychologists do not consider nominal scales a form of measurement. Argue the case that they are wrong. 2. Generative Thinking Exercises (a) Generative thinking with regard to psychological traits. The Close-up in this chapter discussed a number of traits that are normally distributed. For class discussion, have students generate a list of traits that one would expect would not be normally distributed. (b) NOIR in everyday life. List examples of nominal, ordinal, interval, and ratio from everyday experience. Some possible realms of everyday experience to draw from: rankings from a gymnastic meet, height, weight, blood type, Major League Baseball standings, page numbers for this book, your test grades in this course, your college grade point average (GPA), IQ scores, and personality test scores. 3. 
Read-then-Discuss Exercises
(a) “Real-Life” Statistics
Assign the class to read Shatz (1985), which is a two-page article on a labor dispute at the Greyhound Bus Company. After everyone has read it, discuss the use of the statistics presented in the “real-life” context of a labor dispute, and then generalize to other “real-life” contexts.
(b) Current Events Revisited with an Eye toward Statistics
Have each student review the evening newspaper or weekly news magazines, looking for news articles or advertisements that include references to the statistics discussed in this chapter. Students will bring in the articles and discuss them during the next class session, focusing on the type of statistics utilized and whether the most appropriate statistics were employed.

4. Research-then-Report Exercises
(a) Write a report on Florence Nightingale (that's right, Florence Nightingale).
The report could be titled “Measurement, Statistics, and Meaningful Change: The Case of Florence Nightingale.” From this report, a reader should learn of Nurse Florence Nightingale’s careful battlefield measurement and compilation of statistics. Her diligence in this matter would prove to the British government that its wounded soldiers weren’t dying from their wounds; they were dying from being hospitalized in a British hospital. Urge the student or students assigned this report to provide details about Nightingale’s measurements, the statistics she presented to the government, and the meaningful, life-changing modifications in government policy that were a direct result of her efforts.
(b) Build a normal curve.
Assign students the task of building their own three-dimensional normal curve. For more details on this assignment, consult Addison and Hillman (1997). These authors describe the use of a three-dimensional model of the normal curve that is approximately 20 cm high and 32 cm long. Their model was constructed with a total of six pieces, two each of three different sections. "The pieces correspond in relative size and shape to approximate areas delineated by standard deviation units in an empirical normal distribution (i.e., 34% from 0 to 1 standard deviation, 14% from 1 to 2 standard deviations, and 2% from 2 to 3 standard deviations). The pieces comprising each pair are identical in size, shape, and color. Thus, there are two blue pieces that each represent 34% of the area under the curve, two red pieces that each represent 14% of the area under the curve, and two green pieces that each represent 2% of the curve" (p. 1). Once the model is built, students can manipulate its pieces to help visualize the concept of symmetry as it relates to the normal distribution.

5. Other Exercises and Assignments
(a) Correlation: “Buddy, Can You Spare A Scatterplot?”
As an exercise in correlation, select a few volunteer students to play the role of a research team exploring this question: “Is number of coins of pocket change correlated with number of rings on hand?” These students will research this question by going to each of the other students in class (their research subjects) and compiling the following data for each subject: (1) number of coins (loose change) in pocket or purse, and (2) number of rings on hands. The “final report” of the research team should include, at a minimum, a scatterplot of the data, a calculation of a Pearson r, and a discussion of the significance of the r derived, as well as the significance, or lack of it, of the study in general.
(b) IQ, GPA, and the Ace of Spades: Another Correlation Exercise
See Huck et al.’s (1992) exercise designed to help students understand how increases in standard deviation affect the value of a Pearson r. The exercise uses standard playing cards to generate hypothetical scores on two common variables, IQ and GPA. Students calculate and compare standard deviations and correlation coefficients. It’s something that they can do at home and report on.
Alternatively, it could be an in-class demonstration.

MEDIA RESOURCES

On the Web
A noncomprehensive sampling of some of the material available on the World Wide Web.

1. WISE online tutorials
Claremont Graduate University (CGU) maintains a “Web Interface for Statistics Education” (WISE) with statistics-related tutorials that are available free online (Aberson et al., 2000). All one needs to access this wealth of useful material is a browser that is enabled with Java and JavaScript. Two WISE tutorials that are particularly appropriate for use with Chapter 3 are (a) the Sampling Distribution of the Mean tutorial and (b) the t-test tutorial. Access both WISE tutorials at CGU’s Web site: http://wise.cgu.edu

2. Other online tutorials
For an online tutorial on measures of central tendency, click on: http://simon.cs.vt.edu/SoSci/converted/MMM/choosingct.html
For an online tutorial on the normal curve, click on: http://davidmlane.com/hyperstat/normal_distribution.html

3. A wealth of “real life” statistics from the State of Illinois to analyze, graph, and otherwise manipulate can be found at: http://www.fedstats.gov/qf/states/17000.html

4. www.texasoft.com/winkpear.html
A “teaser tutorial” on correlation; of limited value unless you buy their software.

On DVD, VHS, CD, and Other Media

Displaying Data (2010, DVD, 27 minutes, INS)
Useful in teaching students how to construct and interpret various graphs.

Introduction to Statistics (2010, DVD, 27 minutes, INS)
A general introduction to the value and use of statistics.

Statistics: Decisions Through Data (2005, 5-video set on DVD or VHS, 1 hour per video, COMAP)
Provides a statistics refresher with applied examples.

Against All Odds: Inside Statistics
Program 3, Describing Distribution and Program 4, Normal Distribution (1989, VHS, 58 minutes, COU)
Program 3 includes coverage of mean, median, quartiles, boxplots, interquartile range, and standard deviation.
Program 4 uses applied examples to demonstrate various distributions including the normal curve.

Describing Distributions: Normal Distributions Program #4 (1988, VHS, 30 minutes, UMN)
Presents ways of describing distributions with emphasis on the normal curve.

Probability and Statistics: Variance and Standard Deviation (1992, VHS, 30 minutes, PSU)
Covers the estimation and computation of variance and standard deviation of populations, samples, and grouped data.

REFERENCES
Aberson, C. L., Berger, D. E., Healy, M. R., Kyle, D. J., & Romero, V. L. (2000). Evaluation of an interactive tutorial for teaching the central limit theorem. Teaching of Psychology, 27, 289–291.
Addison, W. E., & Hillman, K. R. (1997, August). A three-dimensional model of the normal curve. Paper presented at the annual convention of the American Psychological Association, Chicago, IL.
American Educational Research Association. (1999). Standards for educational and psychological tests. Washington, DC: Author.
Dyck, J. L., & Gee, N. R. (1998). A sweet way to teach students about the sampling distribution of the mean. Teaching of Psychology, 25, 192–195.
Huck, S. W., Wright, S. P., & Park, S. (1992). Pearson's r and spread: A classroom demonstration. Teaching of Psychology, 19, 45–47.
Jacobs, K. W. (1980). Instructional techniques in the introductory statistics course: The first class meeting. Teaching of Psychology, 7, 241–242.
Shatz, M. A. (1985). The Greyhound strike: Using a labor dispute to teach descriptive statistics. Teaching of Psychology, 12, 85–86.
Thompson, W. B. (1994). Making data analysis realistic: Incorporating research into statistics courses. Teaching of Psychology, 21, 41–43. Copies of the Student Information Questionnaire are available from: W. Burt Thompson, Department of Psychology, P.O. Box 2208, Niagara University, Niagara, NY 14109-2208.
Wise, S. L. (1985). The development and validation of a scale measuring attitudes toward statistics.
Educational and Psychological Measurement, 45, 401–405.

Chapter 4
Of Tests and Testing

SOME ASSUMPTIONS ABOUT PSYCHOLOGICAL TESTING AND ASSESSMENT
  Assumption 1: Psychological Traits and States Exist
  Assumption 2: Psychological Traits and States Can Be Quantified and Measured
  Assumption 3: Test-Related Behavior Predicts Non-Test-Related Behavior
  Assumption 4: Tests and Other Measurement Techniques Have Strengths and Weaknesses
  Assumption 5: Various Sources of Error Are Part of the Assessment Process
  Assumption 6: Testing and Assessment Can Be Conducted in a Fair and Unbiased Manner
  Assumption 7: Testing and Assessment Benefit Society
WHAT’S A “GOOD TEST”?
  Reliability
  Validity
  Other Considerations
NORMS
  Sampling to Develop Norms
    Sampling
    Developing norms for a standardized test
  Types of Norms
    Percentiles
    Age norms
    Grade norms
    National norms
    National anchor norms
    Subgroup norms
    Local norms
  Fixed Reference Group Scoring Systems
  Norm-Referenced versus Criterion-Referenced Evaluation
CULTURE AND INFERENCE
Everyday Psychometrics: Putting Tests to the Test
Close-up: How “Standard” is Standard in Measurement?
Meet an Assessment Professional: Meet Dr. Howard Atlas and Dr. Steve Julius
Self-Assessment

TERM TO LEARN
culturally informed psychological assessment
An approach to evaluation that is keenly perceptive about and responsive to issues of acculturation, values, identity, worldview, language, and other culture-related variables as they may affect the evaluation process or the interpretation of resulting data.

Some relevant reference citations:
Cervantes, R. C., Fisher, D. G., Córdova Jr., D., & Napper, L. E. (2012). The Hispanic Stress Inventory-Adolescent Version: A culturally informed psychosocial assessment. Psychological Assessment, 24(1), 187-196.
Jones, L. V., Hopson, L. M., & Gomes, A.-M. (2012). Intervening with African-Americans: Culturally specific practice considerations.
Journal of Ethnic & Cultural Diversity in Social Work: Innovation in Theory, Research & Practice, 21(1), 37-54.

Schoulte, J. C. (2011). Bereavement among African Americans and Latino/a Americans. Journal of Mental Health Counseling, 33(1), 11-20.

For class consideration: Why is a consideration of culture so key to psychological assessment? And if it is so key, why has this factor historically been de-emphasized? What are the limits of being a “culturally informed” psychological assessor? What else might you add to (or take away from) the list of culturally informed “do’s” and “don’ts” that is presented at the end of Chapter 4?

CLASS DISCUSSION QUESTIONS

Here is a list of questions that may be used to stimulate class discussion, as well as critical and generative thinking, with regard to some of the material presented in this chapter of the text.

1. Stimulate discussion on each of the seven assumptions regarding testing and assessment at the beginning of the chapter. What examples can students come up with that illustrate each of these assumptions? Can they come up with any illustrations that run counter to these assumptions?

Answer: Students can explore each of the seven assumptions regarding testing and assessment by providing examples that illustrate it. For instance, they might discuss the assumption that psychological traits and states exist (Assumption 1) by citing personality assessments based on the Big Five traits, which purportedly capture enduring characteristics. Alternatively, they could challenge this assumption by highlighting instances where traits fluctuate over time, such as changes in personality due to life experiences or interventions like therapy. Similarly, they could examine the assumption that psychological traits and states can be quantified and measured (Assumption 2) by discussing standardized tests like the SAT or GRE, which aim to quantify cognitive abilities.
Conversely, they might question this assumption by pointing out the limitations of test design in capturing the complexity and variability of human traits. Overall, exploring examples that both support and challenge these assumptions can foster critical thinking about the nature and implications of testing and assessment practices.

2. Students have all had ample experience in the role of test-taker. However, before taking this course in measurement, they probably have not given much thought to the question of whether the tests they were taking were norm- or criterion-referenced. Stimulate a discussion in this area by compiling a list on the board of the various kinds of tests students have taken. Then, discuss whether each one was norm- or criterion-referenced. Here are some follow-up discussion questions:
–How were the results of each test used and how appropriate was the use?
–In the case of criterion-referenced tests, who set the criterion and how?
–In the case of norm-referenced tests, what would be the consequences of using a criterion-referenced approach?
–In the case of criterion-referenced tests, what would be the consequences of using a norm-referenced approach?
–Can tests be both norm-referenced and criterion-referenced at the same time?

Answer: The exercise prompts students to reflect on their past experiences with tests, categorizing them as norm- or criterion-referenced and discussing the implications of each. They explore how the results of each test were utilized and evaluate the appropriateness of their use in different contexts. For criterion-referenced tests, discussions center on who sets the criteria and how, while for norm-referenced tests, consequences of adopting a criterion-referenced approach are considered. Similarly, consequences of using a norm-referenced approach for criterion-referenced tests are analyzed.
The discussion may highlight the potential overlap between norm- and criterion-referenced testing and whether tests can exhibit characteristics of both simultaneously, prompting a deeper understanding of assessment methodologies and their applications.

3. Either as an independent question or as a follow-up to the discussion topic above, stimulate class discussion on the subject of norm- versus criterion-referenced scoring by asking students to select which type of measurement they believe would be most appropriate for the following:
(a) admission to the university honors program
(b) state licensing/registration as a medical doctor, barber, psychologist, massage therapist, and/or other professions and vocations currently subject to licensing (or not subject to licensing) in your state
(c) minimal competency exam for graduation from high school
(d) program planning for remedial instruction in reading
(e) grade in your psychological measurement course

Answer: When discussing norm- versus criterion-referenced scoring for admission to the university honors program, students might debate the relevance of each approach. Norm-referenced scoring could be seen as beneficial for identifying the top performers relative to their peers, ensuring a competitive selection process. Conversely, criterion-referenced scoring might prioritize specific academic or extracurricular achievements, aligning more closely with the program's values and objectives. Students may consider factors such as diversity, equity, and the desired characteristics of successful candidates in determining which scoring method best serves the goals of the honors program.

4. Most students have had experience as a research subject. Stimulate discussion on the topic of convenience samples by asking students to recount their experiences. How appropriate was it for the researcher to use a convenience sample in the research that they participated in?
The latter question may naturally lead into a discussion of other types of samples and sampling in general.

Answer: Encouraging students to reflect on their experiences as research subjects can spark a nuanced discussion on convenience sampling. They can share whether they felt adequately represented or whether they perceive any biases in the findings due to the sampling method. This can segue into a broader conversation about the appropriateness of convenience sampling in different research contexts, considering factors like feasibility, resources, and research goals. Comparisons with other sampling methods, such as random sampling or stratified sampling, can highlight the strengths and limitations of each approach. Through this dialogue, students can deepen their understanding of sampling techniques and critically evaluate research design and methodology.

5. Introduce the concepts of raw score and cut-score (cut-off score or cut-off) with reference to data from a quiz or test recently administered to the class. Explain how you determined the cut-score for different letter grades and introduce other methods that could have been used. Stimulate class discussion on this topic, including class discussion on (a) ways to improve the cut-score determination process, (b) what the meaning of a raw score of zero on the quiz might be, and (c) whether class quizzes should be norm- or criterion-referenced.

Answer: In our recent class quiz, we utilized the concepts of raw score and cut-score to evaluate student performance. The raw score represents the number of correct answers obtained by each student, while the cut-score, or cut-off score, denotes the minimum score required to achieve a particular grade. To determine the cut-score for different letter grades, we analyzed the distribution of raw scores and set thresholds based on predefined criteria. For example, we might have established a cut-score of 80% for an A grade, 70% for a B grade, and so on.
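Instructors who want to make the contrast between criterion-referenced cut-scores and norm-referenced standing concrete could project a short script along the following lines. It is only a sketch under stated assumptions: the function names, the 60% cut for a C, and the ten raw scores are all made up for illustration, and the 80% and 70% cuts simply echo the example above.

```python
from bisect import bisect_right

# Hypothetical cut-scores in percent correct, listed from highest to lowest.
# 80 and 70 echo the example in the answer; 60 is an added assumption.
CUTS = [(80, "A"), (70, "B"), (60, "C")]


def criterion_grade(raw_score, total_items, cuts=CUTS, floor_grade="F"):
    """Criterion-referenced: the grade depends only on the fixed cut-scores."""
    percent = 100 * raw_score / total_items
    for cut, grade in cuts:
        if percent >= cut:
            return grade
    return floor_grade


def percentile_rank(raw_score, all_scores):
    """Norm-referenced: percentage of the class scoring at or below raw_score."""
    ordered = sorted(all_scores)
    return 100 * bisect_right(ordered, raw_score) / len(ordered)


# Made-up raw scores for a 20-item quiz taken by ten students.
class_scores = [12, 14, 15, 15, 16, 17, 18, 18, 19, 20]

for score in (12, 16, 20):
    print(score, criterion_grade(score, 20), percentile_rank(score, class_scores))
```

With these illustrative numbers, a raw score of 16 earns an A under the fixed cut-scores yet sits only at the middle of the class, which can help students see why the two approaches may lead to different conclusions about the same performance.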
Alternative methods for determining cut-scores include the Angoff method, in which experts predict the performance of minimally competent students, or the Contrasting Groups method, which compares the scores of groups of students already judged to be competent or not competent. Stimulating class discussion on this topic could involve exploring ways to improve the cut-score determination process, such as incorporating student feedback or refining grading criteria. A raw score of zero on the quiz might reflect various factors, such as lack of preparation, misunderstanding of the material, or technical issues during the assessment. The debate over whether class quizzes should be norm- or criterion-referenced is an important one. Norm-referenced grading compares students' performance to that of their peers, potentially fostering competition and emphasizing relative standing. Criterion-referenced grading, on the other hand, evaluates students based on predetermined criteria or standards, focusing on mastery of specific content or skills. Encouraging class discussion on these issues can promote critical thinking and help students understand the implications of different assessment approaches in educational settings.

IN-CLASS DEMONSTRATIONS

1. Bring something to class.
Bring to class

(a) the technical manual from an intelligence test
Bring in the technical manual of a standardized intelligence test or personality test and discuss the process by which the norms were derived. Pass around the manual so that students can see firsthand the various ways that the norms are presented and discussed in the manual.

(b) a sample test report
Bring in an example of a standardized achievement test report (a non-personally-identifiable one) and make a copy for each of your students. Identify the types of standard scores that are used and how they enter into the interpretation of the results. Discuss how the norms are presented as well as alternative ways that they could have been presented.
(c) a sample GRE score report
As an alternative (or in addition) to a standardized achievement test report (as described above), bring in a copy of a set of scores from the GRE. You should be able to obtain one from your department chair’s office or college admissions office. Make certain all identifying information is deleted before analyzing the results in class. You may wish to focus this discussion on the value of fixed reference groups and differences in interpretation using the fixed reference group (reflected in the actual scores) versus the more contemporary norm group (reflected in the percentiles).

(d) a copy of the Standards
Bring in the current edition of the Standards for Educational and Psychological Testing and read aloud selected portions of the material on “Scales, Norms, and Score Comparability.” This material can serve not only to supplement the discussion of these topics in the text, but also to stimulate class discussion regarding their applicability.

(e) sample culture-specific tests
Bring to class examples of “culture-specific tests” such as the Black Intelligence Test of Cultural Homogeneity (Williams, 1975) or the Cultural/Regional Upper-crust Savvy Test (Herlihy, 1977). Then, put these tests “to the test” with reference to this chapter’s Everyday Psychometrics.

2. Bring someone to class.
Invite a guest speaker to class. The guest speaker could be:

(a) a university staff member from the admissions office
Invite a staff member involved with college or university admissions to speak on how scores on standardized tests are used as part of admissions decisions.

(b) a faculty member with expertise on the subject of norms
Invite a faculty member (from your university or a neighboring one) with expertise related to norms.

(c) a local user of psychological tests
Invite a local user of psychological tests from any setting with expertise and experience in using various types of normative data.

IN-CLASS ROLE-PLAY AND DEBATE EXERCISES

1.
Board of Directors Role Play
Explore the subject of norms with a role-play exercise in which the class as a whole assumes the role of a trade school board charged with developing an entrance examination for an auto mechanics school. What type of norm group would be most appropriate for this test? What other considerations must be kept in mind when developing this entrance examination?

2. Debate: (Standard) Drinks for All?
The Close-Up in this chapter presents a discussion designed to spur critical thinking with regard to the use of the term “standard” in measurement. Build on that discussion with a debate wherein students consider the pros and cons of consumer labeling of alcoholic beverages with “standard drink” content. The class is divided into three groups: Group 1 will take the “Pro Standard Drink” position, Group 2 will take the “Con Standard Drink” position, and Group 3 will act as the judging panel. Students in all of the groups will prepare for the debate by reading available material on the subject. For information on the “pro” side, students might consult Stockwell (1994), who argued that standard drink labeling would help alcohol drinkers better monitor their alcohol intake. For information on the “con” side, students might consult Lemmens (1994), who cited research suggesting that consumers fail to understand and act properly upon such labeling.

OUT-OF-CLASS LEARNING EXPERIENCES

1. Take a field trip.
Arrange a trip as a class to:

(a) A corporate Human Resources office
Arrange a visit to the human resources department of a local business or large corporation that employs normative data in the hiring of personnel so that students may see firsthand some “real life” application of norms.

(b) The university library
Visit the university library to conduct a search on a psychology database (such as PsycLIT or ERIC) for examples of meta-analytic research. Print out the abstracts of some of the measurement-related studies for future discussion in class.
Alternatively, the library trip could be used for the purpose of perusing test manuals from major standardized tests. Have students discuss the norming process described in each, and contrast the various ways that norms are presented in each.

SUGGESTED ASSIGNMENTS

1. Critical Thinking Exercise: On Celebrating Diversity
Write an essay on the pros and cons of “celebrating diversity.”

2. Generative Thinking Exercise: “Top 10 Traits Affected by Culture”
Assign students the task of listing the “Top 10 Psychological Traits that are Affected by Culture.” The list should include 10 psychological traits going in order from #10 (least affected by culture) to #1 (most affected by culture). Accompany mention of each trait with a brief explanation as to why that trait was chosen.

3. Research-then-Discuss Exercise
Group Discussion: Culture and One Particular Trait
The class agrees on one particular psychological trait that everyone in the class will research. Then, the class is broken up into a half-dozen or so small groups, and each one of these small groups researches that trait and how it manifests itself in one particular culture. What follows is a whole-class discussion about that trait, with each of the groups contributing information regarding the trait vis-a-vis the culture they researched. Special attention should be paid to the instruments used to measure the trait and how the measurement of the trait may differ across cultures.

4. Research-then-Report Exercises

(a) Comparing Norms for Two Tests
Students are assigned the task of contrasting the approach to norming as described in two nationally standardized tests. What were the differences in the demographics employed? How did the tests differ in terms of how the norms were presented in the test manuals? What other differences exist between the two tests?
The two tests assigned could be similar (e.g., both intelligence tests) or different (e.g., one intelligence test, one personality test) in terms of the variable measured.

(b) “Standardized Tests” in the Popular Media
Supplement the Close-up feature on the subject of standardization in measurement by having students write a report on how terms like “standardized tests” and “standardized testing” have been used (and misused) in the popular media. The report should contain ample examples of articles published in the popular press.

(c) The Statistical Basis of Claims Made About Popular Products
Have students look through current magazines and newspapers to identify two products that have statistics-based claims made about them by their manufacturer. The students write or phone the manufacturer to obtain more information about the sample used and statistical techniques employed. Students then write a report of their findings, with stress on applying the concepts presented in this chapter (Chapter 4) and the previous chapter (Chapter 3) of the textbook. Instructors may wish to consult Beins (1985) for more information about the utility of this type of exercise.

MEDIA RESOURCES

On the Web

http://www.youtube.com/watch?v=WkJlst6vDyY
An organization that calls itself “Fair Test” aggressively opposes any sort of norm-referenced testing and almost all standardized tests. In 2008, they produced this 30-second video. The video is useful as a stimulus for critical thinking regarding the arguments presented, as well as a class debate on the value of standardized tests and norm-referenced tests.

http://www.stat.sc.edu/~west/javahtml/Regression.html
It’s better than a video game! Visit this interactive site and see how the regression line changes as new points representing coordinates (any you choose) are added.
http://www.math.csusb.edu/faculty/stanton/m262/regress/index.html
Another interactive regression line site.

http://chiron.valdosta.edu/whuitt/col/measeval/crnmref.html
A summary of the distinction between norm-referenced and criterion-referenced testing can be found in convenient chart form at this site.

http://www.sportsci.org/resource/stats/precision.html
Introductory material on the psychometric concepts of reliability, validity, and more.

http://www.apa.org/science/faq-findtests.html
APA on how to find information about psychological tests (“good” tests and otherwise).

On DVD, VHS, CD, and Other Media

Normal Distributions (2010, DVD, 27 minutes, INS)
Illustrates the value of a normal distribution in the context of gathering and interpreting data.

Producing Data: Sampling (2010, DVD, 27 minutes, INS)
Supplements the discussion of norms in this chapter with a discussion of how a sample can provide insights into a larger population.

REFERENCES

Beins, B. (1985). Teaching the relevance of statistics through consumer-oriented research. Teaching of Psychology, 12, 168–169.

Lemmens, P. (1994). “Would standard drink labeling result in more accurate self-reports of alcohol consumption?” Reply. Addiction, 89(12), 1704–1706.

Mitchell, M. L. (1996, August). Snapping sharks, maddening mind readers, and interactive images: Teaching correlation. Paper presented at the meeting of the American Psychological Association, Toronto, Canada.

Stockwell, T. (1994). Would standard drink labeling result in more accurate self-reports of alcohol consumption? Addiction, 89(12), 1703–1704.

Solution Manual for Psychological Testing and Assessment Ronald Jay Cohen, Mark E. Swerdlik, Edward D. Sturman 9780077649814, 9781259870507
