Chapter 8: Measurement

LEARNING OBJECTIVES
After studying this chapter, you should be able to:
• Explain why measurement and assessment are important to staffing.
• Describe patterns in data.
• Understand correlation and regression and explain how each is used.
• Define practical and statistical significance and explain why they are important.
• Define reliability and validity and explain how they affect the evaluation of a measure.
• Explain why standardization and objectivity are important in measurement.

TAKEAWAY POINTS
1. Measurement is essential to making good hiring decisions. Improperly assessing and measuring candidates' characteristics can lead to systematically hiring the wrong people, offending and losing good candidates, and even exposing your company to legal action. By contrast, properly assessing and measuring candidates' characteristics can give your organization a competitive advantage.
2. Measures of central tendency, such as the mean, median, and mode, and measures of variability, such as the range, variance, and standard deviation, are useful for describing distributions of scores. This information can be used to compute standard scores, which tell you how any individual performed relative to others and which make it easy to combine scores that have different means and standard deviations (a brief computational sketch follows these takeaway points).
3. Correlation is the strength of the relationship between two variables. Multiple regression is a statistical technique based on correlation analysis. The technique identifies the ideal weights to assign each test so as to maximize the validity of a set of assessment methods; the analysis is based on each assessment method's correlation with job success and the degree to which the assessment methods are intercorrelated. Correlation and regression analyses are used to evaluate how well an assessment method predicts job success and to evaluate the effectiveness of a firm's overall staffing system.
4. Practical significance occurs when the correlation is large enough to be of value in a practical sense. Statistical significance is the degree to which the relationship is not likely due to sampling error. To be useful, a correlation needs to have both practical and statistical significance.
5. Reliability is how dependably or consistently a measure assesses a particular characteristic. Validity is how well a measure assesses a given construct and the degree to which you can make specific conclusions or predictions based on a measure's scores. A measure must be reliable in order to be valid, and to be useful it must be both reliable and valid.
6. Standardization is the consistent administration and use of a measure. Objectivity is the degree to which scoring an assessment measure is free of judgment or bias. Because they produce the most accurate measurements, it is best to use standardized, objective measures whenever possible.
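The following is a minimal Python sketch of the standard-score idea in takeaway point 2: converting raw scores to z-scores so that results from assessments with different means and standard deviations can be compared and combined. The score values are hypothetical and used only for illustration.

from statistics import mean, stdev

interview = [72, 80, 95, 88, 77, 90]      # hypothetical raw interview scores
ability = [101, 115, 98, 122, 104, 110]   # hypothetical raw ability test scores

def z_scores(scores):
    # Standard score: z = (raw score - mean) / standard deviation
    m, s = mean(scores), stdev(scores)
    return [(x - m) / s for x in scores]

# Because both sets of z-scores are on the same scale, they can be averaged
# into a single composite score for each candidate.
composite = [(zi + za) / 2 for zi, za in zip(z_scores(interview), z_scores(ability))]
print([round(c, 2) for c in composite])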
DISCUSSION QUESTIONS
1. What types of measures of job candidates are most likely to be high in terms of their reliability and validity? Does this make them more useful? Why or why not?
Answer: Measures that have an established, standardized, and systematic approach for collecting data are more likely to be high in reliability, and measures that are more objective than subjective are more likely to be reliable. Keep in mind that there are different types of reliability, so a test can have low internal consistency reliability because it measures a variety of different concepts yet still demonstrate high test-retest reliability and strong validity. So the type of reliability matters. Additionally, measures that contain more than one question, evaluator, or indicator of a characteristic are more likely to be reliable and valid. Measures that assess a single concept, adequately cover the content of what is being measured, and have been empirically connected to important outcomes or to other similar measures are more likely to be valid. Examples of reliable and valid tests include structured work samples, published tests of intelligence or the Big Five personality traits, and carefully constructed job knowledge tests. Measures are always more useful if they are reliable and valid. If they are not reliable, you may obtain different results depending on when and how the measure is administered, so you cannot be sure the score is meaningful or useful. Even if a measure is reliable, it is of limited use if it measures the wrong thing or measures something that does not relate to desired outcomes.
2. How would you explain to your supervisor that the correlation between interview scores and new hire quality is low and persuade him or her to consider a new job applicant evaluation method?
Answer: Several approaches are possible. One is to use data on recently hired employees to create a scatter plot of interview scores against some measure of new hire quality. You can show where the cutoff score is and then count the number of people who would not have been selected but who would have been good hires, and the number of people who would have been selected but who would have been bad hires. You can then compare this to another scatter plot representing a more reliable and valid technique with a stronger correlation (see the sketch following this question). Another approach is to count the number of successful new employees within each range of interview scores. If the correlation is weak, the number of successful employees will not change much across the interview score ranges; if the correlation is positive and strong, the number of successful employees will increase as the scores go up. This same approach can be used to demonstrate the increase in success rates for a more valid evaluation method. A third approach is to explain it verbally using specific examples. Normally, the correlation between height and weight is over +.70, and the correlation between years with a firm and compensation level is often over +.70 or +.80. Data can be collected to show the correlation between interview scores and a measure of new hire quality, and it is likely to be less than .25 (especially if it is based on unstructured interviews). Published data in test manuals may show validities over .35 or .40.
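Below is a minimal sketch of the scatter-plot comparison described in the answer above. It uses simulated (hypothetical) data rather than actual company records, and simply contrasts a weak predictor with a stronger one.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
n = 100
hire_quality = rng.normal(50, 10, n)

# A weak predictor (e.g., an unstructured interview): mostly noise.
weak = 0.2 * hire_quality + rng.normal(0, 10, n)
# A stronger predictor (e.g., a structured work sample): more signal.
strong = 0.8 * hire_quality + rng.normal(0, 5, n)

print("Weak predictor r =", round(np.corrcoef(weak, hire_quality)[0, 1], 2))
print("Strong predictor r =", round(np.corrcoef(strong, hire_quality)[0, 1], 2))

fig, axes = plt.subplots(1, 2, sharey=True)
axes[0].scatter(weak, hire_quality)
axes[0].set_title("Weak predictor")
axes[1].scatter(strong, hire_quality)
axes[1].set_title("Strong predictor")
for ax in axes:
    ax.set_xlabel("Predictor score")
axes[0].set_ylabel("New hire quality")
plt.show()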
3. What correlation would you need to see before you were willing to use an expensive assessment test?
Answer: Typically we would like to see validities above .20, and especially above .35. However, providing a single answer to this question is difficult because the answer should depend on the organization's current success base rate, the selection ratio, the importance of the job in question, the actual cost of the assessment test, applicant reactions to the test, the degree of adverse impact (if any), the availability of alternative tests, and whether or not a curvilinear relationship might exist. Assessment tests with correlations as low as .10 or .15 can be strategically and financially useful, even if the assessment is expensive, when the base rate is low enough, the selection ratio is low enough, adverse impact is low, current employees show great variability in performance, the assessment exhibits good face validity, there are few or no alternative measures, and improving selection yields significant financial gain to the organization. In any case, to be strategically useful and legally defensible, the correlation should be statistically significant. Statistical significance depends on the magnitude of the correlation and the sample size used to assess it.
4. When would it be acceptable to use a measure that predicts job success, but that has adverse impact?
Answer: In general it would be preferable not to use a measure with adverse impact. However, if a job analysis demonstrates that the measure taps critical KSAOs, the measure is shown to predict important job-relevant outcomes, the relationship is statistically significant, and there are few or no alternative approaches available, then it is likely to be both strategically useful and legally defensible. It is probably best to use the measure with some type of cutoff or banding approach to reduce the adverse impact as much as possible. Also, if other measures with less adverse impact are available, it might be possible to combine the measure with them in a compensatory or multiple-hurdles process to reduce the overall adverse impact.
5. What do staffing professionals need to know about measurement?
Answer: Staffing professionals need to understand the process of using data to make selection decisions. Creating a high-quality and talented workforce depends on the accurate selection of employees who will best fit the organization's strategy, culture, and position requirements. Accurately selecting employees requires staffing professionals to collect, measure, and interpret data relevant to the firm's strategic staffing efforts. Staffing professionals thus must be able to describe patterns in data using frequency distributions, measures of central tendency, measures of variability, and scatter plots. They must also understand and be able to interpret correlations and regressions and their associated statistical and practical significance. The ability to interpret data depends on the quality of the data itself and the methods used to collect it. Thus, staffing professionals must also understand the different types of reliability and validity that are used to evaluate the quality of measures. They must be sensitive to the issue of face validity and the effects it can have on applicant reactions. Finally, staffing professionals must understand that standardization and objectivity generally improve the measurement process by reducing contamination and deficiency error.

EXERCISES
1. Strategy Exercise: Teddy-bear maker Fuzzy Hugs pursues a high-quality, low-cost strategy and can't afford to hire underperforming manufacturing employees given its lean staffing model. Fuzzy Hugs has identified an assessment system that has high validity and predicts job success well but that is also very expensive and results in fairly high levels of adverse impact. The company is concerned about maintaining a diverse workforce and wants to avoid legal trouble. The assessment tools it identified that had lower adverse impact had substantially lower validity as well, and were almost as expensive.
The company asks your professional advice about whether it should use the new assessment system. What advice do you give?
Answer: Because of the high cost and adverse impact, I would advise the organization to first establish that using the system yields a meaningful gain. There is no point in using the assessment system, even if the validity is high, if the gains do not offset the cost and the potential legal liability. Next, I would go over the reliability and validity information to ensure that the use of the assessment system is strategically aligned with the goals of the company and legally defensible. If the reliability and validity information supports the use of the assessment system, then I would advise the organization to continue to use it and to document how the reliability and validity estimates were made. I would also suggest three methods for reducing the adverse impact of the assessment system. First, I would advise Fuzzy Hugs to develop a sourcing and recruiting plan that would yield high-quality male and female applicants from a variety of ethnic backgrounds; if the applicants are diverse and of high quality, this is likely to reduce the adverse impact of the assessment system. Second, I would advise Fuzzy Hugs to consider using some of the other assessment tools with lower adverse impact, to see whether making them part of either a multiple-hurdles or compensatory system is economically beneficial while reducing the overall adverse impact. Third, I would advise Fuzzy Hugs to either set a low cutoff score and use other approaches (e.g., interviews) for making the final decision, or use a banding approach to reduce adverse impact.
2. Develop Your Skills Exercise: This chapter's Develop Your Skills feature gave you some tips on assessing job candidates. Based on what you read in this chapter, what are three additional tips that you would add to the list? (Additional exercises are available at the end of this chapter's supplement that will enable you to build additional computational and decision-making skills when using data.)
Answer: Here are some possible tips to add:
• Different assessments can predict different outcomes (e.g., job performance, turnover, trainability, safe work days, and absenteeism), and it is unlikely that you will find a single assessment that is the best predictor for all possible indicators of job success.
• Consider the cost of the measures being used and whether there are less expensive assessment alternatives that have equal, or nearly equal, predictive power.
• Collect data on applicant reactions to your assessment tools. Ensure that they appear face valid and fair.
• Determine whether you have redundancy in the assessment tools by conducting a predictive validity study and applying multiple regression techniques.
• If you do not have in-house experts in assessment technologies, hire such experts to oversee and validate your assessment process.
• Calculate and report measures of central tendency and variability, and plot the distributions to ensure you have a normal distribution with appropriate amounts of variability.
• Under point 4, be attentive to the type of reliability and validity being reported when interpreting the information provided.
Three additional tips for assessing job candidates include: (1) use behavioral interviews to evaluate how candidates handled past situations, providing insight into their problem-solving and interpersonal skills;
(2) implement skills assessments to objectively measure candidates' abilities relevant to the job; and (3) conduct reference checks to validate candidates' claims and to gain perspective on their work ethic and performance from previous employers.
3. Opening Vignette Exercise: The opening vignette described how Xerox developed an assessment system to improve the performance and retention of its call center workers. Reread the vignette and answer the following questions:
a. What other assessments do you think could be considered for the job given the company's high-service-quality goal? Why?
b. If you applied to this company and were denied a job due to your personality assessment results, how would you feel? Would you think that these methods were fair?
Answer:
a. Some possible measures include biodata (such as experience working with customers), customer orientation, conscientiousness, extroversion, sociability, intelligence, emotional intelligence, and team orientation. Biodata may help to identify candidates who have experience dealing with customers and who prefer work environments in which they interact with customers. Customer orientation refers to a desire and willingness to work with customers and provide high-quality interactions; most likely a new measure would have to be devised and validated for such a predictor. Conscientiousness should predict dependability, motivation and diligence on the job, and commitment. Extroversion and sociability might predict willingness to interact and communicate with others, and some research has established that extroversion can predict sales performance to some degree. Intelligence may be a useful predictor because it assesses information-processing capability: candidates who can process information more rapidly should be able to learn the job more quickly, read and understand customers' needs more effectively, and acquire job-relevant knowledge more rapidly. Emotional intelligence may be a useful predictor if it can be demonstrated that the measure is reliable and valid; if so, emotional intelligence might predict the ability to regulate one's own emotions and to read and respond to the emotional signals of customers, both of which may be useful in a customer service environment. Finally, if the client wants employees to help each other and work as a unit to provide high-quality service, then candidates with an orientation toward working in teams should work better with other employees. Other possible measures could include technical skills (e.g., counting money, operating registers and computers, etc.), the ability to learn important information about products (i.e., trainability), the ability to handle stressful circumstances, honesty or integrity, and self-monitoring skills.
b. In general, most people would not respond positively to not getting a job due to scores on personality measures. Reactions would depend on what the specific measures are and how they are presented (interactional and procedural fairness). For example, a personality measure that asked whether a person likes going out or dating might be considered more invasive and less face valid than a personality measure that asked whether a person likes to work with customers, especially if there is no explanation of why these types of questions are relevant. Also, perceived fairness might be driven by whether or not a person actually received the job offer (distributive fairness).
ADDITIONAL EXERCISE
Measurement, Reliability, Validity
Thank you to Professor Barbara Rau of the University of Wisconsin-Oshkosh for providing this exercise. Use the following data for this assignment:
1. Calculate the mean of the errors associated with Test 1. Do your calculations by hand and show your work (i.e., show me the formula and what you have plugged in at each step to derive your answer). Then enter the data in SPSS or Excel and create the output that verifies your calculations (attach the output to this assignment).
Answer: Actual (observed) score = True score + Error, therefore Error = Actual score – True score.
Calculated error for each employee:
87 – 83 = +4
95 – 97 = –2
72 – 80 = –8
73 – 75 = –2
84 – 80 = +4
Mean error = (4 – 2 – 8 – 2 + 4)/5 = –.80
2. Calculate the standard deviation of the errors associated with Test 1. Do your calculations by hand and show your work (i.e., show me the formula and what you have plugged in at each step to derive your answer). Then generate the answers using SPSS or Excel and attach the output that verifies your calculations.
Answer: Standard deviation of the errors = square root[Σ(error – mean error)² / (n – 1)]
Standard deviation = square root(100.80/4) = square root(25.20) = 5.02
3. Use SPSS or Excel (see the Chapter Supplement for computing correlation using Excel) to calculate the alternate forms reliability of Test 1 as measured by its association with scores on Test 2. Attach your output.
Answer: rxx = .574
4. Estimate the standard error of measurement associated with Test 1 using the reliability estimated in Question 3 above. Why doesn't the estimate here equal the standard deviation of the errors calculated in Question 2 above?
Answer: SEM = SD × √(1 – rxx) = 8.34 × √(1 – .574) = 5.44, where 8.34 is the standard deviation of the Test 1 scores.
The value in Question 2 is the actual standard deviation of the errors, which could be computed only because the true scores were known. In reality we can never know the true scores, so we must estimate the standard error of measurement from our estimate of the reliability of the test, which is what we did here using the reliability from Question 3. The better (more accurate) our estimate of reliability, the closer the SEM will be to the actual standard deviation of the errors.
5. Assume that a new job candidate scores 87 on Test 1. Using the standard error of measurement estimated in Question 4 above, identify the 95% confidence interval within which her true score is likely to lie.
Answer: There is a 95% chance that the person's true score lies within ±(1.96)(SEM) of her actual test score: 87 ± (1.96)(5.44) = 87 ± 10.66, so 76.34 ≤ True Score ≤ 97.66. (A computational sketch of the calculations in Questions 4 through 7 follows Question 8.)
6. Assume that a new job candidate scores 87 on Test 1. Using the standard error of measurement estimated in Question 4 above, identify the 99% confidence interval within which her true score is likely to lie.
Answer: Using the ±3 SEM rule of thumb (which captures slightly more than 99% of the distribution), the person's true score lies within 87 ± (3)(5.44) = 87 ± 16.32, so 70.68 ≤ True Score ≤ 100 (note: scores cannot exceed 100 on this test).
7. What is the probability that this candidate's true score is greater than 96?
Answer: z = (96 – 87)/SEM = 9/5.44 = 1.65. There is only about a 5% chance that the candidate's true score is greater than 96.
8. Looking at the Test 1 scores, are there any candidates that you can confidently conclude are either more or less qualified than the others (using a 95% confidence interval)?
Answer: You can conclude that Lonny Voss is less qualified than Teri Rainboth.
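The following is a minimal Python sketch of the SEM and confidence-interval arithmetic in Questions 4 through 7, using the values reported above (standard deviation of Test 1 = 8.34, alternate forms reliability = .574, observed score = 87) and assuming measurement errors are normally distributed.

import math
from statistics import NormalDist

sd_test1 = 8.34
reliability = 0.574
observed = 87

sem = sd_test1 * math.sqrt(1 - reliability)            # standard error of measurement, about 5.44
ci95 = (observed - 1.96 * sem, observed + 1.96 * sem)  # 95% confidence interval
ci99 = (observed - 3 * sem, observed + 3 * sem)        # ~99% interval using the +/- 3 SEM rule of thumb

# Probability that the true score exceeds 96.
z = (96 - observed) / sem
p_above_96 = 1 - NormalDist().cdf(z)

print(round(sem, 2))
print([round(x, 2) for x in ci95], [round(x, 2) for x in ci99])
print(round(p_above_96, 3))  # about .05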
9. If the scale for Test 2 is changed so that it is measured on a scale from 0 to 100 like Test 1, what impact would this have on the estimated alternate forms reliability?
Answer: None. Linear scale changes (adding or subtracting a constant, or multiplying or dividing by a positive constant) have no impact on the relationship between two variables. The correlation is computed from standardized scores and is therefore unaffected by the scale.
10. Using SPSS or Excel (see the Chapter Supplement for computing correlation using Excel), what is the criterion-related validity associated with Test 1? Attach your output here.
Answer: rxy = .79

CASE STUDY
You just became the head of staffing for BabyBots, a manufacturer of small robots. You were surprised to learn that the company had never validated the manual dexterity test it uses to assess job candidates for its manufacturing jobs. You decided to do a concurrent validation study and administer the test to thirty manufacturing workers. Their scores are reported in Table 8-4, along with their ages, sex, race, and job performance ratings. You also calculated the correlation between the manual dexterity test and job performance to assess the test's validity. You then examined the relationship between employees' test scores and their performance ratings. The results of this analysis are shown in Tables 8-5 and 8-6. The data file required for this exercise is located at: http://www.pearsonhighered.com/phillips
Questions:
1. What kind of relationship exists between employees' scores on the manual dexterity test and their performance ratings?
Answer: The correlation of .86 between the test and job performance is exceptionally strong. In fact, it is stronger than we would typically find in real staffing contexts, where correlations are more likely to be around .3 or .4. Given the strength of the positive correlation, and the fact that it is significant at p < .05, this is very unlikely to be a chance relationship. In other words, the correlation is unlikely to be due to sampling error and represents a statistically meaningful relationship. It is significant in both the statistical and the practical sense.
2. Suppose a candidate scored 44 on the manual dexterity test. The regression equation predicting job performance using the manual dexterity test is: 32.465 + (1.234 × Manual dexterity test score). What is the candidate's predicted job performance?
Answer: 32.465 + (1.234 × 44) = 86.76. The candidate's predicted job performance is about 86.8. (A computational sketch of this prediction and the adverse impact check follows Question 3.)
3. Assume that only candidates with predicted performance above 85 are to be hired. This translates to a score of at least 43 on the manual dexterity test. Assume only those with scores above 43 were hired (20 of the 30 people in this sample). Would the use of this test have led to evidence of adverse impact based on sex or race? The relevant data on the 20 people exceeding the cutoff are shown in Table 8-6.
Answer: The hiring rate for females is 81.25% (the highest proportion). According to the 4/5ths or 80% rule, the hiring rate for males should be at least .8 × .8125 = .65, or 65%. The hiring rate for males is 50%, which is less than 65%. Thus, there is evidence of adverse impact against males because men were hired at a proportionally lower rate. For race, the highest rate is for whites at 72.73%. According to the 4/5ths rule, the hiring rate for the other groups should be at least .8 × .7273 = .5818, or 58.18%. Hispanics and blacks were hired at rates higher than 58.18%, so there is no evidence of adverse impact based on race.
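Below is a minimal sketch of the calculations in Questions 2 and 3: applying the regression equation to predict performance and applying the 4/5ths (80%) rule to the hiring rates reported in the answer above. The group rates are those given in the answer; no additional data are assumed.

def predicted_performance(dexterity_score):
    # Regression equation from Question 2.
    return 32.465 + 1.234 * dexterity_score

print(round(predicted_performance(44), 2))  # 86.76

def four_fifths_flags(selection_rates):
    # Flag any group whose selection rate falls below 80% of the highest group's rate.
    threshold = 0.8 * max(selection_rates.values())
    return {group: rate < threshold for group, rate in selection_rates.items()}

# Hiring rates by sex from the Question 3 answer.
print(four_fifths_flags({"female": 0.8125, "male": 0.50}))
# {'female': False, 'male': True} -> evidence of adverse impact against males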
4. Given the validity results you found, would you recommend use of this test as a selection device? If so, how would you use it?
Answer: The evidence of adverse impact based on sex creates a problem with using the selection test, but the validity evidence is exceptionally strong. If a thorough job analysis has been performed, the KSAOs measured by the test are clearly required as a business necessity, and there are no equally effective assessment alternatives with less adverse impact, then I would feel comfortable using the test. However, I would probably use it as a screening device and lower the cutoff score, or use banding in my final decision making, to reduce the adverse impact, even though that reduces some of the benefits of using the test.

SEMESTER-LONG ACTIVE LEARNING PROJECT
Finish the assignment for Chapter 7 and begin researching, describing, and critically analyzing the alignment between the position you chose and the firm's existing assessment practices. Devise a series of assessment methods (interviews, assessment centers, work samples, and so forth) for evaluating job candidates. Using what you learned in Chapter 4, identify how your assessment plan will enable the company to be compliant with EEO laws and other legal requirements.

SUPPLEMENT
Attenuation Due to Unreliability
1. Assume you have conducted a concurrent criterion-related validity study to evaluate the usefulness of using a mathematical reasoning test to predict the performance of software engineers. You used 42 current employees for your study by administering the test to your employees and then collecting performance data on those same employees. The mathematical reasoning test uses a variety of questions to assess the ability to learn, understand, and apply mathematical and statistical concepts in a variety of situations. You have a measure of internal consistency reliability for this test. The performance of the 42 employees was measured by having two different managers rate each employee's performance on the basis of quality, timeliness, and productivity. You have a measure of inter-rater reliability based on the correlation between the managerial ratings.
a. What impact might unreliability in your test or performance measures have on the results of your validity study?
b. Assume your observed correlation between test scores and performance is .18. How would you characterize this correlation?
c. Assume the internal consistency reliability of your mathematical reasoning test is .64. Also assume that the correlation between managerial ratings of performance is .49. What is the correlation after correction for attenuation?
d. What are your conclusions based on this analysis?
Answer:
a. To the degree that the test or performance measures have low reliability, the observed correlation could be attenuated, or weakened.
b. It appears weak. The students do not have the formula to compute the following information, but it may be helpful to know that a t-test of this correlation (t = r√(n – 2)/√(1 – r²)) yields a value of t = 1.16, which is below the significance cutoff for both one-tailed and two-tailed tests. Thus, it is not statistically significant.
c. The correction for attenuation is: corrected rxy = observed rxy / (√rxx × √ryy). The square root of .64 is .80 and the square root of .49 is .70, so corrected rxy = .18 / (.80 × .70) = .18 / .56 = .32. (A computational sketch of parts b and c follows part d.)
d. Mathematical reasoning does appear to be moderately related to the performance of software engineers. However, the relatively low reliability of the test and of the performance ratings potentially masked the strength of the true relationship.
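The following is a minimal Python sketch of the two calculations referenced above: the t-test for the significance of a correlation (part b) and the correction for attenuation (part c), using the values given in the exercise.

import math

def t_for_r(r, n):
    # t statistic for testing whether a correlation differs from zero.
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

def correct_for_attenuation(r_xy, r_xx, r_yy):
    # Estimated correlation after removing the effect of unreliability in both measures.
    return r_xy / (math.sqrt(r_xx) * math.sqrt(r_yy))

print(round(t_for_r(0.18, 42), 2))                          # about 1.16
print(round(correct_for_attenuation(0.18, 0.64, 0.49), 2))  # about .32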
Attenuation Due to Range Restriction
2. Assume you believe that conscientiousness is a good predictor of employee performance in a manufacturing firm. Previously, you collected data for a predictive criterion-related validity study to evaluate the usefulness of using interview scores assessing conscientiousness to predict the performance of manufacturing employees who were later hired on the basis of those interview scores. You hired 65 new employees using the interviews, and 6 months later 53 employees remained. You collected job performance data on the 53 remaining employees using measures of quality and parts per hour. You are surprised to see a low observed correlation.
a. What impact might range restriction have on the results of your validity study?
b. Assume your observed correlation between test scores and performance is .14. How would you characterize this correlation?
c. Assume the unrestricted standard deviation of the interview scores is 3.5 on a 20-point scale, and that the standard deviation of interview scores after hiring is 1.5. Given the observed correlation of .14, compute the estimated correlation for the full range of scores. What are your conclusions?
Answer:
a. The interview scores were directly used to make hiring decisions, so it is likely that range restriction exists on the interview measure of conscientiousness. Also, depending on who left the firm during the following 6 months, you might have range restriction on performance as well.
b. It appears weak. The students do not have the formula to compute the following information, but it may be helpful to know that a t-test of this correlation, with an n of 53, yields a value of t = 1.01, which is below the significance cutoff for both one-tailed and two-tailed tests. Thus, it is not statistically significant.
c. The range restriction correction formula is: corrected rxy = [rxy × (SU/SR)] / √[1 – rxy² + rxy² × (SU²/SR²)], where SU is the unrestricted standard deviation and SR is the restricted standard deviation.
rxy = .14, rxy² = .0196, SU = 3.5, SR = 1.5
Numerator: .14 × (3.5/1.5) = .3267
Denominator: √[1 – .0196 + .0196 × (3.5²/1.5²)] = √(.9804 + .0196 × 5.444) = √1.08711 = 1.042646
Corrected rxy = .3267 / 1.042646 = .3133
The low initial observed correlation is due to direct range restriction resulting from the use of the interview scores to hire employees. (A computational sketch of this correction follows.)
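Below is a minimal Python sketch of the range restriction correction worked out in part c, using the values from the exercise (observed r = .14, unrestricted SD = 3.5, restricted SD = 1.5).

import math

def correct_for_range_restriction(r, sd_unrestricted, sd_restricted):
    # Estimate the correlation in the unrestricted applicant population.
    ratio = sd_unrestricted / sd_restricted
    return (r * ratio) / math.sqrt(1 - r ** 2 + (r ** 2) * ratio ** 2)

print(round(correct_for_range_restriction(0.14, 3.5, 1.5), 3))  # about .313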
Supplementary Computations
You can access the data for this section online at the Pearson website (UniversalToys3E.xls) at: http://www.pearsonhighered.com/phillips. You work at a manufacturing plant called Universal Toys and you were just put in charge of a project to evaluate the validity of the current selection system. Your job is to evaluate the usefulness of two conscientiousness measures using the same test taken 2 weeks apart, a cognitive ability test, and two interview scores. The conscientiousness measure is a standard personality measure for work environments with scores ranging from 1 to 10 (ConscT1 = time 1, ConscT2 = time 2, two weeks later). The cognitive ability measure is a standard measure of intelligence, with typical scores ranging from 80 to 130 (CogAbil). The first interview score is from an unstructured interview with the hiring manager and ranges from 1 to 7 (HiringMgr). The second interview score is from a structured interview with the HR manager assessing key capabilities and experience and ranges from 1 to 7 (HR Mgr). All of these assessments have been used for the past 3 years to hire employees, and you have data on 48 current manufacturing employees with their scores on the predictors recorded in your Human Resource Information System (HRIS). Scores for these 48 employees are reported in the spreadsheet, along with their age, sex, race, position code, and managerial job performance ratings. You are also asked to conduct an adverse impact analysis. Race is coded as "0" for white, "1" for African-American, and "2" for Hispanic/Latino. Males are coded as "0", females as "1". Position 1 refers to assembly, 2 to equipment set-up, and 3 to quality control. Performance is measured as a rating from the Team Production Leader and ranges from 10 to 50. Attach printouts or cut and paste into a document and perform the following analyses on the data:
1. Using Excel or counting by hand, plot the distribution of HR manager interview scores for employees numbered 1 through 16, with scores on the X axis and frequency of scores on the Y axis. Describe the pattern.
Answer:
2. By hand, calculate the mean, median, mode, range, variance, and standard deviation for the cognitive ability test for employees numbered 1 through 16. If you wish, you can repeat this for practice on employees numbered 33-48.
Answer:
Employees Numbered 1-16
Emp ID   CogAbil   Deviation   DevSqrd
1        95        0           0
2        85        -10         100
3        85        -10         100
4        95        0           0
5        90        -5          25
6        110       15          225
7        80        -15         225
8        90        -5          25
9        100       5           25
10       105       10          100
11       95        0           0
12       85        -10         100
13       90        -5          25
14       105       10          100
15       100       5           25
16       110       15          225
Sum      1520                  1300 (sum of squared deviations)
n = 16, n – 1 = 15
Mean = 1520/16 = 95
Median = 95
Mode = 95
Range = 110 – 80 = 30
Variance = 1300/15 = 86.67
Standard deviation = √86.67 = 9.31
(A computational sketch of these calculations appears after item 4.)
Employees Numbered 33-48
Emp ID   CogAbil   Deviation   DevSqrd
33       95        -9.0625     82.13
34       100       -4.0625     16.50
35       115       10.9375     119.63
36       100       -4.0625     16.50
37       105       0.9375      0.88
38       95        -9.0625     82.13
39       130       25.9375     672.75
40       90        -14.0625    197.75
41       115       10.9375     119.63
42       90        -14.0625    197.75
43       90        -14.0625    197.75
44       110       5.9375      35.25
45       100       -4.0625     16.50
46       95        -9.0625     82.13
47       110       5.9375      35.25
48       125       20.9375     438.38
Sum      1665                  2310.94 (sum of squared deviations)
n = 16, n – 1 = 15
Mean = 1665/16 = 104.0625
Median = 100
Mode = 95
Range = 130 – 90 = 40
Variance = 2310.94/15 = 154.06
Standard deviation = √154.06 = 12.41
3. Using Excel, repeat the calculations you just did by hand to check your work.
Answer: See the Universal Toys Instructor Excel spreadsheet tab CogAbility1-16. Formulas:
Sum: =SUM(C2:C17)
Mean: =AVERAGE(C2:C17)
Median: =MEDIAN(C2:C17)
Mode: =MODE(C2:C17)
Range: =MAX(C2:C17)-MIN(C2:C17)
Variance: =VAR(C2:C17)
Standard deviation: =STDEV(C2:C17)
4. Using Excel, calculate the mean, median, mode, variance, and standard deviation for all interval- or ratio-level variables, including performance, across all employees (1 through 48).
Answer: See the bottom of the Universal Toys Instructor Excel spreadsheet tab Data.
           ConscT1   ConscT2   CogAbil   HiringMgr   HR Mgr   Age      Performance
Mean       5.06      4.98      99.06     4.81        4.00     42.10    28.75
Median     5         5         97.5      5           4        43       30
Mode       5         5         90        7           4        36       30
Range      9         9         50        6           6        47       40
Variance   5.04      5.00      129.42    4.07        2.98     97.29    101.60
StdDev     2.24      2.24      11.38     2.02        1.73     9.86     10.08
Min        1         1         80        1           1        18       10
Max        10        10        130       7           7        65       50
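The following is a minimal Python sketch that reproduces the hand calculations in item 2 for the cognitive ability scores of employees 1-16, using the data listed in the table above.

from statistics import mean, median, mode, variance, stdev

cog_abil_1_16 = [95, 85, 85, 95, 90, 110, 80, 90, 100, 105, 95, 85, 90, 105, 100, 110]

print("Mean:", mean(cog_abil_1_16))                       # 95
print("Median:", median(cog_abil_1_16))                   # 95
print("Mode:", mode(cog_abil_1_16))                       # 95
print("Range:", max(cog_abil_1_16) - min(cog_abil_1_16))  # 30
print("Variance:", round(variance(cog_abil_1_16), 2))     # 86.67 (uses n - 1 in the denominator)
print("Std dev:", round(stdev(cog_abil_1_16), 2))         # 9.31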
5. By hand, create a scatterplot of cognitive ability and performance for employees numbered 33 through 48, and by hand compute the correlation for these same data points. Confirm your findings using Excel.
Answer: See the Universal Toys Instructor Excel spreadsheet tab Hand Compute.
Emp ID   CogAbil   Performance   XY      DevX     DevY     DevXSqrd   DevYSqrd
33       95        20            1900    -9.06    -12.19   82.13      148.54
34       100       30            3000    -4.06    -2.19    16.50      4.79
35       115       40            4600    10.94    7.81     119.63     61.04
36       100       30            3000    -4.06    -2.19    16.50      4.79
37       105       40            4200    0.94     7.81     0.88       61.04
38       95        30            2850    -9.06    -2.19    82.13      4.79
39       130       45            5850    25.94    12.81    672.75     164.16
40       90        35            3150    -14.06   2.81     197.75     7.91
41       115       35            4025    10.94    2.81     119.63     7.91
42       90        20            1800    -14.06   -12.19   197.75     148.54
43       90        30            2700    -14.06   -2.19    197.75     4.79
44       110       25            2750    5.94     -7.19    35.25      51.66
45       100       30            3000    -4.06    -2.19    16.50      4.79
46       95        25            2375    -9.06    -7.19    82.13      51.66
47       110       30            3300    5.94     -2.19    35.25      4.79
48       125       50            6250    20.94    17.81    438.38     317.29
Sum      1665      515           54750                     2310.94    1048.44
n = 16, n – 1 = 15
Mean X = 104.0625, Mean Y = 32.1875
(Sum X × Sum Y)/n = (1665 × 515)/16 = 53,592.19
Sum of cross-products of deviations = 54,750 – 53,592.19 = 1,157.81
Covariance = 1,157.81/15 = 77.19
Variance X = 2,310.94/15 = 154.06; Variance Y = 1,048.44/15 = 69.90
StdDev X = 12.41; StdDev Y = 8.36
Correlation = 77.19 / (12.41 × 8.36) = .74
(Note: Excel's COVAR function divides by n rather than n – 1 and returns 72.36; multiplying by 16/15 gives the corrected covariance of 77.19.)
6. By hand, compute a simple regression of the relationship between cognitive ability and performance for employees numbered 33 through 48. Use Excel to confirm the slope and intercept. Compute a predicted score for someone with an ability score of 118.
Answer:
b = covariance / variance of X = 77.19 / 154.06 = .501
a = mean Y – (b × mean X) = 32.1875 – (.501 × 104.0625) = –19.95
y′ = –19.95 + .501 × (CogAbil)
y′ = –19.95 + .501 × 118 = 39.17
(A computational sketch of the calculations in items 5 and 6 follows item 8.)
7. Use the correlation function to correlate the two sets of conscientiousness scores. What does this correlation tell you?
Answer: r = .81. This is the test-retest reliability of the conscientiousness measure. It shows that the reliability is fairly high and that scores are fairly stable over time.
Next, use the correlation function to correlate each of the conscientiousness scores with performance. With a sample size of 48 and a p-value of .05, any correlation over .285 (ignoring the sign) is statistically significant. What did you find?
Answer: The first conscientiousness score is correlated .52 and the second conscientiousness score is correlated .49 with performance. Both are significantly related to performance.
8. Now use the Regression function in the Data Analysis option to predict job performance using the two conscientiousness scores. What did you find? Why?
SUMMARY OUTPUT
Regression Statistics
Multiple R: 0.532
R Square: 0.283
Adjusted R Square: 0.252
Standard Error: 8.720
Observations: 48
ANOVA
             df    SS         MS        F       Significance F
Regression   2     1353.146   676.573   8.897   0.001
Residual     45    3421.854   76.041
Total        47    4775.000
             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept    16.142         3.254            4.961    0.000     9.588       22.695
ConscT1      1.663          0.957            1.739    0.089     -0.263      3.590
ConscT2      0.841          0.960            0.876    0.386     -1.094      2.775
Answer: The two conscientiousness measures together significantly predict performance, F(2, 45) = 8.897, p < .01. However, neither the time 1 nor the time 2 measure predicts performance beyond the other. This is because they are highly correlated and both measure essentially the same thing, so they predict nearly identical, overlapping variance in performance, with neither predicting unique variance beyond the other.
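Below is a minimal Python sketch reproducing the hand calculations in items 5 and 6: the correlation between cognitive ability and performance for employees 33-48, and the simple regression slope, intercept, and predicted score for an ability score of 118. The data are those listed in the table above.

import numpy as np

cog_abil = np.array([95, 100, 115, 100, 105, 95, 130, 90, 115, 90, 90, 110, 100, 95, 110, 125])
perf = np.array([20, 30, 40, 30, 40, 30, 45, 35, 35, 20, 30, 25, 30, 25, 30, 50])

r = np.corrcoef(cog_abil, perf)[0, 1]
b = np.cov(cog_abil, perf, ddof=1)[0, 1] / np.var(cog_abil, ddof=1)  # slope
a = perf.mean() - b * cog_abil.mean()                                # intercept

print(round(r, 2))               # about .74
print(round(b, 3), round(a, 2))  # about .501 and -19.95
print(round(a + b * 118, 2))     # predicted performance for CogAbil = 118, about 39.2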
9. Insert a column after the two conscientiousness scores and compute an average conscientiousness score from the two conscientiousness scores. Correlate the average conscientiousness score with job performance. Assume job performance has a reliability of .90. Use the reliability of the conscientiousness measure and the reliability of job performance to correct the observed correlation for attenuation.
Answer: The correlation between average conscientiousness and performance is .529. The square root of .81 is .90 and the square root of .90 is .949. Corrected correlation = .529 / (.90 × .949) = .620.
10. Use the correlation function to correlate the two sets of interview scores. What is the value and what does this correlation tell you? Why do you think you observed this value?
Answer: The correlation between the two sets of interview scores is .02, a very low value. This is an estimate of inter-rater reliability, and it essentially means that the hiring manager and the HR manager were not at all consistent. It is possible that each person was looking for different things in the candidates. Or, it could be because the hiring manager used an unstructured approach, which is known to have low reliability, while the HR manager used a structured approach to interviewing.
11. Use the Regression function in the Data Analysis option to predict job performance using the two interview scores. What did you find and what would you recommend?
SUMMARY OUTPUT
Regression Statistics
Multiple R: 0.573
R Square: 0.328
Adjusted R Square: 0.298
Standard Error: 8.443
Observations: 48
ANOVA
             df    SS         MS        F        Significance F
Regression   2     1567.389   783.695   10.995   0.000
Residual     45    3207.611   71.280
Total        47    4775.000
             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept    17.101         4.238            4.035    0.000     8.566       25.636
Hiring Mgr   -0.346         0.611            -0.567   0.573     -1.576      0.883
HR Mgr       3.329          0.714            4.664    0.000     1.891       4.766
Answer: Only the HR manager's interview scores predicted performance, so I would recommend using only the HR manager's scores. Given these findings, I would also recommend that the hiring manager receive training and tools to implement a structured interview approach for all interviews.
12. Use the correlation function to correlate ability with job performance. Correct this correlation for range restriction, assuming the unrestricted population has a standard deviation of 15 for ability.
Answer: Cognitive ability is correlated .449 with performance. The restricted standard deviation is 11.38 and the unrestricted standard deviation is 15. See the Universal Toys Instructor Excel spreadsheet tab Correction Range Restriction:
Numerator: .449 × (15/11.38) = .5918
Denominator: √[1 – .449² + .449² × (15²/11.38²)] = 1.0718
Corrected correlation = .5918 / 1.0718 = .552
13. Use the Regression function in the Data Analysis option to predict job performance using the average conscientiousness measure you created, cognitive ability, and the interview scores from the hiring and HR managers. Look at the p-value for each predictor. Any value less than .05 is significant. What do you see and what would you conclude? Are there any variables you can drop? Why or why not?
SUMMARY OUTPUT
Regression Statistics
Multiple R: 0.694
R Square: 0.482
Adjusted R Square: 0.433
Standard Error: 7.587
Observations: 48
ANOVA
             df    SS         MS        F       Significance F
Regression   4     2299.789   574.947   9.988   0.000
Residual     43    2475.211   57.563
Total        47    4775.000
             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept    -11.888        9.884            -1.203   0.236     -31.821     8.045
AvgConsc     1.408          0.637            2.210    0.032     0.123       2.693
CogAbil      0.286          0.106            2.704    0.010     0.073       0.500
Hiring Mgr   -0.405         0.583            -0.695   0.491     -1.582      0.771
HR Mgr       1.792          0.796            2.251    0.030     0.186       3.397
Answer: I would drop the hiring manager interview scores because they do not predict performance at all (p = .491) and are essentially uncorrelated with the HR manager interview scores.
14. Write out the multiple regression equation from your results, including all variables. What is the predicted score for someone with scores of 6, 120, 5, and 7 for average conscientiousness, cognitive ability, hiring manager interview score, and HR manager interview score, respectively?
Answer:
Predicted performance = –11.89 + (1.408 × AvgConsc) + (.286 × CogAbil) + (–.405 × Hiring Mgr) + (1.792 × HR Mgr)
Predicted performance = –11.89 + (1.408 × 6) + (.286 × 120) + (–.405 × 5) + (1.792 × 7)
Predicted performance = 41.42
15. What do you conclude overall about the validity of the current system? Write a short paragraph. Rerun the regression analysis, dropping variables if you deem it appropriate to do so. Write out the final multiple regression equation you would recommend using.
Answer: Overall, the measures used to predict employee performance exhibit high validity. The test-retest reliability of the conscientiousness scores is high, and neither administration predicted performance beyond the other. The average of the two conscientiousness measures exhibited a significant relationship with performance. The inter-rater reliability of the two interview measures is low, and of the two, only the structured interview scores from the HR manager exhibited a statistically significant relationship with performance. It is proposed that the hiring manager scores be dropped and that the average of the two conscientiousness measures be used in place of either one alone. The final regression analysis yields the following results:
SUMMARY OUTPUT
Regression Statistics
Multiple R: 0.690
R Square: 0.476
Adjusted R Square: 0.440
Standard Error: 7.542
Observations: 48
ANOVA
             df    SS         MS        F        Significance F
Regression   3     2271.997   757.332   13.313   0.000
Residual     44    2503.003   56.886
Total        47    4775.000
             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%
Intercept    -12.446        9.793            -1.271   0.210     -32.183     7.290
AvgConsc     1.518          0.613            2.475    0.017     0.282       2.755
CogAbil      0.268          0.102            2.629    0.012     0.063       0.474
HR Mgr       1.747          0.789            2.216    0.032     0.158       3.337
All three predictors exhibit significant relationships with performance, and together they significantly predict performance, F(3, 44) = 13.313, p < .01. The final prediction equation to be used is:
Predicted performance = –12.446 + (1.518 × AvgConsc) + (.268 × CogAbil) + (1.747 × HR Mgr)
In the future, if it is cost prohibitive to administer the conscientiousness measure twice, then because of its high test-retest reliability one of the two administrations can be dropped with little effect on predictive accuracy. (A short sketch showing how this equation can be applied to score applicants follows this item.)
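Below is a minimal Python sketch showing how the final prediction equation above could be applied to score applicants. The applicant values here are illustrative placeholders, not the actual records in the Applicant tab of the spreadsheet.

def predicted_performance(avg_consc, cog_abil, hr_mgr):
    # Final prediction equation recommended in item 15.
    return -12.446 + 1.518 * avg_consc + 0.268 * cog_abil + 1.747 * hr_mgr

# Hypothetical applicants for illustration only.
applicants = [
    {"id": "A1", "avg_consc": 6.0, "cog_abil": 120, "hr_mgr": 7},
    {"id": "A2", "avg_consc": 4.5, "cog_abil": 95, "hr_mgr": 3},
]

for a in applicants:
    print(a["id"], round(predicted_performance(a["avg_consc"], a["cog_abil"], a["hr_mgr"]), 2))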
16. Look at the applicant data in the Applicant tab of the Excel spreadsheet. There are 50 applicants. Use Excel and your multiple regression equation to compute predicted scores for all of the applicants. Assume you wanted to hire only the top 50% of these applicants based on their predicted performance. Evaluate the implications of hiring only the top 50% of applicants for adverse impact by sex and race. Write a short paragraph and include computations for adverse impact.
Answer: See the Universal Toys Instructor Excel spreadsheet tab Sorted Applicants. Hiring the top 50% of applicants based on predicted performance from the final equation in the preceding answer yields the following results:
Sex        Code   Hired   Total   Hiring Rate
Males      0      10      23      43.5%
Females    1      15      27      55.6%
Total applicants: 50; 4/5ths threshold: .8 × .556 = 44.4%
Race               Code   Hired   Total   Hiring Rate
White              0      11      22      50.0%
African-American   1      6       18      33.3%
Hispanic/Latino    2      8       10      80.0%
Total applicants: 50; 4/5ths threshold: .8 × .80 = 64.0%
Of the 50 applicants, I would hire 10 males and 15 females. There are 23 males and 27 females among the 50 applicants, which yields hiring rates of 10/23 = 43.5% for males and 15/27 = 55.6% for females. Using the 4/5ths rule, males would have to be hired at a rate of at least 44.4% (.8 × .556). The hiring rate for males is slightly below that rate, indicating the possibility of adverse impact.
Of the 50 applicants, I would hire 11 white, 6 African-American, and 8 Hispanic/Latino applicants. There are 22 white, 18 African-American, and 10 Hispanic/Latino applicants, which yields hiring rates of 11/22 = 50.0%, 6/18 = 33.3%, and 8/10 = 80.0% for whites, African-Americans, and Hispanics/Latinos, respectively. Using the 4/5ths rule, all groups would have to be hired at a rate of at least 64.0% (.8 × .80). The hiring rate for whites is below that rate, and the hiring rate for African-Americans is substantially below it, strongly supporting the conclusion that adverse impact has taken place.
17. Given the validity and adverse impact analysis results you found, what would you recommend? Is there adverse impact? Is this a legally defensible staffing system? Do the assessments have reliability and validity?
Answer: Adverse impact does appear to exist, but the staffing system exhibits high predictive validity once the hiring manager interview scores are omitted. The remaining assessments appear to have high reliability and reasonable content validity, and the findings are consistent with a large body of research suggesting that cognitive ability, conscientiousness, and structured interview scores predict job performance. The current system is therefore likely to hold up in a court of law. However, the high levels of adverse impact, particularly against African-Americans, suggest that alternative predictors should be considered. It is worth investigating the source of the adverse impact and then trying to identify alternative predictors that provide equally high levels of prediction while reducing adverse impact. Also, if a particular valid predictor is identified as generating most of the adverse impact, it could perhaps be used as an initial hurdle, with subsequent predictors used to select finalists. A final option might be to use a banding approach.