CHAPTER TEN
PREPARING DATA FOR QUANTITATIVE ANALYSIS

LEARNING OBJECTIVES (PPT slide 10-2)
1. Describe the process for data preparation and analysis.
2. Discuss validation, editing, and coding of survey data.
3. Explain data entry procedures and how to detect errors.
4. Describe data tabulation and analysis approaches.

KEY TERMS AND CONCEPTS
1. Coding
2. Cross-tabulation
3. Curbstoning
4. Data entry
5. Data validation
6. Editing
7. One-way tabulation
8. Tabulation

CHAPTER SUMMARY BY LEARNING OBJECTIVES

Describe the process for data preparation and analysis.
The value of marketing research is its ability to provide accurate decision-making information to the user. To accomplish this, the data must be converted into usable information or knowledge. After collecting data through the appropriate method, the task becomes one of ensuring the data provides meaning and value. Data preparation is the first part of the process of transforming data into useful knowledge. This process involves several steps: (1) data validation, (2) editing and coding, (3) data entry, (4) error detection, and (5) data tabulation. Data analysis follows data preparation and facilitates proper interpretation of the findings.

Discuss validation, editing, and coding of survey data.
Data validation attempts to determine whether surveys, interviews, or observations were conducted correctly and are free from fraud. In recontacting selected respondents, the researcher asks whether the interview (1) was falsified, (2) was conducted with a qualified respondent, (3) took place in the proper procedural setting, (4) was completed correctly and accurately, and (5) was accomplished in a courteous manner.
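The recontacting step described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `select_callbacks`, the interview ID numbers, and the 25 percent fraction are assumptions made for the example (the outline later cites a typical range of 10 to 30 percent of completed interviews), not a procedure taken from the text.

```python
import math
import random


def select_callbacks(interview_ids, fraction=0.2, seed=None):
    """Return a random subset of completed interviews to recontact.

    `fraction` is the share of interviews to validate (hypothetical default).
    """
    ids = list(interview_ids)
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be greater than 0 and at most 1")
    if not ids:
        return []
    rng = random.Random(seed)
    # Round up so that even a small batch yields at least one callback.
    n = max(1, math.ceil(len(ids) * fraction))
    # Sampling without replacement: no respondent is recontacted twice.
    return sorted(rng.sample(ids, n))


# Hypothetical batch of 200 completed interviews, validating 25 percent.
callbacks = select_callbacks(range(1, 201), fraction=0.25, seed=42)
print(len(callbacks))
```

Drawing the callbacks at random, rather than letting interviewers or supervisors choose them, makes it harder for falsified interviews to escape validation.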
The editing process involves scanning of interviews or questionnaire responses to determine whether the proper questions were asked, the answers were recorded according to the instructions given, and the screening questions were executed properly, as well as whether open-ended questions were recorded accurately. Once edited, the questionnaires are coded by assigning numerical values to all responses. Coding is the process of providing numeric labels to the data so they can be entered into a computer for subsequent statistical analysis.

Explain data entry procedures and how to detect errors.
There are several methods for entering coded data into a computer. First is the PC keyboard. Data also can be entered through terminals having touch-screen capabilities, or through the use of a handheld electronic pointer or light pen. Finally, data can be entered through a scanner using optical character recognition. Data entry errors can be detected through the use of error edit routines in the data entry software. Another approach is to visually scan the actual data after it has been entered.

Describe data tabulation and analysis approaches.
Two common forms of data tabulation are used in marketing research. A one-way tabulation indicates the number of respondents who gave each possible answer to each question on a questionnaire. Cross-tabulation provides categorization of respondents by treating two or more variables simultaneously. Categorization is based on the number of respondents who have responded to two or more consecutive questions.

CHAPTER OUTLINE

Opening VIGNETTE: Scanner Data Improves Understanding of Purchase Behavior
The opening vignette in this chapter describes Walmart’s use of scanners. With the information supplied by the scanners, Walmart knows what is there, what is selling, and what needs replenishment. The same scanners can scan customer cards so the customer is associated with his or her purchase in a central database.
This only requires a second or two per transaction and for the customer to produce the card at purchase time. Scanner technology is used by the marketing research industry. Questionnaires can be prepared and printed on a laser printer. Respondents can complete the questionnaire with any type of writing instrument, and with the appropriate software and scanning device, the researcher can scan the completed questionnaires. The data are checked for errors, categorized, and stored within a matter of seconds.

I. Value of Preparing Data for Analysis (PPT slide 10-3)
Converting information from surveys or other data sources so it can be used in statistical analysis is referred to as data preparation. The data preparation process typically follows a four-step approach (PPT slide 10-3):
1. Data validation
2. Editing and coding
3. Data entry
4. Data tabulation
Data preparation is essential in converting raw data into usable coded data for data analysis. It plays an important role in assessing and controlling data integrity and ensuring data quality by detecting potential response and nonresponse biases created by interviewer errors and/or respondent errors, as well as possible coding and data entry errors. Data preparation is important in dealing with inconsistent data from different sources or in converting data in multiple formats to a single format that can be analyzed with statistical software.
With traditional data collection methods, the data preparation process starts after the interviews, questionnaires, or observation forms have been completed and returned to the field supervisor or researcher. But new technology associated with online surveys and data collection methods involving handheld terminals or scanners enables researchers to complete some data preparation tasks in real time and also eliminate data collection errors. In fact, technology advances are reducing and sometimes eliminating the need to manually code, verify, and enter data when creating electronic files.
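To make the point about inconsistent data from different sources concrete, the following minimal Python sketch converts gender responses recorded in different formats to a single numeric code. Everything in it is an assumption made for illustration: the `GENDER_CODES` codebook, the `harmonize` helper, the sample values, and the use of 9 as the missing-data code are hypothetical and do not come from the text.

```python
# Hypothetical codebook mapping the labels used by different sources
# to one numeric code (1 = male, 0 = female).
GENDER_CODES = {"m": 1, "male": 1, "1": 1, "f": 0, "female": 0, "0": 0}


def harmonize(value, codebook, missing_code=9):
    """Map a raw response to its numeric code.

    Blank or unrecognized values receive `missing_code` (assumed here to be 9).
    """
    key = str(value).strip().lower()
    return codebook.get(key, missing_code)


# Hypothetical raw data: an online survey and a paper questionnaire
# that recorded the same question in different formats.
online = ["Male", "female", "FEMALE"]
paper = ["M", "F", "", "m"]

combined = [harmonize(v, GENDER_CODES) for v in online + paper]
print(combined)  # [1, 0, 0, 1, 0, 9, 1]
```

In practice an analyst would probably review unrecognized values before treating them as missing; the sketch folds both cases together only to stay short.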
The stages of data preparation and analysis are shown in Exhibit 10.1 (PPT slide 10-4). Some data collection methods require activities in all stages while other methods involve only limited data preparation.

II. Validation (PPT slide 10-5 and 10-6)
The purpose of data validation is to determine if the surveys, interviews, and observations were conducted correctly and free of errors. When data collection involves trained interviewers obtaining data from respondents, the emphasis in validation most often is on interviewer errors, or failure to follow instructions. If data collection involves online surveys, validation often involves checking to see if instructions were correctly followed.
In marketing research, interviewers submitting false data for surveys is referred to as curbstoning. As the name implies, curbstoning is when interviewers find an out-of-the-way location, such as a curbstone, and fill out the survey themselves rather than follow procedures with an actual respondent. Because of the potential for such falsification, data validation is an important step when the data acquisition process involves interviewers.
To minimize fraudulent responses, marketing researchers target between 10 and 30 percent of completed interviews for “callbacks.” Specifically for telephone, mail, and personal interviews, a certain percentage of respondents from the completed interviews are recontacted by the research firm to make sure the interview was conducted correctly.
Generally, the process of validation covers five areas (PPT slide 10-6):
Fraud: Was the person actually interviewed, or was the interview falsified? Did the interviewer contact the respondent simply to get a name and address, and then proceed to fabricate responses?
Screening: Data collection often must be conducted only with qualified respondents.
To ensure accuracy of the data collected, respondents are screened according to some preselected criteria, such as household income level, recent purchase of a specific product or brand, brand or service awareness, gender, or age.
Procedure: In many marketing research projects data must be collected according to a specific procedure. For example, customer exit interviews typically must occur in a designated place as the respondent leaves a certain retail establishment.
Completeness: In order to speed through the data collection process, an interviewer may ask the respondent only a few of the questions. To determine if the interview is valid, the researcher could recontact a sample of respondents and ask about questions from different parts of the questionnaire. Another problem could arise if the data collection process incorporates “skip” questions to direct interviewers (or respondents) to different parts of the questionnaire. With some data collection approaches the research supervisor can recontact respondents and verify their response to skip questions.
Courtesy: Respondents should be treated with courtesy and respect during the interviewing process. To ensure a positive image, respondent callbacks are common to determine whether the interviewer was courteous.

III. Editing and Coding (PPT slide 10-7 to 10-9)
Following validation, the data must be edited for mistakes. Editing is the process where the raw data are checked for mistakes made by either the interviewer or the respondent. By reviewing completed interviews from primary research, the researcher can check several areas of concern (PPT slide 10-7):
Asking the proper questions
Accurate recording of answers
Correct screening of respondents
Complete and accurate recording of open-ended questions

A. Asking the Proper Questions (PPT slide 10-7)
One aspect of the editing process especially important to interviewing methods is to make certain the proper questions were asked of the respondent.
As part of the editing process, the researcher will check to make sure all respondents were asked the proper questions. In cases where they were not, respondents are recontacted to obtain a response to omitted questions. This task is not necessary with online surveys if they were designed and set up correctly.

B. Accurate Recording of Answers (PPT slide 10-7)
Completed questionnaires sometimes have missing information. The interviewer may have accidentally skipped a question or not recorded it in the proper location. With a careful check of all questionnaires, these problems can be identified. In such cases, if it is possible, respondents are recontacted and the omitted responses recorded. This task is not necessary with online surveys if they were designed to prevent respondents from skipping questions.
Sometimes a respondent will accidentally leave one or more questions unanswered for various reasons (carelessness, being in a hurry to complete the survey, not understanding how to answer the question, etc.), resulting in an incomplete response.

C. Correct Screening Questions (PPT slide 10-7)
The first two questions shown in Exhibit 10.2 are actually screening questions that determine whether the respondent is eligible to complete the survey. During the editing phase, the researcher makes certain only qualified respondents were included in a survey.
It is also critical in the editing process to establish that the questions were asked and (for self-administered surveys) answered in the proper sequence. If the proper sequence is not followed in self-completion surveys, the respondent must be recontacted to verify the accuracy of the recorded data.
Increasingly, surveys are completed online. When online surveys are used, respondents are automatically asked the screening questions and are not allowed to continue if the questions are not correctly answered.

D. Responses to Open-Ended Questions (PPT slide 10-7)
Responses to open-ended questions often provide very meaningful data.
Open-ended questions may provide greater insight into the research questions than forced-choice questions. A major part of editing the answers to open-ended questions is interpretation. Exhibit 10.3 shows some typical responses to an open-ended question and thus points to problems associated with interpreting these questions.
Coding is necessary in online surveys if they have open-ended questions. As with traditional data collection methods, the responses must be reviewed, themes and common words and patterns must be identified, and then codes must be assigned to facilitate quantitative data analysis.

E. The Coding Process (PPT slide 10-8)
Coding involves grouping and assigning values to various responses from the survey instrument (PPT slide 10-8). Typically, the codes are numerical (a number from 0 to 9) because numbers are quick and easy to input, and computers work better with numbers than alphanumerical values. Like editing, coding can be tedious if certain issues are not addressed prior to collecting the data. A well-planned and constructed questionnaire can reduce the amount of time spent on coding and increase the accuracy of the process if it is incorporated into the design of the questionnaire.
Open-ended questions pose unique problems to the coding process. An exact list of potential responses cannot be prepared ahead of time for open-ended questions. Thus, a coding process must be prepared after data is collected. But the value of the information obtained from open-ended questions often outweighs the problems of coding the responses.
Researchers typically use a four-step process to develop codes for responses (PPT slide 10-9):
The procedure begins by generating a list of as many potential responses as possible. Responses are then assigned values within a range determined by the actual number of separate responses identified.
Consolidation of responses is the second phase of the four-step process (Exhibit 10.4).
Developing consolidated categories is a subjective decision that should be made only by an experienced research analyst with input from the project’s sponsor.
The third step of the process is to assign a numerical value as a code. While at first this may appear to be a simple task, the structure of the questionnaire and the number of responses per question need to be considered. If correlation or regression is used in data analysis, then for categorical data there is another consideration. The researcher may wish to create “dummy” variables in which the coding is “0” and “1.” Assigning a coded value to missing data is very important. The best way to handle the coding of omitted responses is first to check on how your data analysis software treats missing data.
The fourth step in the coding process is to assign a coded value to each response. This is probably the most tedious process because it is done manually. Each questionnaire is assigned a numerical value. The numerical value typically is a three-digit code if there are fewer than 1,000 questionnaires to code, and a four-digit code if there are 1,000 or more.

IV. Data Entry (PPT slide 10-10)
Data entry follows validation, editing, and coding. Data entry refers to the tasks involved with the direct input of the coded data into some specified software package that ultimately allows the research analyst to manipulate and transform the raw data into useful information (PPT slide 10-10). This step is not necessary when online data collection is used.
There are several ways of entering coded data into an electronic file. With CATI and Internet surveys, the data are entered simultaneously with data collection and a separate step is not required. However, other types of data collection require the data to be entered manually, which typically is done using a personal computer.
Scanning technology also can be used to enter data.
This approach enables the computer to read alphabetic, numeric, and special character codes through a scanning device. Respondents use a number two pencil to fill in responses, which are then scanned directly into a computer file.
Online surveys are becoming increasingly popular for completing marketing research studies. Indeed, online surveys now represent almost 60 percent of all data collection approaches. Not only are they often faster to complete, but they also eliminate the data entry process entirely.

A. Error Detection (PPT slide 10-11)
Error detection identifies errors from data entry or other sources (PPT slide 10-11). The first step is to determine whether the software used for data entry and tabulation performs “error edit routines” that identify the wrong type of data. Another approach to error detection is to review a printed representation of the entered data (Exhibit 10.5; PPT slide 10-11).

B. Missing Data (PPT slide 10-12 and 10-13)
Missing data is often a problem in data analysis. Missing data is defined as a situation in which respondents do not provide an answer to a question. Sometimes respondents purposely do not answer a question, creating a missing data situation. This most often arises when questions of a sensitive nature are asked, such as asking for a respondent’s age or income. It also may occur because a respondent simply does not see a question, or is in a hurry to complete the survey and simply skips a question. Missing data is most often a problem with self-completion surveys.
With online surveys, respondents can be required to answer all questions, but this may cause some respondents to simply stop answering questions and terminate the survey. In general, with online surveys it is recommended to require answers on all questions, since the problem of respondents quitting a survey is not as substantial as the problem of missing data.
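The error edit routines and missing-data situations described above can be illustrated with a short Python sketch. It is a simplified stand-in rather than the routine of any particular data entry package: the question names, the valid code ranges in `VALID_CODES`, the sample records, and the choice of 9 as the missing-data code are all hypothetical.

```python
MISSING = 9  # hypothetical code assigned to an omitted response

# Hypothetical codebook: each question and the set of codes it may take.
VALID_CODES = {
    "q1": {1, 2, 3, 4, 5},  # five-point rating scale
    "q2": {0, 1},           # no / yes
}


def edit_check(records, valid_codes, missing=MISSING):
    """Flag entries outside the codebook and count missing answers.

    Returns (errors, missing_counts), where errors is a list of
    (row_number, question, value) tuples.
    """
    errors = []
    missing_counts = {question: 0 for question in valid_codes}
    for row_number, record in enumerate(records, start=1):
        for question, allowed in valid_codes.items():
            # A question that was never entered is treated as missing.
            value = record.get(question, missing)
            if value == missing:
                missing_counts[question] += 1
            elif value not in allowed:
                errors.append((row_number, question, value))
    return errors, missing_counts


# Hypothetical entered data.
records = [
    {"q1": 3, "q2": 1},
    {"q1": 7, "q2": 0},    # 7 is outside the 1-5 scale
    {"q1": 9, "q2": "x"},  # q1 coded as missing; q2 is the wrong type of data
    {"q1": 5},             # q2 was never entered
]

errors, missing_counts = edit_check(records, VALID_CODES)
print(errors)          # [(2, 'q1', 7), (3, 'q2', 'x')]
print(missing_counts)  # {'q1': 1, 'q2': 1}
```

Reporting the row number with each flagged value lets the analyst go back to the original questionnaire and correct the entry.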
If missing data is encountered, there are several ways to deal with it:
One approach is to replace the missing value with a value from a similar respondent.
Another approach, if there are other similar questions to the one with missing data, is to use the answers to the other similar questions as a guide in determining the replacement value.
A third approach is to use the mean of a subsample of the respondents with similar characteristics that answered the question to determine a replacement value.
A final alternative is to use the mean of the entire sample that answered the question as a replacement value, but this is not recommended because it reduces the overall variance in the question.

C. Organizing Data
Several useful data organizing functions are available in SPSS. One function under the Data pull-down menu is Sort Cases. This can be used to sort your data cases (observations) in either ascending or descending order. Another is the Split File function, which can be used to split the respondents into two groups so they can be compared. A third useful function is the Select Cases option, which can be used to select only males, or only customers over age 35, and so on. This option can also be used to select a random subsample of your total sample. The specific steps to execute these and other functions are explained in the SPSS instructions available from the book’s website.

V. Data Tabulation (PPT slide 10-14)
Tabulation, sometimes referred to as a frequency count, is the simple process of counting the number of observations (cases) that are classified into certain categories. Two common forms of tabulation are used in marketing research projects:
One-way tabulation: it categorizes single variables existing in a study (PPT slide 10-15). In most cases, a one-way tabulation shows the number of respondents (frequency count) who gave each possible answer to each question on the questionnaire.
The number of one-way tabulations is determined by the number of variables measured in the study.
Cross-tabulation: it simultaneously treats two or more variables in the study (PPT slide 10-14). It categorizes the number of responses to two or more questions, thus showing the relationship between those two variables. It is most often used with nominal or ordinal scaled data.
One-way and cross-tabulation are considered descriptive statistics.

A. One-Way Tabulation (PPT slide 10-15 to 10-18)
One-way tabulations serve several purposes:
They can be used to determine the amount of nonresponse to individual questions. Based on the coding scheme used for missing data, one-way tabulations identify the number of respondents who did not answer various questions on the questionnaire.
They can be used to locate mistakes in data entry.
Means, standard deviations, and related descriptive statistics often are determined from a one-way tabulation.
One-way tabulations are also used to communicate the results of the research project. They can profile sample respondents, identify characteristics that distinguish between groups (e.g., heavy users versus light users), and show the percentage of respondents who respond differently to different situations.
The most basic way to illustrate a one-way tabulation is to construct a one-way frequency table. A one-way frequency table shows the number of respondents who answered each possible response to a question given the available alternatives. An example of a one-way frequency table is shown in Exhibit 10.6 (PPT slide 10-16). In addition to listing the number of responses, one-way frequency tables also identify missing data and show valid percentages and summary statistics.

B. Descriptive Statistics (PPT slide 10-19)
Descriptive statistics are used to summarize and describe the data obtained from a sample of respondents.
Two types of measures are often used to describe data:
Measures of central tendency
Measures of dispersion
Exhibit 10.8 provides an overview of the major types of descriptive statistics used by marketing researchers.

C. Graphical Illustration of Data (PPT slide 10-20)
The next logical step following development of frequency tables is to translate them into graphical illustrations. Graphical illustrations can be very powerful for communicating key research results generated from preliminary data analysis.

MARKETING RESEARCH IN ACTION
DELI DEPOT (PPT slide 10-21)
The Marketing Research in Action in this chapter provides a questionnaire used by Deli Depot, a restaurant selling cold and hot sandwiches, soup and chili, yogurt, and pies and cookies. The exhibit provides the variables measured, the sample questions, and coding.

Instructor Manual for Essentials of Marketing Research
Joseph F. Hair, Mary Celsi, Robert P. Bush, David J. Ortinau
9780078028816, 9780078112119