CHAPTER SIX
SAMPLING: THEORY AND METHODS

LEARNING OBJECTIVES (PPT slide 6-2)
1. Explain the role of sampling in the research process.
2. Distinguish between probability and nonprobability sampling.
3. Understand factors to consider when determining sample size.
4. Understand the steps in developing a sampling plan.

KEY TERMS AND CONCEPTS
1. Area sampling
2. Census
3. Central limit theorem (CLT)
4. Cluster sampling
5. Convenience sampling
6. Defined target population
7. Disproportionately stratified sampling
8. Judgment sampling
9. Nonprobability sampling
10. Nonsampling error
11. Population
12. Precision
13. Probability sampling
14. Proportionately stratified sampling
15. Quota sampling
16. Sampling
17. Sampling error
18. Sampling frame
19. Sampling plan
20. Sampling units
21. Simple random sampling
22. Snowball sampling
23. Stratified random sampling
24. Systematic random sampling

CHAPTER SUMMARY BY LEARNING OBJECTIVES
Explain the role of sampling in the research process. Sampling uses a portion of the population to make estimates about the entire population. The fundamentals of sampling are used in many of our everyday activities. For instance, we sample before selecting a TV program to watch, test-drive a car before deciding whether to purchase it, and take a bite of food to determine if our food is too hot or if it needs additional seasoning. The term target population is used to identify the complete group of elements (e.g., people or objects) that are identified for investigation. The researcher selects sampling units from the target population and uses the results obtained from the sample to make conclusions about the target population. The sample must be representative of the target population if it is to provide accurate estimates of population parameters. Sampling is frequently used in marketing research projects instead of a census because sampling can significantly reduce the amount of time and money required in data collection.
Distinguish between probability and nonprobability sampling. In probability sampling, each sampling unit in the defined target population has a known probability of being selected for the sample. The actual probability of selection for each sampling unit may or may not be equal depending on the type of probability sampling design used. In nonprobability sampling, the probability of selection of each sampling unit is not known. The selection of sampling units is based on some type of intuitive judgment or knowledge of the researcher. Probability sampling enables the researcher to judge the reliability and validity of data collected by calculating the probability the findings based on the sample will differ from the defined target population. This observed difference can be partially attributed to the existence of sampling error. Each probability sampling method, simple random, systematic random, stratified, and cluster, has its own inherent advantages and disadvantages. In nonprobability sampling, the probability of selection of each sampling unit is not known. Therefore, potential sampling error cannot be accurately known either. Although there may be a temptation to generalize nonprobability sample results to the defined target population, for the most part the results are limited to the people who provided the data in the survey. Each nonprobability sampling method—convenience, judgment, quota, and snowball—has its own inherent advantages and disadvantages. Understand factors to consider when determining sample size. Researchers consider several factors when determining the appropriate sample size. The amount of time and money available often affect this decision. In general, the larger the sample, the greater the amount of resources required to collect data. 
Three factors that are of primary importance in the determination of sample size are (1) the variability of the population characteristics under consideration, (2) the level of confidence desired in the estimate, and (3) the degree of precision desired in estimating the population characteristic. The greater the variability of the characteristic under investigation and the higher the level of confidence required, the larger the necessary sample size. Similarly, the more precise the required sample results, the larger the necessary sample size. Statistical formulas are used to determine the required sample size in probability sampling. Sample sizes for nonprobability sampling designs are determined using subjective methods such as industry standards, past studies, or the intuitive judgments of the researcher. The size of the defined target population does not affect the size of the required sample unless the sample is large relative to the population (more than about 5 percent of it).
Understand the steps in developing a sampling plan. A sampling plan is the blueprint or framework needed to ensure that the data collected are representative of the defined target population. A good sampling plan will include, at least, the following steps: (1) define the target population, (2) select the data collection method, (3) identify the sampling frames needed, (4) select the appropriate sampling method, (5) determine necessary sample sizes and overall contact rates, (6) create an operating plan for selecting sampling units, and (7) execute the operational plan.

CHAPTER OUTLINE
Opening VIGNETTE: Mobile Web Interactions Explode
The opening vignette in this chapter describes the development of Internet searches by mobile phone. There has been a vast increase in the use of mobile phones to access content online, but consumers still prefer a desktop or laptop for searches. If a marketing research study were conducted on mobile phone search adoption, the key questions to answer would be: What respondents should be included in the study?
How many should be included in each study? I. Value of Sampling in Marketing Research (PPT slide 6-3) Sampling is selection of a small number of elements from a larger defined target group of elements and expecting that the information gathered from the small group will allow judgments to be made about the larger group (PPT slide 6-3). A. Sampling as a Part of the Research Process (PPT slide 6-4) Sampling is often used when it is impossible or unreasonable to conduct a census. A census is a research study that includes data about every member of the defined target population (PPT slide 6-4). Sampling is less time-consuming and less costly than conducting a census. Samples also play an important indirect role in designing questionnaires. Depending on the research problem and the target population, sampling decisions influence the type of research design, the survey instrument, and the actual questionnaire. II. The Basics of Sampling Theory (PPT slide 6-5 to 6-8) A. Population (PPT slide 6-5) A population is an identifiable set of elements (e.g., people, products, organizations) of interest to the researcher and pertinent to the information problem. Most businesses that collect data are not really concerned with total populations, but with a prescribed segment. A defined target population is the complete set of elements identified for investigation based on the objectives of the research project. A precise definition of the target population is essential and is usually done in terms of elements, sampling units, and time frames. Sampling units are the target population elements actually available for selection during the sampling process. B. Sampling Frame (PPT slide 6-5) A sampling frame is a list of all eligible sampling units. Some common sources of sampling frames are lists of registered voters and customer lists from magazine publishers or credit card companies. 
There also are specialized commercial companies (e.g., Survey Sampling, Inc.; American Business Lists, Inc.; and Scientific Telephone Samples) that sell databases containing names, addresses, and telephone numbers of potential population elements. Regardless of the source, it is often difficult and expensive to obtain accurate, representative, and current sampling frames.
C. Factors Underlying Sampling Theory (PPT slide 6-6)
To understand sampling theory, researchers must know sampling-related concepts. Sampling concepts and approaches are often discussed as if the researcher already knows the key population parameters prior to conducting the research project. However, because most business environments are complex and rapidly changing, researchers often do not know these parameters prior to conducting research. One of the major goals of researching small, yet representative, samples of members of a defined target population is that the results of the research will help to predict or estimate what the true population parameters are within a certain degree of confidence. If business decision makers had complete knowledge about their defined target populations, they would have perfect information about the realities of those populations, thus eliminating the need to conduct primary research.
The central limit theorem (CLT) describes the theoretical characteristics of a sample population. The CLT is the theoretical backbone of survey research and is important in understanding the concepts of sampling error, statistical significance, and sample sizes. In brief, the theorem states that for almost all defined target populations, the sampling distribution of the mean (x̄) or the percentage value (p̄) derived from a simple random sample will be approximately normally distributed, provided the sample size is sufficiently large (i.e., when n ≥ 30).
Moreover, the mean (x̄) of the random sample with an estimated sampling error (S_x̄) fluctuates around the true population mean (µ) with a standard error of σ/√n and an approximately normal sampling distribution, regardless of the shape of the probability frequency distribution of the overall target population. In other words, there is a high probability that the mean of any sample (x̄) taken from the target population will be a close approximation of the true target population mean (µ), as one increases the size of the sample (n). With an understanding of the basics of the central limit theorem, the researcher can do the following:
    Draw representative samples from any target population.
    Obtain sample statistics from a random sample that serve as accurate estimates of the target population's parameters.
    Draw one random sample, instead of many, reducing the costs of data collection.
    More accurately assess the reliability and validity of constructs and scale measurements.
    Statistically analyze data and transform it into meaningful information about the target population.
D. Tools Used to Assess the Quality of Samples (PPT slides 6-8)
There are numerous opportunities to make mistakes that result in some type of bias in any research study. This bias can be classified as either (PPT slides 6-8):
    Sampling error
    Nonsampling error
Random sampling errors could be detected by observing the difference between the sample results and the results of a census conducted using identical procedures. Two difficulties associated with detecting sampling error are:
    A census is very seldom conducted in survey research.
    Sampling error can be determined only after the sample is drawn and data collection is completed.
Sampling error is any type of bias that is attributable to mistakes in either drawing a sample or determining the sample size. Moreover, random sampling error tends to occur because of chance variations in the selection of sampling units.
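The behavior of sample means described by the central limit theorem, and the chance variation that produces random sampling error, can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the skewed "spending" population, its size, and the function name simulate_sample_means are hypothetical and are not taken from the text. It compares the observed spread of many sample means with the theoretical standard error σ/√n.

```python
import random
import statistics


def simulate_sample_means(population, n, draws, seed=0):
    """Draw `draws` simple random samples of size n and return their means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(draws)]


# Hypothetical, strongly skewed population (assumption for illustration only).
population_rng = random.Random(42)
population = [population_rng.expovariate(1 / 50) for _ in range(50_000)]

mu = statistics.fmean(population)      # true population mean
sigma = statistics.pstdev(population)  # true population standard deviation

# For each sample size, store (observed spread of sample means, sigma / sqrt(n)).
results = {}
for n in (30, 120):
    sample_means = simulate_sample_means(population, n, draws=2000)
    observed_se = statistics.stdev(sample_means)
    theoretical_se = sigma / n ** 0.5
    results[n] = (observed_se, theoretical_se)
    print(f"n={n}: observed SE={observed_se:.2f}, sigma/sqrt(n)={theoretical_se:.2f}")
```

Even though the population itself is far from normal, the sample means cluster around µ, and quadrupling the sample size from 30 to 120 halves the theoretical standard error.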
Even if the sampling units are properly selected, those units still might not be a perfect representation of the defined target population, but they generally are reliable estimates. When there is a discrepancy between the statistic estimated from the sample and the actual value from the population, a sampling error has occurred. Sampling error can be reduced by increasing the size of the sample. However, because the standard error shrinks only with the square root of the sample size, doubling the sample reduces the standard error by only about 29 percent, so increasing the sample size primarily to reduce the standard error may not be worth the cost.
Nonsampling error is a bias that occurs in a research study regardless of whether a sample or census is used. These errors can occur at any stage of the research process. For example:
    The target population may be inaccurately defined, causing population frame error.
    Inappropriate question/scale measurements can result in measurement error.
    A questionnaire may be poorly designed, causing response error.
    There may be other errors in gathering and recording data or when raw data are coded and entered for analysis.
In general, the more extensive a study, the greater the potential for nonsampling errors. Unlike sampling error, there are no statistical procedures to assess the impact of nonsampling errors on the quality of the data collected. Yet, most researchers realize that all forms of nonsampling errors reduce the overall quality of the data regardless of the data collection method. Nonsampling errors usually are related to the accuracy of the data, whereas sampling errors relate to the representativeness of the sample to the defined target population.
IV. Probability and Nonprobability Sampling (PPT slides 6-9)
There are two basic sampling designs:
    Probability sampling
    Nonprobability sampling
Exhibit 6.2 lists the different types of both sampling methods. In probability sampling, each sampling unit in the defined target population has a known probability of being included in the sample.
The actual probability of selection for each sampling unit may or may not be equal depending on the type of probability sampling design used. Specific rules for selecting members from the population for inclusion in the sample are determined at the beginning of a study to ensure: Unbiased selection of the sampling units Proper sample representation of the defined target population Probability sampling enables the researcher to judge the reliability and validity of data collected by calculating the probability that the sample findings are different from the defined target population. The observed difference can be partially attributed to the existence of sampling error. The results obtained by using probability sampling designs can be generalized to the target population within a specified margin of error. Nonprobability sampling is a sampling design in which the probability of selecting each sampling unit is not known. The selection of sampling units is based on the judgment of the researcher and may or may not be representative of the target population. The degree to which the sample is representative of the defined target population depends on the sampling approach and how well the researcher executes the selection activities. A. Probability sampling designs (PPT slide 6-10 to 6-14) Simple random sampling is a probability sampling in which every sampling unit has a known and equal chance of being selected (PPT slide 6-11). Simple random sampling has several advantages: The technique is easily understood and the survey’s results can be generalized to the defined target population with a prespecified margin of error. Simple random samples produce unbiased estimates of the population’s characteristics. This method guarantees that every sampling unit has a known and equal chance of being selected, no matter the actual size of the sample, resulting in a valid representation of the defined target population. 
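As a minimal sketch of simple random sampling, the example below draws a sample without replacement from a complete sampling frame using Python's standard library. The customer identifiers, the frame size of 1,000, and the function name simple_random_sample are hypothetical and used only for illustration.

```python
import random


def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Select sample_size units without replacement; every unit has an equal chance."""
    if not 0 < sample_size <= len(sampling_frame):
        raise ValueError("sample_size must be between 1 and the frame size")
    rng = random.Random(seed)
    return rng.sample(sampling_frame, sample_size)


# Hypothetical sampling frame: a complete list of 1,000 customer identifiers.
frame = [f"customer_{i:04d}" for i in range(1, 1001)]
sample = simple_random_sample(frame, 50, seed=7)

# Each unit's known, equal probability of selection is n / N.
inclusion_probability = len(sample) / len(frame)
```

Note that the function needs the complete frame as input, which reflects the method's reliance on a full listing of sampling units.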
The primary disadvantage of simple random sampling is the difficulty of obtaining a complete and accurate listing of the target population elements. Simple random sampling requires that all sampling units be identified. For this reason, it works best for small populations where accurate lists are available.
Systematic random sampling is similar to simple random sampling but requires that the defined target population be ordered in some way, usually in the form of a customer list, taxpayer roll, or membership roster (PPT slides 6-11). In research practice, systematic random sampling has become a popular method of drawing samples. Compared to simple random sampling, systematic random sampling is less costly because it can be done relatively quickly. When executed properly, systematic random sampling creates a sample of objects or prospective respondents that is very similar in quality to a sample drawn using simple random sampling.
To use systematic random sampling, the researcher must be able to secure a complete listing of the potential sampling units that make up the defined target population. But unlike simple random sampling, there is no need to give the sampling units any special code prior to drawing the sample. Instead, sampling units are selected according to their position using a skip interval. The skip interval is determined by dividing the number of potential sampling units in the defined target population by the number of units desired in the sample. The required skip interval is calculated using the following formula:
    Skip interval = Defined target population list size / Desired sample size
Exhibit 6.3 displays the steps that a researcher would take in drawing a systematic random sample (PPT slide 6-12). Systematic sampling is frequently used because it is a relatively easy way to draw a sample while ensuring randomness.
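The skip-interval procedure can be sketched in a few lines. This is an illustrative sketch, not a reproduction of Exhibit 6.3: it assumes the skip interval is rounded down to a whole number and that the starting position is chosen at random within the first interval. The membership roster and the function name systematic_random_sample are hypothetical.

```python
import random


def systematic_random_sample(ordered_frame, sample_size, seed=None):
    """Pick a random start, then take every k-th unit, where k is the skip interval."""
    if not 0 < sample_size <= len(ordered_frame):
        raise ValueError("sample_size must be between 1 and the frame size")
    # Skip interval = list size / desired sample size (rounded down; an assumption).
    skip_interval = len(ordered_frame) // sample_size
    rng = random.Random(seed)
    start = rng.randrange(skip_interval)  # random position within the first interval
    return [ordered_frame[start + i * skip_interval] for i in range(sample_size)]


# Hypothetical ordered membership roster of 1,000 names.
roster = [f"member_{i:04d}" for i in range(1000)]
selected = systematic_random_sample(roster, 100, seed=3)
positions = [roster.index(unit) for unit in selected]
```

With 1,000 listed units and a desired sample of 100, the skip interval is 10, so the selected positions are exactly 10 apart. If the list has a hidden cycle with the same period, every selected unit would share that pattern.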
The availability of lists and the shorter time required to draw a sample versus simple random sampling make systematic sampling an attractive, economical method for researchers. The greatest weakness of systematic random sampling is the possibility of hidden patterns in the list of names that create bias. Another difficulty is that the number of sampling units in the target population must be known. When the size of the target population is large or unknown, identifying the number of units is difficult, and estimates may not be accurate.
Stratified random sampling involves the separation of the target population into different groups, called strata, and the selection of samples from each stratum (PPT slides 6-13). It is similar to segmentation of the defined target population into smaller, more homogeneous sets of elements. To ensure that the sample maintains the required precision, representative samples must be drawn from each of the smaller population groups (strata). Drawing a stratified random sample involves three basic steps:
    Dividing the target population into homogeneous subgroups or strata
    Drawing random samples from each stratum
    Combining the samples from each stratum into a single sample of the target population
Two common methods are used to derive samples from strata:
    Proportionate: proportionately stratified sampling is a stratified sampling method in which the sample size drawn from each stratum depends on the stratum's size relative to the defined target population.
    Disproportionate: disproportionately stratified sampling is a stratified sampling method in which the sample size drawn from each stratum is independent of the stratum's relative size in the population.
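The difference between the two allocation methods can be made concrete with a short sketch. Everything specific here is an assumption for illustration: the three usage-based strata, their sizes, the total sample of 200, the researcher-chosen disproportionate allocation, and the function names. Proportionate allocation is computed with a largest-remainder rounding rule so that the stratum samples add up to the requested total.

```python
import random


def proportionate_allocation(stratum_sizes, total_sample_size):
    """Allocate the sample to strata in proportion to each stratum's population share."""
    population_size = sum(stratum_sizes.values())
    exact = {name: total_sample_size * size / population_size
             for name, size in stratum_sizes.items()}
    allocation = {name: int(share) for name, share in exact.items()}
    # Hand out any units lost to rounding, largest fractional remainder first.
    leftover = total_sample_size - sum(allocation.values())
    by_remainder = sorted(exact, key=lambda name: exact[name] - allocation[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        allocation[name] += 1
    return allocation


def draw_stratified_sample(strata, allocation, seed=None):
    """Draw a simple random sample within each stratum and combine the results."""
    rng = random.Random(seed)
    combined = []
    for name, units in strata.items():
        combined.extend(rng.sample(units, allocation[name]))
    return combined


# Hypothetical strata based on product usage.
stratum_sizes = {"light": 6000, "medium": 3000, "heavy": 1000}
strata = {name: [f"{name}_{i}" for i in range(size)]
          for name, size in stratum_sizes.items()}

proportionate = proportionate_allocation(stratum_sizes, 200)
# Disproportionate: sizes set by the researcher, here oversampling heavy users.
disproportionate = {"light": 80, "medium": 60, "heavy": 60}

proportionate_sample = draw_stratified_sample(strata, proportionate, seed=1)
disproportionate_sample = draw_stratified_sample(strata, disproportionate, seed=1)
```

Under proportionate allocation the heavy-user stratum, which is 10 percent of the population, receives 10 percent of the sample. The disproportionate plan gives it 30 percent, which supports separate analysis of that stratum but means results must be weighted before making estimates for the whole population.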
Dividing the target population into homogeneous strata has several advantages, including the:
    assurance of representativeness in the sample
    opportunity to study each stratum and make comparisons between strata
    ability to make estimates for the target population with the expectation of greater precision or less error
The primary difficulty encountered with stratified sampling is determining the basis for stratifying.
Cluster sampling is similar to stratified random sampling, but is different in that the sampling units are divided into mutually exclusive and collectively exhaustive subpopulations, called clusters. Each cluster is assumed to be representative of the heterogeneity of the target population (PPT slide 6-14). Once the cluster has been identified, the prospective sampling units are selected for the sample by either using a simple random sampling method or canvassing all the elements (a census) within the defined cluster.
A popular form of cluster sampling is area sampling (PPT slide 6-14). In area sampling, the clusters are formed by geographic designations. When using area sampling, the researcher has two additional options:
    The one-step approach: the researcher must have enough prior information about the various geographic clusters to believe that all geographic clusters are basically identical with regard to the specific factors that were used to initially identify the clusters.
    The two-step approach
Cluster sampling is widely used because of its cost-effectiveness and ease of implementation. In many cases, the only representative sampling frame available to researchers is one based on clusters. A primary disadvantage of cluster sampling is that the clusters often are homogeneous. The more homogeneous the cluster, the less precise the sample estimates. Another concern with cluster sampling is the appropriateness of the designated cluster factor used to identify the sampling units within clusters.
C. Nonprobability Sampling Designs (PPT slide 6-15 and 6-16)
Convenience sampling is a nonprobability sampling method in which samples are drawn at the convenience of the researcher (PPT slide 6-15). For example, when respondents are intercepted at a shopping mall, the assumption is that the individuals interviewed at the mall are similar to the overall defined target population with regard to the characteristic being studied. Convenience sampling is easy to use and enables a large number of respondents to be interviewed in a relatively short time. For this reason, it is commonly used in the early stages of research, including construct and scale measurement development as well as pretesting of questionnaires. But using convenience samples to develop constructs and scales can be risky. Another major disadvantage of convenience samples is that the data are not generalizable to the defined target population. The representativeness of the sample cannot be measured because sampling error estimates cannot be calculated.
Judgment sampling, or purposive sampling, is a nonprobability sampling method in which participants are selected according to an experienced individual's belief that they will meet the requirements of the research study (PPT slide 6-15). If the judgment of the researcher is correct, the sample generated by judgment sampling will be better than one generated by convenience sampling. As with all nonprobability sampling procedures, however, the representativeness of the sample cannot be measured.
Quota sampling involves the selection of prospective participants according to prespecified quotas for either demographic characteristics, specific attitudes, or specific behaviors (PPT slide 6-16). The purpose of quota sampling is to assure that prespecified subgroups of the target population are represented. The major advantage of quota sampling is that the sample generated contains specific subgroups in the proportions desired by researchers. Quota sampling reduces selection bias by field workers.
An inherent limitation of quota sampling is that the success of the study is dependent on subjective decisions made by researchers. Since it is a nonprobability sampling method, the representativeness of the sample cannot be measured. Therefore, generalizing the results beyond the sampled respondents is questionable. Snowball sampling is a nonprobability sampling method, also called referral sampling, in which a set of respondents is chosen, and they help the researcher identify additional people to be included in the study (PPT slide 6-16). Snowball sampling typically is used in situations where: the defined target population is small and unique. compiling a complete list of sampling units is very difficult. Snowball sampling is a reasonable method of identifying respondents who are members of small, hard-to-reach, uniquely defined target populations. As a nonprobability sampling method, it is most useful in qualitative research. But snowball sampling allows bias to enter the study. If there are significant differences between people who are known in certain social circles and those who are not, there may be problems with this sampling technique. Like all other nonprobability sampling approaches, the ability to generalize the results to members of the target population is limited. D. Determining the Appropriate Sampling Design (PPT slide 6-17) Determining the best sampling design involves consideration of several factors. Exhibit 6.4 provides an overview of the major factors that should be considered (PPT slide 6-17): Research objectives Degree of accuracy Resources Time frame Knowledge of the target population Scope of the research Statistical analysis needs V. Determining Sample Sizes (PPT slides 6-18 to 6-24) Determining the sample size is not an easy task. 
The researcher must consider how precise the estimates must be and how much time and money are available to collect the required data, since data collection is generally one of the most expensive components of a study. Sample size determination differs between probability and nonprobability designs. A. Probability Sample Sizes (PPT slides 6-18 to 6-20) Three factors play an important role in determining sample sizes with probability designs (PPT slides 6-18): The population variance, which is a measure of the dispersion of the population, and its square root, referred to as the population standard deviation. The greater the variability in the data being estimated the larger the sample size needed. The level of confidence desired in the estimate. The higher the level of confidence desired the larger the sample size needed. The degree of precision desired in estimating the population characteristic. Precision is the acceptable amount of error in the sample estimate. The more precise the required sample results, that is, the smaller the desired error, the larger the sample size. For a particular sample size, there is a trade-off between degree of confidence and degree of precision, and the desire for confidence and precision must be balanced. These two considerations must be agreed upon by the client and the marketing researcher based on the research situation. Formulas based on statistical theory can be used to compute the sample size. For pragmatic reasons, such as budget and time constraints, alternative “ad hoc” methods often are used. Examples of these are sample sizes based on rules of thumb, previous similar studies, one’s own experience, or simply what is affordable. Irrespective of how the sample size is determined it is essential that it should be of a sufficient size and quality to yield results that are seen to be credible in terms of their accuracy and consistency. 
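The way variability, confidence, and precision drive the required sample size can be checked numerically. The sketch below uses the standard z-based formulas for a simple random sample, n = Z²σ²/e² for a mean and n = Z²PQ/e² for a proportion. The function names and the example inputs are assumptions for illustration. The z-value is taken from the standard normal distribution in Python's statistics module.

```python
from statistics import NormalDist


def z_value(confidence_level):
    """Two-sided standardized z-value for a confidence level such as 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)


def sample_size_for_mean(confidence_level, std_dev_estimate, tolerance):
    """n = Z^2 * sigma^2 / e^2, before rounding up to a whole sampling unit."""
    z = z_value(confidence_level)
    return z ** 2 * std_dev_estimate ** 2 / tolerance ** 2


def sample_size_for_proportion(confidence_level, p, tolerance):
    """n = Z^2 * (P * Q) / e^2 with Q = 1 - P, before rounding up."""
    z = z_value(confidence_level)
    return z ** 2 * (p * (1 - p)) / tolerance ** 2


# Illustrative inputs: 95 percent confidence, P = 0.5, error of +/- 5 percentage points.
n_proportion = sample_size_for_proportion(0.95, 0.5, 0.05)

# Each factor on its own pushes the required sample size up.
higher_confidence = sample_size_for_proportion(0.99, 0.5, 0.05)
tighter_precision = sample_size_for_proportion(0.95, 0.5, 0.03)
more_variability = sample_size_for_mean(0.95, 20, 2) > sample_size_for_mean(0.95, 10, 2)
```

With 95 percent confidence, P = 0.5, and a tolerance of 5 percentage points, the raw result is a little over 384, which is conventionally rounded up to a whole number of sampling units.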
When formulas are used to determine sample size, there are separate approaches for determining sample size based on a predicted population mean and a population proportion. The formulas are used to estimate the sample size for a simple random sample. When the situation involves estimating a population mean, the formula for calculating the sample size is:
    \[ n = Z_{B,CL}^{2} \left( \frac{\sigma_{\mu}^{2}}{e^{2}} \right) \]
Where:
    Z_{B,CL} = The standardized z-value associated with the level of confidence
    σ_µ = Estimate of the population standard deviation (σ) based on some type of prior information
    e = Acceptable tolerance level of error (stated in percentage points)
In situations where estimates of a population proportion are of concern, the standardized formula for calculating the needed sample size would be:
    \[ n = Z_{B,CL}^{2} \left( \frac{P \times Q}{e^{2}} \right) \]
Where:
    Z_{B,CL} = The standardized z-value associated with the level of confidence
    P = Estimate of expected population proportion having a desired characteristic based on intuition or prior information
    Q = [1 − P], or the estimate of expected population proportion not holding the characteristic of interest
    e = Acceptable tolerance level of error (stated in percentage points)
When the defined target population size in a consumer study is 500 elements or less, the researcher should consider taking a census of the population rather than a sample. The logic behind this is based on the theoretical notion that at least 384 sampling units need to be included in most studies to have a 95 percent confidence level and a sampling error of +/- 5 percentage points.
Sample sizes in business-to-business studies present a different problem than in consumer studies, where the population almost always is very large. With business-to-business studies the population frequently is only 200 to 300 individuals. An acceptable sample size may be as small as 30 percent or so of the population, but the final decision would be made after examining the profile of the respondents.
B. Sampling from a Small Population (PPT slides 6-21 and 6-22)
When working with small populations, however, use of the above formulas may lead to an unnecessarily large sample size. If, for example, the sample size is larger than 5 percent of the population, then the calculated sample size should be multiplied by the following correction factor:
    \[ \frac{N}{N + n - 1} \]
Where:
    N = Population size
    n = Calculated sample size determined by the original formula
Thus, the adjusted sample size is:
    \[ \text{Sample size} = \left( \frac{\text{Specified degree of confidence} \times \text{Variability}}{\text{Desired precision}} \right)^{2} \times \frac{N}{N + n - 1} \]
For example, with N = 500 and a calculated n = 384, the correction factor is 500/883 ≈ 0.566, giving an adjusted sample size of about 217.
C. Nonprobability Sample Sizes (PPT slides 6-23)
Sample size formulas cannot be used for nonprobability samples. Determining the sample size for nonprobability samples is usually a subjective, intuitive judgment made by the researcher based on either past studies, industry standards, or the amount of resources available. Regardless of the method, the sampling results cannot be used to make statistical inferences about the true population parameters. Researchers can compare specific characteristics of the sample, such as age, income, and education, and note that the sample is similar to the population.
D. Other Sample Size Determination Approaches (PPT slides 6-24)
Sample sizes are often determined using less formal approaches. For example, the budget is almost always a consideration, and the sample size then will be determined by what the client can afford. A related approach is basing sample size on similar previous studies that are considered comparable and judged as having produced reliable and valid findings. Consideration also is often given to the number of subgroups that will be examined and the minimum sample size per subgroup needed to draw conclusions about each subgroup. Some researchers suggest the minimum subgroup sample size should be 100, while many believe subgroup sample sizes as small as 50 are sufficient.
If the minimum subgroup sample size is 50 and there are five subgroups, then the total sample size would be 250. Finally, sometimes the sample size is determined by the number of questions on a questionnaire. Decisions on which of these approaches, or combinations of approaches, to use require the judgment of both research experts and managers to select the best alternative.
VI. Steps in Developing a Sampling Plan (PPT slide 6-25 and 6-26)
A sampling plan is the blueprint or framework needed to ensure that the data collected are representative of the defined target population (PPT slide 6-25). A good sampling plan includes the following steps (PPT slide 6-26):
1. Define the target population
2. Select the data collection method
3. Identify the sampling frames needed
4. Select the appropriate sampling method: in determining the sampling method, the researcher must consider seven factors:
    Research objectives
    Desired accuracy
    Availability of resources
    Time frame
    Knowledge of the target population
    Scope of the research
    Statistical analysis needs
5. Determine necessary sample sizes and overall contact rates: to determine the appropriate sample size, decisions have to be made concerning the:
    variability of the population characteristic under investigation
    level of confidence desired in the estimates
    precision required
6. Create an operating plan for selecting sampling units
7. Execute the operational plan

MARKETING RESEARCH IN ACTION
DEVELOPING A NEW SAMPLING PLAN FOR A NEW MENU INITIATIVE SURVEY (PPT slide 6-27 and 6-28)
The Marketing Research in Action introduces the fact that the owners of the Santa Fe Grill realize that in order to remain competitive in the restaurant industry, new menu items need to be introduced periodically to provide variety for current customers and to attract new customers.
Owners of the Santa Fe Grill want to know whether the menu should be changed to include items beyond the traditional southwestern cuisine, how many new items should be included on the survey, and what type of sampling plan should be developed for selecting respondents, and who should those respondents be. Instructor Manual for Essentials of Marketing Research Joseph F. Hair, Mary Celsi, Robert P. Bush, David J. Ortinau 9780078028816, 9780078112119