
Chapter 8: Data Analysis and Statistical Methods: Univariate and Bivariate Analyses

TRUE/FALSE

1) Descriptive statistics are used to make inferences about a population.

Answer: False

Inferential statistics are used to make inferences about a population; descriptive statistics

describe a sample.

2) Both descriptive and inferential techniques must be chosen to match the scale level

(nominal, ordinal, or interval) inherent in the variable(s) being analyzed.

Answer: True

Data analysis techniques must match the scale level of all of the variables being analyzed.

3) Three measures of dispersion are often used in marketing research: the mean, the median,

and the mode.

Answer: False

These are measures of central tendency, not of dispersion.

4) The standard deviation is appropriate as a measure of dispersion for ordinal data.

Answer: False

The standard deviation is only appropriate for data that is at least interval.

5) The mode is defined as the middle value when data are arranged in order of magnitude.

Answer: False

It is the median that is the middle value of an array of numbers arranged in order of

magnitude. The mode is the category or value that occurs the most often in the data.

6) A statistic that is resistant to large changes when the data changes slightly is said to be

bimodal.

Answer: False

A statistic that is resistant to large changes in response to small changes in the data set is

called robust.

7) The median is robust, but the mean is not.

Answer: True

The median is robust because small changes in the data, such as a single extreme value, do not change it much; the mean can shift substantially.
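
A minimal Python sketch of this point, using made-up values that are not from the text: one value in a small sample is replaced by an extreme outlier, and the resulting shifts in the mean and the median are compared.

```python
from statistics import mean, median

# Hypothetical sample; the values are illustrative only.
data = [10, 11, 12, 13, 14]
# The same sample with one value replaced by an extreme outlier.
data_with_outlier = [10, 11, 12, 13, 140]

# How far each statistic moves when the single value changes.
mean_shift = mean(data_with_outlier) - mean(data)
median_shift = median(data_with_outlier) - median(data)

print(f"mean:   {mean(data)} -> {mean(data_with_outlier)}")
print(f"median: {median(data)} -> {median(data_with_outlier)}")
```

In this sketch the median stays at the middle value while the mean is pulled toward the outlier.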

8) Sometimes variables may be bimodal, which means that two categories have similar, and

relatively high, frequencies.

Answer: True

A bimodal variable is one with two modes, or two categories with relatively high frequencies.

9) Normal distributions are bimodal.

Answer: False

Normal distributions are unimodal.

10) Relative and absolute frequencies are appropriate measures of dispersion for nominal

data.

Answer: True

Absolute frequencies are the number of items in a sample in each category of a nominal

variable; relative frequencies are the proportions.
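
As an illustration, absolute and relative frequencies for a nominal variable could be tabulated as follows. The brand responses are hypothetical and only serve the sketch.

```python
from collections import Counter

# Hypothetical nominal variable: preferred brand for eight respondents.
responses = ["A", "B", "A", "C", "A", "B", "C", "A"]

# Absolute frequencies: number of items in each category.
absolute = Counter(responses)

# Relative frequencies: each count as a proportion of the sample size.
n = len(responses)
relative = {category: count / n for category, count in absolute.items()}

for category in sorted(absolute):
    print(category, absolute[category], relative[category])
```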

11) In hypothesis testing, researchers never “accept” a null hypothesis, merely “fail to reject.”

Answer: True

In hypothesis testing, a null hypothesis cannot be accepted. All the test does is provide

statistical evidence such that the null hypothesis cannot be rejected.

12) A “two-tailed test” is used to test the alternative hypothesis that a sample statistic is less

than a particular value.

Answer: False

This is a one-tailed test since you are testing in only one direction.

13) A Type II error has occurred if the null hypothesis is true and it is rejected.

Answer: False

Type II error occurs when the null hypothesis is false and is not rejected.

14) The usual starting point in hypothesis testing is to specify the level of Type II error the

researcher is willing to tolerate.

Answer: False

The level of Type I error is specified by the researcher.

15) Because hypotheses are always tested with data from a sample, there will always be some

sampling error.

Answer: True

Sampling error will always occur in the process of sampling a population.

16) In hypothesis testing, it is very difficult to make both α and β small at the same time.

Answer: True

The smaller α is set, the larger the probability of a Type II error (β).

17) The z-test is an appropriate inferential statistical method for ordinal data if the sample

size is large and the population standard deviation is not known.

Answer: False

The condition is true, but the data must be interval, not ordinal.

18) When the sample size is small ( 30).

19) For interval data, the chi-square test is the most appropriate inferential statistical test.

Answer: False

For interval data, it is the z-test or the t-test. The chi-square test is for nominal data.

20) The chi-square test can be used to test a hypothesized population distribution across

nominal categories.

Answer: True

The chi-square test is a procedure for comparing a hypothesized population distribution

across nominal categories against an observed distribution.

21) The linear correlation coefficient is scale-dependent, because the scale of the two

variables will impact the correlation coefficient. Changing one or both scales will change the

coefficient.

Answer: True

The linear correlation coefficient is sample- and scale-dependent; the sample size and the

scale of the variables will affect the coefficient.

22) The goal of regression analysis is to predict a value for the independent variable based on

knowledge of the dependent variable.

Answer: False

It is just the reverse, predicting a dependent variable based on knowledge of an independent

variable.

23) When the total variation of a regression equation has been decomposed into that

attributable to the regression, the remainder is attributable to error.

Answer: True

Total variation equals variation due to the regression plus unexplained variation, or error.

24) If the coefficient of determination (r²) for a multiple regression is large, then the

regression is highly significant.

Answer: False

If additional independent variables are added into a regression, r² will always get larger, even

if the added variables are meaningless. For this reason, r²-adjusted is used instead.

25) The F-test measures the proportion of variation in the dependent variable explained by

the entire model, including all the independent variables taken together.

Answer: True

The F-test indicates whether the “entire model” (that is, all independent variables taken

together) explains a significant proportion of the variation in the dependent variable.

MULTIPLE CHOICE

1) Inferential statistics use probability theory to

a. make statements about the population

b. make statements about a sample

c. describe a sample

d. describe the overall distribution of samples

Answer: A

Inferential statistics applies probability theory to make inferences about a population from a

sample.

2) A first step in almost all marketing research projects is to get a feel for the data by

performing a

a. multivariate analysis on most of the available variables

b. bivariate analysis on the independent and dependent variable

c. linear regression on most of the available variables

d. univariate analysis on most of the available variables

Answer: D

To get a good feel for the data, it is important to conduct a univariate analysis on the available

variables within the data.

3) The measures of central tendency include all of the following except

a. mean

b. standard deviation

c. mode

d. median

Answer: B

Standard deviation is a measure of dispersion.

4) The _______________ is an appropriate measure of central tendency for interval data and

is by far the most widely used measure in all of statistics.

a. mean

b. standard deviation

c. mode

d. median

Answer: A

The mean is the most widely used measure and is an appropriate measure of central tendency

for interval data.

5) The median is not applicable for

a. nominal variables

b. ordinal variables

c. interval variables

d. a and c

Answer: A

Because nominal variables are just classification or categorical data, a median is meaningless.

The median can be used for ordinal data or higher (interval and ratio).

6) The _______________ is an especially important and useful measure when data contain

outliers.

a. mean

b. standard deviation

c. mode

d. median

Answer: D

The median is important when data contains outliers, which are values well outside of the

range of most of the data.

7) The _______________ is a measure of central tendency appropriate for nominal data.

a. chi-square test

b. variance

c. mode

d. median

Answer: C

The mode is the appropriate measure of central tendency for nominal data because the mode

is the category that occurs the most often. The chi-square test is an inferential statistical test

for nominal data.

8) The appropriate measure of dispersion for interval data is

a. the standard deviation

b. absolute frequencies

c. relative frequencies

d. both b and c

Answer: A

The standard deviation is the appropriate measure of dispersion for data that is at least

interval.

9) The appropriate measure of dispersion for nominal data is

a. the standard deviation

b. absolute frequencies

c. relative frequencies

d. both b and c

Answer: D

Because nominal data is categorical data, the standard deviation is meaningless. Absolute and

relative frequencies are the appropriate measures of dispersion.

10) In hypothesis testing, the null hypothesis is stated as a population parameter being

a. not equal to a particular value

b. equal to a particular value

c. being greater than a particular value

d. being less than a particular value

e. either a, c, or d

Answer: B

The null hypothesis, written as H0, always states that a population parameter is equal to a

certain value.

11) The alternative hypothesis should be stated as a parameter being

a. not equal to the value in H0

b. equal to the value in H0

c. greater than the value in H0

d. less than the value in H0

e. either a, c, or d

Answer: E

The alternative hypothesis can be stated in any of the three ways: not equal to, less than, or

greater than the value in the null hypothesis.

12) Type I error occurs when

a. H0 is true and is rejected

b. H0 is true and is not rejected

c. H0 is false and is rejected

d. H0 is false and is not rejected

Answer: A

Type I error occurs when the null hypothesis is true, but researchers incorrectly conclude that

they should reject it.

13) In hypothesis testing, the Greek letter beta (β) refers to

a. the null hypothesis

b. the alternative hypothesis

c. Type I error

d. Type II error

Answer: D

For Type I error the Greek letter alpha (α) is used; for Type II error the Greek letter beta (β) is used.

14) A researcher will set the maximum tolerable degree of Type I error, which is referred to as

the

a. confidence level of the test

b. significance level

c. power of the test

d. standard deviation

Answer: B

The significance level is designated by α and is the maximum level for Type I error that will

be acceptable.

15) If the significance level for a hypothesis test is set at 0.03, the confidence level of the test

would be

a. 0.03

b. 0.09

c. 0.97

d. determined by the researcher

Answer: C

The confidence level of the test is 1 - the significance level (α).

16) The smaller a researcher sets the significance level, the

a. larger the probability of Type I error

b. larger the probability of Type II error

c. smaller the probability of Type II error

d. smaller the confidence level

Answer: B

As the significance level is decreased, the probability of making a Type II error increases.
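
The trade-off can be sketched numerically. The example below assumes an upper one-tailed z-test with a known population standard deviation; the function name `type_ii_error` and all parameter values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, mu0, mu_true, sigma, n):
    """Beta for an upper one-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha)        # rejection cutoff in z units
    shift = (mu_true - mu0) / (sigma / sqrt(n))   # true mean offset in standard errors
    return std_normal.cdf(z_crit - shift)         # P(fail to reject H0 | H0 is false)


# Hypothetical setting: H0 says mu = 10, the true mean is 11, sigma = 3, n = 36.
for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu0=10, mu_true=11, sigma=3, n=36)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

With everything else held fixed, each smaller α in the loop produces a larger β and therefore a lower power (1 – β).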

17) The power of a hypothesis test is described as the probability of

a. rejecting a true null hypothesis

b. rejecting a false null hypothesis

c. not rejecting a true null hypothesis

d. not rejecting a false null hypothesis

Answer: B

If the null hypothesis is false, and it is rejected, then the correct decision is made. The

probability of making this correct decision is called the power of the test.

18) The z-test is appropriate for interval data when the sample

a. is of any size and the population standard deviation is known

b. is small and the population standard deviation is known

c. is of any size and the sample standard deviation is known

d. is large and the sample standard deviation is known

Answer: A

If the population standard deviation is known, then it does not matter what the sample size is.

19) Once a sample statistic is obtained and compared to the value specified by the null

hypothesis, the null hypothesis can be rejected if the difference

a. is greater than zero

b. is greater than that due to sampling error

c. between the standard deviations is above 1.96

d. is greater than what is stated in the alternative hypothesis

Answer: B

To reject the null hypothesis, the difference between the sample statistic and the value stated

in the null hypothesis has to be greater than the size of the sampling error.

NOTE: Questions #20 through 24 relate back to this situation:

A product manager is concerned with whether or not her product’s share is equal to 25

percent of the market. A sample of 35 data points yields a sample proportion equal to 38

percent.

20) What is the proper hypothesis specification, given the situation outlined above?

a. H0: π = 0.38; H1: π ≠ 0.38

b. H0: π ≥ 0.38; H1: π < 0.38

c. H0: π ≤ 0.25; H1: π > 0.25

d. H0: π = 0.25; H1: π ≠ 0.25

e. More information is needed to determine the hypotheses.

Answer: D

The null hypothesis (H0) is the statement we wish to test, in this case that the product’s share

equals 25 percent. This is a two-tailed test.

21) What inferential statistical test is appropriate for the hypothesis in question #20?

a. chi-square

b. F-test

c. t-test

d. z-test

e. none of the above

Answer: D

Market share is ratio data, and the sample size is over 30, so the z-test is appropriate.

22) What would be the numerator of the inferential statistical test for the hypothesis in

question #20?

a. 0.13

b. -0.13

c. (0.13)²

d. 1 – (0.13)²

e. none of the above

Answer: A

The numerator of the z-test is x̄ – μ. In this case, 0.38 – 0.25 = 0.13.

23) Given the following abbreviated table of critical values,

for the hypothesis in question #20, what is the appropriate critical value if one wants a 95

percent confidence level?

a. 1.64

b. 1.96

c. 2.24

d. 2.33

e. 2.47

f. none of the above

Answer: B

The hypothesis requires a two-tailed test. For a desired confidence level (1– α) of 95 percent,

the critical value is 1.96.

24) Given the problem, hypothesis, and test procedure stated in questions #20 through 23, a

calculated test value of 1.78 is obtained. What conclusion is possible, assuming a 95 percent

confidence level?

a. H0 is not rejected.

b. H0 is rejected.

c. π in reality equals neither 0.25 nor 0.38.

d. No statement is possible; more information required.

Answer: A

The cutoff for the two-tailed test is 1.96, and the calculated value is 1.78. Therefore, the

null hypothesis cannot be rejected.
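
The calculation behind questions #20 through 24 can be sketched in Python. The sketch assumes the usual z-test for a proportion, with the standard error computed under H0 as sqrt(π(1 – π)/n); the text does not state this formula, so treat it as an assumption.

```python
from math import sqrt
from statistics import NormalDist

pi_0 = 0.25    # market share stated in H0
p_hat = 0.38   # observed sample proportion
n = 35         # sample size
alpha = 0.05   # significance level for a 95 percent confidence level

# Standard error under H0 (assumed form: sqrt(pi_0 * (1 - pi_0) / n)).
std_error = sqrt(pi_0 * (1 - pi_0) / n)

# Test statistic: the numerator is the 0.13 difference from question 22.
z = (p_hat - pi_0) / std_error

# Two-tailed critical value: alpha is split evenly across both tails.
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject_h0}")
```

Under these assumptions the computed z rounds to 1.78, the test value given in question 24, and it falls below the two-tailed cutoff of 1.96.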

25) In general, the degrees of freedom (df) is the size of the sample (n)

a. plus the number of parameters estimated in the sample

b. minus the number of parameters estimated in the sample

c. minus (x̄ – Xi)²

d. divided by the number of parameters estimated in the sample

Answer: B

For the testing of means, df = n – 1; the general form is that df is equal to the sample size

minus the number of parameters estimated in the sample.

26) The _______________ is a procedure for comparing a hypothesized population

distribution across nominal categories against an observed distribution.

a. t-test

b. z-test

c. chi-square test

d. linear regression

Answer: C

For nominal data, the chi-square test is used to determine whether the observed sample

distribution differs from the hypothesized population distribution.

27) In using a univariate chi-square test of distributions for data that has 5 categories the

degrees of freedom that would be used to determine the chi-square test statistic would be

a. 3

b. 4

c. 5

d. 6

Answer: B

For a univariate chi-square test of k categories, df = k – 1.
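
A short sketch of the univariate chi-square statistic for five categories follows. The observed counts and the equal-share hypothesis are hypothetical, and the statistic is assumed to take the usual form, the sum of (observed – expected)² / expected.

```python
# Hypothetical observed counts across 5 nominal categories (n = 100).
observed = [18, 22, 20, 25, 15]

# Hypothesized population distribution: equal shares in every category.
hypothesized_shares = [0.2] * 5

n = sum(observed)
expected = [share * n for share in hypothesized_shares]

# Chi-square statistic: squared gaps between observed and expected counts,
# each scaled by the expected count.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom for a univariate test with k categories: k - 1.
df = len(observed) - 1

print(f"chi-square = {chi_square:.2f}, df = {df}")
```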

28) Bivariate analysis would be needed to answer all of the following questions except:

a. Is our sample equally distributed among income categories?

b. What is the relationship between the heavy users of our brand and media viewing habits?

c. Are higher levels of sales associated with sales training attendance?

d. Do gas prices help predict sales of recreational boats?

Answer: A

Examining the income of the sample is a univariate analysis because only one variable is

examined.

29) The most important descriptive statistical procedure appropriate for use with two interval

variables is the

a. z-test

b. t-test

c. simple linear regression

d. chi-square test

Answer: C

Simple linear regression is the most important descriptive statistical procedure when

comparing two interval variables within a sample data set. The chi-square, z- and t-tests are

inferential statistical tests.

30) The linear correlation coefficient measures the

a. linear relationship between two variables, X and Y

b. causal relationship between two variables, X and Y

c. degree of curvature (from elliptical to a straight line) between two variables, X and Y

d. relationship between two nominal variables, X and Y

Answer: A

The linear correlation coefficient does not measure causation, but simply the linear

relationship between two variables.

31) The effects of sample size can be corrected for by taking the SSXY value for variables X

and Y and _______________ to calculate the “covariance.”

a. subtracting the mean of the variable

b. dividing it by the degrees of freedom of the sample

c. dividing it by the standard deviations of X and Y

d. using a chi-square test

Answer: B

SSXY is divided by the degrees of freedom to calculate the

covariance.

32) The effect of units is eliminated by dividing the covariance by the standard deviation of

the two variables to calculate a measure called the

a. scale independent relationship

b. linear correlation coefficient

c. coefficient of determination

d. least sum of squares

Answer: B

The linear correlation coefficient (rXY) is the covariance divided by the standard deviation of

the two variables.

33) In examining the relationship between two variables, if r = -1, then the relationship is

a. a perfect positive correlation

b. a perfect negative correlation

c. not a relationship at all

d. sorely in need of counseling

Answer: B

An r of -1 would indicate a perfect negative correlation.

34) The exact percentage of variation shared by two variables is calculated by squaring r,

which is called the

a. scale-independent relationship

b. linear correlation coefficient

c. coefficient of determination

d. least sum of squares

Answer: C

The coefficient of determination is r squared and indicates the percentage of variation shared

by two variables.
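
The chain from SSXY to the coefficient of determination (questions 31 through 34) can be traced in a few lines. The data are hypothetical, and the sketch assumes SSXY is the sum of cross-products of deviations from the means, with n – 1 degrees of freedom.

```python
from math import sqrt

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squares and cross-products of deviations from the means.
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)

df = n - 1                                      # assumed degrees of freedom
covariance = ss_xy / df                         # corrects for sample size
s_x, s_y = sqrt(ss_xx / df), sqrt(ss_yy / df)   # sample standard deviations
r = covariance / (s_x * s_y)                    # linear correlation coefficient
r_squared = r ** 2                              # coefficient of determination

print(f"SSXY = {ss_xy}, covariance = {covariance}, r = {r:.3f}, r² = {r_squared:.3f}")
```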

35) Simple regression is appropriate for determining how

a. multiple dependent variables are related to one independent variable

b. multiple independent variables are related to one dependent variable

c. one independent variable is related to one dependent variable

d. one low-coefficient variable is related to one high-coefficient variable

Answer: C

Regression involves understanding how one or more independent variables are related to a

dependent variable. Simple regression involves one independent variable and one dependent

variable.

36) The total variation, or sum-of-squares, of the dependent variable in a regression equation

can be partitioned into

a. variation explained by the independent variable and variation explained by the dependent

variable

b. variation explained by X and the variation explained by Y

c. variations explained by each of the independent variables

d. variation explained by regression and variation unexplained by regression

Answer: D

Total sum-of-squares, or variation, equals explained variation plus unexplained variation

(error).

37) In a regression equation, the slope indicates the

a. increase in the dependent variable if the independent variable changes by one unit

b. increase in the independent variable if the dependent variable changes by one unit

c. point the regression line crosses the Y-axis

d. point the regression line crosses the X-axis

Answer: A

By definition, the slope indicates what happens to the dependent variable when the

independent variable changes by one unit.
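
For a concrete view of the slope and of the partition of variation in questions 36 and 37, here is a least-squares sketch on hypothetical data. It assumes the usual simple-regression formulas: slope = SSXY / SSXX and intercept = ȳ – slope · x̄.

```python
# Hypothetical paired observations (X independent, Y dependent).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
ss_xx = sum((xi - x_bar) ** 2 for xi in x)

# Slope: change in predicted Y when X changes by one unit.
slope = ss_xy / ss_xx
# Intercept: where the regression line crosses the Y-axis.
intercept = y_bar - slope * x_bar

predicted = [intercept + slope * xi for xi in x]

# Partition of the total variation in Y.
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_regression = sum((pi - y_bar) ** 2 for pi in predicted)       # explained
ss_error = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))   # unexplained

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"total = {ss_total:.2f}, explained = {ss_regression:.2f}, error = {ss_error:.2f}")
```

The explained and unexplained parts printed on the last line add up to the total variation.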

38) If additional independent (X) variables are added into the regression equation, r² will

always get larger. To correct for this situation, statisticians have developed the

_______________, which is not so easily influenced by additional variables.

a. least sum-of-squares

b. r²-adjusted

c. standard error

d. coefficient of determination

Answer: B

r²-adjusted was developed to prevent r² from automatically increasing by just adding

additional X variables.
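
The text does not give the formula for r²-adjusted. The sketch below assumes the commonly used form, 1 – (1 – r²)(n – 1)/(n – k – 1), where n is the sample size and k the number of independent variables; the function name and the r² values are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Commonly used adjusted r-squared for n observations and k independent variables."""
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical fits on 30 observations: a third X variable raises r-squared
# only slightly, from 0.500 to 0.505.
two_variables = adjusted_r_squared(0.500, n=30, k=2)
three_variables = adjusted_r_squared(0.505, n=30, k=3)

print(f"k=2: {two_variables:.3f}")
print(f"k=3: {three_variables:.3f}")
```

In this example r² rises with the extra variable while the adjusted value falls, which is the behavior the adjustment is meant to provide.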

39) To determine if there is a relationship between two nominal variables, researchers would

use

a. simple regression

b. cross-tabulation and chi-square test on the two variables being independent

c. cross-tabulation and relative frequencies of independent variables

d. bivariate regression

Answer: B

For two nominal variables, a cross-tabulation would be used and then a chi-square test on the

null hypothesis that the two variables are independent of each other.

40) To use the chi-square test with the cross-tabulation of nominal data, expected cell sizes

should be _______________ or greater.

a. 5

b. 10

c. 18

d. 30

Answer: A

Cell sizes should be 5 or greater. If not, then categories need to be combined.
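
The expected cell sizes can be checked before running the test, as sketched below. The 2 × 3 table is hypothetical. The expected count under independence is assumed to be (row total × column total) / n, and the degrees of freedom are assumed to be (rows – 1)(columns – 1); neither formula is stated in the text.

```python
# Hypothetical 2x3 cross-tabulation of observed counts.
observed = [
    [15, 20, 5],
    [15, 30, 15],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in each cell if the two variables are independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Rule of thumb from the text: every expected cell size should be 5 or greater.
all_cells_ok = all(e >= 5 for row in expected for e in row)

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"expected = {expected}")
print(f"all expected cells >= 5: {all_cells_ok}")
print(f"chi-square = {chi_square:.3f}, df = {df}")
```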

SHORT ANSWER

1) Describe the difference between descriptive and inferential statistics.

Answer: Descriptive statistics provides summary measures of the data in samples. Inferential

statistics uses probability theory to make statements about the population based on the results

stemming from samples.

2) Based on the following array of data, calculate the mean, median, mode, and the absolute

frequency of the value “10.”

Data: 14, 23, 10, 12, 11, 5, 7, 16, 10

Answer:

Mean = 12

Median = 11

Mode = 10

Absolute frequency of 10 = 2
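
These values can be checked with Python's standard library. The sketch uses the data array from the question and nothing else.

```python
from collections import Counter
from statistics import mean, median, mode

data = [14, 23, 10, 12, 11, 5, 7, 16, 10]

data_mean = mean(data)               # sum of the values divided by their count
data_median = median(data)           # middle value of the sorted array
data_mode = mode(data)               # most frequently occurring value
frequency_of_10 = Counter(data)[10]  # absolute frequency of the value 10

print(data_mean, data_median, data_mode, frequency_of_10)
```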

3) What is the power of the test and how does it relate to error?

Answer: The power of the test is the probability of rejecting a false null hypothesis. It is equal

to 1 – β, where β denotes the probability of Type II error (failing to reject a false null

hypothesis).

4) Given the null hypothesis H0: µ = 10, what alternative hypothesis would require a two-tailed test?

Answer: µ ≠ 10

5) If the relationship between two variables has a linear correlation coefficient (r) = 0.30,

what is the coefficient of determination?

Answer: r², the coefficient of determination, is 0.09.

Calculation: 0.30 × 0.30 = 0.09

ESSAY

1) Identify the three types of measures of central tendency. Discuss when each is appropriate

to use.

Answer: The three measures of central tendency are the mode, median, and mean. For

nominal data the mode is the appropriate measure; for ordinal data the median is the

appropriate measure; for interval data, the mean is the appropriate measure.

2) Identify and describe the types of error, their associated terminology and symbology, and

how they are generally encountered in hypothesis testing.

Answer: Type I error (α) is generally specified at the outset as a desired significance level

(also = α) or confidence level (= 1 – α). It signifies the probability of rejecting a true null

hypothesis that is considered acceptable. The smaller α is set, the larger the probability of a

Type II error (β), which is the probability of failing to reject a false null hypothesis. The

probability of rejecting a false null hypothesis (1 – β) is called the power of the test.

3) What are a z-test and a t-test, and when should each be used?

Answer: The z-test and t-test are both inferential statistical tests for interval data. The z-test

can be used when the sample size is large (30 or more) and for any sample size when the

population standard deviation (σ) is known. The t-test must be used for small sample sizes (<

30) if the population standard deviation is not known.

4) Define, and describe the relationship between, SSXY, covariance, the linear correlation

coefficient, the coefficient of determination, and r²-adjusted.

Answer: SSXY is a distance measure between the X and Y coordinates.

To eliminate the effect of sample size, SSXY is divided by the

degrees of freedom to calculate the covariance. To make the measure scale-independent, the

covariance is divided by the standard deviation of the two variables to calculate the linear

correlation coefficient (r). The coefficient of determination (r²), the square of the linear

correlation coefficient, is the percentage of variation shared by two variables. r²-adjusted

prevents the r² value from increasing when statistically meaningless variables are added to

the regression.

5) Outline the steps involved in hypothesis testing.

Answer: Hypothesis testing involves the following 6 steps:

1) Formulate a null and an alternative hypothesis.

2) Select the appropriate statistical test given the type of data being analyzed.

3) Specify the significance level.

4) Look up the critical value of the test statistic for the given significance level.

5) Perform the statistical test chosen in Step 2.

6) Compare the value of the statistic calculated in Step 5 with the value obtained in Step 4. If

the computed test statistic is greater than the tabulated value, then the null hypothesis should

be rejected.

Test Bank for Modern Marketing Research: Concepts, Methods, and Cases

Fred M. Feinberg, Thomas Kinnear, James R. Taylor

9781133188964, 9781133191025, 9780759391710