
Chapter 9: Multiple Regression: Modeling Multivariate Relationships

TRUE/FALSE

1) Regression is the dominant method of data analysis throughout the natural and social

sciences.

Answer: True

Regression is the most commonly used method of data analysis.

2) All regression models, including simple linear, binary, ordinal, multinomial logit, rank-ordered, and count, can be viewed as special cases of the general formulation called the

General Linear Model.

Answer: True

One can treat all types of regression as if they were really one grand method, the General

Linear Model. Some advanced statistical programs require that all regression models be

viewed as special cases of this general formulation.

3) Although regression can verify a relationship between variables, it cannot quantify the

nature of that relationship.

Answer: False

One of the primary advantages of regression is that it can verify and quantify relationships.

4) Of the two main methods underlying nearly all of mathematical reasoning, statistics

(particularly regression) is used when dealing with quantities that are certain.

Answer: False

Calculus is used when dealing with certain quantities, and statistics (particularly regression)

when dealing with uncertain ones.

5) One limitation to regression is that, due to latent variables, it is hard to know what variable

should predict what.

Answer: True

A latent variable gives rise to two (or more) others that lack an otherwise causal relationship.

This leads to mistakenly basing predictions on the wrong variable.

6) One of the limitations of regression is that it can be used only for linear relationships.

Answer: False

Regression can be used to estimate nonlinear relationships as well.

7) Regression, as a general method, is so versatile, through transformations and special cases,

that it can never be overused.

Answer: False

Regression can be overused, so researchers should use direct, visual “reality checks” that

allow gross errors to be detected.

8) Regression presumes the following theoretical model for the population:

Y = b_0 + b_1 X_1 + b_2 X_2 + ... + b_k X_k + e

where Y is the dependent variable, the Xs are the independent variables, the bs are the coefficients of the independent variables, and e is the error.

Answer: True

This is the general theoretical model for regression.

9) In regression analysis, the F-test tells you if all of the variables taken together help explain

the variation in the dependent variable.

Answer: True

The F-test measures all of the variables together, while the t-test measures each variable

independently.
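
As a rough illustration of how the overall F-test relates fit to error, the sketch below computes the F statistic from r², the sample size n, and the number of predictors k, using the standard formula F = (r²/k) / ((1 - r²)/(n - k - 1)). The function name and the input numbers are made up for illustration and do not come from the text.

```python
def f_statistic(r_squared: float, n: int, k: int) -> float:
    """Overall F statistic for a linear regression with an intercept.

    r_squared: share of variation explained by the model
    n: number of observations
    k: number of independent variables
    """
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    if k < 1 or n - k - 1 <= 0:
        raise ValueError("need k >= 1 and n > k + 1")
    explained_per_predictor = r_squared / k
    unexplained_per_df = (1.0 - r_squared) / (n - k - 1)
    return explained_per_predictor / unexplained_per_df


# Hypothetical model: r² = 0.5 with 2 predictors and 103 observations.
example_f = f_statistic(r_squared=0.5, n=103, k=2)
print(round(example_f, 2))
```

A large F (judged against the F distribution with k and n - k - 1 degrees of freedom) suggests the predictors, taken together, explain more variation than would be expected from error alone.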

10) The t-test is the first thing that should be checked in a regression output. If it is not

significant, then the entire model is not providing sufficient explanatory power.

Answer: False

It is the F-test that should be checked, not the t-test of each independent variable.

11) Standardized residuals are useful in seeing whether there are any strong outliers.

Answer: True

Data points with a standardized residual greater than 3 in magnitude may be outliers and should be examined.
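
The rule of thumb above can be sketched in a few lines. This is a simplified illustration: it standardizes each residual by the overall mean and standard deviation of the residuals, whereas statistical packages may also adjust for leverage. The function name and the sample residuals are hypothetical.

```python
from statistics import mean, pstdev


def flag_outliers(residuals, threshold=3.0):
    """Return indices of residuals whose standardized value exceeds
    the threshold in magnitude (simple z-score standardization)."""
    center = mean(residuals)
    spread = pstdev(residuals)
    if spread == 0:
        return []  # all residuals identical, nothing stands out
    return [
        i for i, r in enumerate(residuals)
        if abs((r - center) / spread) > threshold
    ]


# Made-up residuals: twenty small values and one large one at index 20.
sample_residuals = [0.5, -0.5] * 10 + [10.0]
print(flag_outliers(sample_residuals))
```

Flagged points are candidates for closer examination, not automatic deletions.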

12) A histogram will show if the data points are normally distributed.

Answer: True

A histogram is a frequency distribution, which will show whether or not the data are normally

distributed.

13) Autocorrelation occurs when the error does not have a constant variance.

Answer: False

Autocorrelation is when the error displays obvious patterns. The variance varying for

different combinations of the independent variables is heteroscedasticity.

14) The problem with heteroscedasticity is that researchers tend to be underconfident, i.e.

they must make statements as if the data is worse than it actually is.

Answer: False

The problem is just the reverse; researchers tend to state more certainty of the results than

they should when heteroscedasticity is present.

15) One way to correct for heteroscedasticity is to transform either the independent variables

or the dependent variable using logarithms.

Answer: True

Transformation by logarithms will sometimes correct the problem.

16) The Breusch-Pagan test detects whether data is normally distributed.

Answer: False

The Breusch-Pagan test detects whether heteroscedasticity is strongly present.

17) Substantial autocorrelation within data can be corrected for with myriad techniques, such

as transformations and dummy variables. It should therefore not call into question the

regression model itself.

Answer: False

Autocorrelation is a very serious problem that can indicate that the model is simply wrong.

18) Each independent coefficient (b) in a regression model will show the impact on the

dependent variable of a one-unit change in the corresponding independent variable, if all of

the other variables are held constant.

Answer: True

If a particular independent variable is changed by one unit, and all others remain the same,

regression will show how much of an impact that one unit has on the dependent variable. The

amount of that impact is signified by the value of that variable’s coefficient (b).

19) Nonlinear relationships can be examined with ordinary linear regression through

transformation functions such as logs, exponents, squares, square roots, and polynomials.

Answer: True

The purpose of these transformations is to allow for linear regression to account for more

complex nonlinear relationships.
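
To make the idea concrete, here is a minimal sketch of fitting a nonlinear relationship with ordinary linear regression after a log transformation. It assumes, purely for illustration, data that follow Y = a·exp(bX); taking logs gives ln Y = ln a + bX, which is linear in X. The functional form, the helper `fit_line`, and the data are assumptions, not taken from the text.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up, noiseless data generated from Y = 2 * exp(0.5 * X).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * math.exp(0.5 * x) for x in xs]

# Regress ln(Y) on X, then map the intercept back to the original scale.
intercept, slope = fit_line(xs, [math.log(y) for y in ys])
a_hat = math.exp(intercept)
b_hat = slope
print(round(a_hat, 3), round(b_hat, 3))
```

With real, noisy data the recovered values would only approximate the underlying parameters, and the residuals should still be checked on the transformed scale.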

20) [equation not shown] is the form of a geometric relationship.

Answer: False

[equation not shown] is the form of an exponential relationship. The form of a geometric relationship is [equation not shown].

21) Polynomial regression is a robust method that should be among the first transformations

to try on nonlinear data.

Answer: False

Polynomial regressions, using various powers of X, must be applied very carefully, because

powers of X are highly multicollinear. It is impossible to change X by one unit without also

changing X², X³, and so forth.

22) If the dependent variable is nominal data, then you would use Poisson regression for the

regression analysis.

Answer: False

For nominal dependent data, multinomial regression would be used. Poisson regression is

used for count (non-continuous) data.

23) The value of r² will always increase when adding additional variables to a multiple linear

regression equation.

Answer: True

Because each new variable will explain at least some chance variation in the sample, the r² will always increase.

24) For multiple linear regression, researchers want to examine the r² instead of the adjusted r² because the r² is not as easily fooled by additional independent variables.

Answer: False

It is the opposite; researchers want to examine the adjusted r² because it is not as easily

fooled by additional independent variables.
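
A small sketch of why the adjusted value is harder to fool, using the standard formula adjusted r² = 1 - (1 - r²)(n - 1)/(n - k - 1), where n is the sample size and k the number of independent variables. The function name and the numbers are hypothetical.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Adjusted r² for a linear regression with an intercept."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1.0 - (1.0 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical comparison on 21 observations: adding three weak
# predictors nudges r² up from 0.50 to 0.51.
small_model = adjusted_r_squared(r_squared=0.50, n=21, k=2)
large_model = adjusted_r_squared(r_squared=0.51, n=21, k=5)
print(round(small_model, 3), round(large_model, 3))
```

In this made-up case the raw r² rises while the adjusted value falls, because the penalty for the extra predictors outweighs the small gain in fit.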

25) In binary regression, it would be inappropriate to put a line through the observations in a

data plot, because the values of the independent variables can only be 0 or 1.

Answer: False

For binary regression, the dependent variable has only two values; one therefore cannot draw

a straight line through the data points.

MULTIPLE CHOICE

1) All of the following statements about regression are true except:

a. Regression is a way to put a line through a group of data points.

b. Regression is a method for testing the validity of relationships.

c. Regression is a method of verifying causal relationships.

d. Regression is a flexible methodology for measuring how things influence one another.

Answer: C

Regression can show relationships and how variables influence each other, but it cannot fully

verify causal relationships.

2) All of the following are limitations of regression except

a. regression assumptions are usually violated

b. almost nothing is really linear

c. latent variables may be present

d. insufficient predictor variables

Answer: D

Another limitation of regression is that there are often too many highly correlated predictor

variables.

3) Predicting one dependent variable based on many independent variables is called

a. simple linear regression

b. bivariate regression

c. multiple regression

d. multinomial regression

Answer: C

Multiple regression means there are multiple independent variables in the regression.

4) For linear regression, the assumption is made that the error term is

a. normally distributed with a mean of 1

b. normally distributed with a mean of 0

c. the sum of the standard deviations of all of the β values

d. the sum-of-squares explained by the regression model

Answer: B

For regression analysis, researchers assume that the error term is normally distributed around

a mean of 0.

5) Due to assumptions in linear regression about the distribution and mean of the error, the

only population parameter that will need to be estimated to understand the extent of error is

the

a. autocorrelation of the error

b. standard deviation of the error

c. coefficient of determination

d. coefficient of variation

Answer: B

Because the error is assumed to be normally distributed with a mean of zero, the only error

term needed in the regression equation is the error standard deviation.

6) To determine if a regression model is useful, the

a. fit (or prediction) should be bigger, on average, than “error”

b. coefficients (the β’s) should be clearly different from zero

c. p value should be greater than the F-test value

d. all of the above

e. both a and b

Answer: E

If the fit is greater than the error and the coefficients clearly different from zero, the

regression model will be useful.

7) To determine if the fit of a regression equation is larger than the error, the

_______________ is used.

a. F-test

b. t-test

c. chi-square

d. significance level

Answer: A

For overall model fit, the F-test statistic is used.

8) To determine if the regression coefficients (β’s) in a regression model are significantly

different from zero, the _______________ is used.

a. F-test

b. t-test

c. chi-square

d. significance level

Answer: B

The t-test is used to determine if individual independent variables are significant predictors.

9) Violating the underlying assumptions of regression can lead to all of the following

problems except

a. non-normality

b. standardized residuals

c. heteroscedasticity

d. autocorrelation

Answer: B

Regression can have a number of problems, such as non-normality, heteroscedasticity, and

autocorrelation.

10) Standardized residuals are used to determine if there are any potential outliers in the data.

Data points that have residuals higher than _______________ in magnitude

should be examined, unless the data set is very large.

a. 1

b. 3

c. 5

d. 10

Answer: B

Standardized residuals are useful in seeing whether there are any strong outliers; it is

reasonable to examine any data point with a standardized residual greater than 3 in

magnitude, unless the data set is very large.

11) _______________ occurs when the error is not normally distributed.

a. Non-normality

b. Heteroscedasticity

c. Autocorrelation

d. Multicollinearity

Answer: A

Non-normality refers to the error term not being normally distributed.

12) _______________ occurs when the error does not have a constant variance.

a. Non-normality

b. Heteroscedasticity

c. Autocorrelation

d. Multicollinearity

Answer: B

Heteroscedasticity refers to the error not having a constant variance.

13) _______________ occurs when a number of potential predictors in a regression model

are highly correlated.

a. Non-normality

b. Multicollinearity

c. Heteroscedasticity

d. Autocorrelation

Answer: B

Multicollinearity refers to a situation where a number of predictor variables are highly

correlated.

14) The best way to see if heteroscedasticity is present is to

a. plot the residuals against the dependent variable and examine them

b. plot the residuals against each of the independent variables and examine them

c. use the Kolmogorov-Smirnov or Shapiro-Wilk test supplied in most computer statistical

programs

d. apply a logarithmic transformation

Answer: B

By plotting the residuals against each independent variable and examining the plot,

researchers can determine if heteroscedasticity is a problem.

15) If the error for one data point can help predict the value of the error in nearby data points,

then the problem is

a. non-normality

b. heteroscedasticity

c. autocorrelation

d. multicollinearity

Answer: C

Autocorrelation occurs when there are patterns in the errors or in the location of data

when plotted on a graph.

16) The test for autocorrelation is the

a. Durbin-Watson

b. Kolmogorov-Smirnov

c. Shapiro-Wilk

d. Anderson-Darling

Answer: A

The Durbin-Watson test examines adjacent residuals (errors) to see if there is a pattern.
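
For illustration, the Durbin-Watson statistic can be computed directly from residuals listed in their natural (for example, time) order: it is the sum of squared differences between adjacent residuals divided by the sum of squared residuals. The function name and sample residuals below are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals in observation order."""
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    squared_steps = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    squared_total = sum(e ** 2 for e in residuals)
    if squared_total == 0:
        raise ValueError("all residuals are zero")
    return squared_steps / squared_total


# Made-up residuals that flip sign every period (negative autocorrelation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
# Made-up residuals that never change (extreme positive autocorrelation).
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
```

The statistic ranges from 0 to 4. Values near 2 suggest little first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation.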

17) Identifying nonlinear relationships between variables is accomplished by transforming

one or more variables and then estimating the fit with

a. linear regression

b. curvilinear regression

c. multicollinearity analysis

d. a close examination of the residuals plot

Answer: A

After the data has been transformed, linear regression can be used to examine fit, even though

the relationship is not linear.

18) Which list of common variables below would be examples of continuous data?

a. price, time, volume, length, sales

b. gender, coupon used, college degree, promotion (yes/no)

c. education level, age group, survey scale response (e.g., 1 to 7)

d. ethnic group, product category, SKU, payment method

Answer: A

Price, time, volume, sales, market share are all examples of continuous variables.

19) Which list of common variables below would be examples of ordinal data?

a. price, time, volume, length, sales

b. gender, coupon used, college degree, promotion (yes/no)

c. education level, age group, survey scale response (e.g., 1 to 7)

d. ethnic group, product category, SKU, payment method

Answer: C

These are ordinal data because they contain a specific order, but equal distance between

groups cannot be assumed.

20) If the dependent variable in a regression model is continuous interval data, the analysis

should employ

a. multiple linear regression

b. binary regression

c. Poisson or count regression

d. multinomial regression

e. ordinal regression

Answer: A

For continuous interval dependent variables, multiple linear regression would be used.

21) In multiple regression, the intercept is the value of Y when

a. all of the independent variables are 0

b. the dependent variable is 0

c. all of the significant independent variables are set to 1

d. all of the coefficients are set to 1

Answer: A

The intercept is the point where the regression line intercepts the Y-axis, that is, where all X variables are 0.

22) In binary regression, probability data for predicting binary variables is transformed using

a. an exponential transformation

b. a geometric transformation

c. the logarithm of the odds

d. dummy variables

Answer: C

By using the “log-odds” transformation, it is possible to use logistic regression for binary

dependent variables.
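
The log-odds transformation and its inverse can be sketched as follows. The function names are illustrative only.

```python
import math


def log_odds(p: float) -> float:
    """Map a probability in (0, 1) to the log of its odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_log_odds(z: float) -> float:
    """Map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# A probability of 0.8 corresponds to odds of 4 to 1.
z = log_odds(0.8)
print(round(z, 3), round(inverse_log_odds(z), 3))
```

Probabilities are confined to the interval from 0 to 1, while log-odds can take any real value, which is what allows a linear combination of predictors to be fitted on the transformed scale.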

23) The error term in a binary regression using a logit model is assumed to be a

a. normal distribution

b. Gumbel distribution

c. bivariate distribution

d. Poisson distribution

Answer: B

The error in a binary regression analysis is assumed to follow a Gumbel distribution. To assume a

normal distribution, a probit model would need to be used.

24) If the dependent variable is an ordinal scale, then the proper regression model to use is

the

a. linear regression model with geometric transformation of the dependent variable

b. ordered probit model

c. ordered logit model

d. any of the above

e. either b or c

Answer: E

Either the ordered probit model or the ordered logit model can be used.

25) If the dependent variable is a nominal scale, then the proper regression model to use is

a. multinomial regression

b. probit model

c. ordered logit model

d. linear regression after a chi square transformation

e. count regression

Answer: A

Multinomial regression is used when the dependent variable is nominal data.

26) The model used most often in marketing that includes item-specific information is

referred to as the

a. Gumbel Logit model (GLM)

b. Multinomial Logit model (MNL)

c. Guadagni & Little Logit model (GLLM)

d. Kaiser Probit model (KPM)

Answer: B

Within marketing research, this model is called the Multinomial Logit model.

27) _______________ takes both binary and interval data and produces probabilities of an

outcome, such as purchase probabilities and market share projections, as functions of the

marketing mix.

a. The weighted least squares model

b. Logistic regression

c. The Breusch-Pagan model

d. The discrete choice model

Answer: B

Logit models are very effective in producing probabilities of different outcomes.

28) When using rank-ordered data, the regression model that will be used is the

a. multinomial logit regression model

b. discrete choice model

c. exploded logit regression model

d. ordered probit model

Answer: C

For rank-ordered data, the correct approach is the rank-ordered or exploded logit regression

model.

29) If the dependent variable is count data, the regression model that will be used is the

a. multinomial logit regression model

b. discrete choice model

c. exploded logit regression model

d. Poisson regression model

Answer: D

For count data, the Poisson or count regression model is used.

30) In using Poisson regression, the researcher also must know

a. both the observed count and the number of opportunities

b. the relative percentages of each outcome

c. the distribution of the population

d. both the variance and the population mean

Answer: A

In addition to the count, the Poisson or count regression model also needs the number of

opportunities.

Use the regression output below to answer the following questions.

31) Which statistics should one look at to determine which of the independent variables are

most significant in the regression?

a. r² and adj r²

b. sum of squares and F-test

c. F-test and p

d. t-test and p

Answer: D

The t-test measures the significance of the effect of the individual independent variables, and

the p value (for that same variable) indicates how strong the evidence is in favor of the

coefficient’s not being zero in the population based on one’s sample data.

32) Which variable(s) is/are (a) significant, strong predictor(s) of Weight in the regression

above?

a. Age and Height

b. Gender only

c. Height only

d. Age, Height, Gender, MBA and Year

e. Height and Gender

Answer: E

Gender and Height are significant and strong predictors, because the t-tests are fairly large

and the p-values < 0.00005. Age is significant, but not strongly so, with p = 0.0186.

33) According to the regression output above, which of the following statements most

accurately summarizes what can be said, for the entire population, about the weight of

individuals with an MBA (MBA = 1) versus the weight of individuals without an MBA

(MBA = 0)?

a. They weigh on average 3.122 pounds less than individuals who do not have an MBA.

b. They weigh on average 3.122 pounds more than individuals who do not have an MBA.

c. Their weights are roughly the same.

d. The variable is not a significant predictor, so, even though its coefficient estimate is -3.122,

we cannot assume it has an impact on weight.

Answer: D

The p-value for MBA is 0.3, which is not significant.

34) According to the regression output above, if Age is increased by one year, what will the

impact be on Weight?

a. 0.660 lb. increase

b. 0.279 lb. increase

c. 2.363 lb. increase

d. cannot be determined

Answer: A

According to this model, Weight increases 0.66 pounds for every year of Age. This comes

from the b value (coefficient) for Age.

35) Based on the binary regression output above, which of the independent variable(s) are

significant predictors of Gender?

a. Age and Weight

b. Age and Height

c. Height and Weight

d. Age, Height and Weight

e. all of the variables are significant

f. none of the variables is significant

Answer: D

Age, Height, and Weight are all significant, because each has a high Wald statistic, as evidenced by a very small p-value (no more than .01) for each of these variables.

36) The regression output above indicates that a 1 inch increase in Height increases the

chance a respondent is male by

a. 39.1 percent

b. 8.2 percent

c. 22.669 percent

d. 47.8 percent

Answer: D

It is 47.8%, because the Exp(β) value is 47.8% greater than 1.

37) The regression output above indicates that an additional year in age increases the chance

a respondent is male by

a. 11 percent

b. 17.1 percent

c. 0.61 percent

d. 15.8 percent

Answer: B

It is 17.1%, because the Exp(b) value is 17.1% greater than 1.
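
The reading of Exp(b) used in the two questions above can be sketched as follows. The values 1.478 (Height) and 1.171 (Age) are inferred from the answer explanations ("47.8% greater than 1" and "17.1% greater than 1"), since the output table itself is not reproduced here. Strictly speaking, Exp(b) multiplies the odds, not the probability, of the outcome. The function name is hypothetical.

```python
import math


def percent_change_in_odds(exp_b: float) -> float:
    """Percent change in the odds for a one-unit increase in a predictor,
    given its Exp(b) value from logit output."""
    return (exp_b - 1.0) * 100.0


# Exp(b) values inferred from the answer explanations (assumption).
height_exp_b = 1.478
age_exp_b = 1.171

print(round(percent_change_in_odds(height_exp_b), 1))
print(round(percent_change_in_odds(age_exp_b), 1))

# The underlying logit coefficients are the natural logs of Exp(b).
height_b = math.log(height_exp_b)
age_b = math.log(age_exp_b)
print(round(height_b, 3), round(age_b, 3))
```

Reading the raw coefficient b as a percent change is a common mistake; it is the exponentiated value that gives the multiplicative effect on the odds.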

38) Based on the output above, the LOGIT model correctly predicted the female gender

a. 87 percent of the time

b. 91.3 percent of the time

c. 89.5 percent of the time

d. cannot be determined

Answer: A

It was correct 161 times, incorrect 24 times, resulting in 87% accuracy.
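
The arithmetic behind the 87 percent figure, as a short sketch (the function name is illustrative):

```python
def classification_accuracy(correct: int, incorrect: int) -> float:
    """Share of cases in one observed class that the model predicted correctly."""
    total = correct + incorrect
    if total == 0:
        raise ValueError("no cases to evaluate")
    return correct / total


# Counts taken from the answer explanation: 161 correct, 24 incorrect.
female_accuracy = classification_accuracy(correct=161, incorrect=24)
print(f"{female_accuracy:.1%}")  # prints 87.0%
```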

39) Which independent variable is, taken by itself, the best predictor of Gender?

a. Age

b. Height

c. Weight

d. cannot be determined

Answer: D

It cannot be determined, since we only have the results of a regression containing all these

variables at once. If we ask which seems the most significant in this particular regression, that

would be Weight, based on its large Wald statistic (these can be compared, since there is 1 df

for every predictor variable).

40) What does the Constant signify in this regression?

a. the probability of Gender = 1 when no other variables are changed

b. the average age of males in the sample

c. the average age of females in the sample

d. It does not signify anything useful here.

Answer: D

It would signify the log-odds of the probability that a person is male if the person has no

height, no weight, no MBA, and attended school in the year zero. It is therefore meaningless

in this context and should not be interpreted, even though it must be included in predictions

stemming from the regression.

SHORT ANSWER

Use the regression output below to answer the following questions.

1) Write the regression equation for the regression output above.

Answer: Weight = -210.603 + 0.660(Age) + 17.449(Gender) + 4.999(Height) - 3.122(MBA) - 0.111(Year) + ε

[The MBA and Year terms need to appear, even though they are not significant, because

leaving them out might change other coefficients. The regression should be run again without

them, but that is not part of the formal answer.]

2) Given that Gender was coded female = 0 and male = 1, on the average, how much more do

males weight than females, “correcting for” (i.e., including the effects of) all the other

variables in the regression?

Answer: The beta value for Gender is 17.449, indicating that on average men weigh 17.449

more pounds than women, when we systematically correct for differences in their ages,

heights, MBA, and year variables.

[We cannot say how much men and women differ in weight overall without more information

(but this is not what the question asks).]

3) For each additional inch of Height, how much additional Weight would be added?

Answer: 4.999 pounds, the beta value for Height.

[Note that this value is the same for both genders, people of all ages, etc., as assumptions of

this regression model.]

4) Based on the regression printout, what would be the predicted (mean) weight of a female,

20 years old, 68 inches tall, without an MBA, and in Year 1?

Answer: 142.4

calculation:

Weight = -210.603 + (0.660 × 20) + (17.449 × 0) + (4.999 × 68) - (3.122 × 0) - (0.111 × 1) = 142.418 ≈ 142.4
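
The same calculation can be checked with a short script. The coefficients are those in the regression equation from question 1; the dictionary and function names are illustrative.

```python
# Coefficients from the regression equation in question 1.
COEFFICIENTS = {
    "Intercept": -210.603,
    "Age": 0.660,
    "Gender": 17.449,   # female = 0, male = 1
    "Height": 4.999,
    "MBA": -3.122,      # no MBA = 0, MBA = 1
    "Year": -0.111,
}


def predict_weight(age, gender, height, mba, year):
    """Predicted mean weight in pounds from the fitted linear model."""
    return (
        COEFFICIENTS["Intercept"]
        + COEFFICIENTS["Age"] * age
        + COEFFICIENTS["Gender"] * gender
        + COEFFICIENTS["Height"] * height
        + COEFFICIENTS["MBA"] * mba
        + COEFFICIENTS["Year"] * year
    )


# Female, 20 years old, 68 inches tall, no MBA, Year 1.
predicted = predict_weight(age=20, gender=0, height=68, mba=0, year=1)
print(f"{predicted:.1f}")  # prints 142.4
```

Changing `gender` to 1 while holding everything else fixed raises the prediction by exactly the Gender coefficient, 17.449 pounds, which is the "holding other variables constant" interpretation used in question 2.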

ESSAY

1) Describe what regression is and what it is used for.

Answer: Regression is:

1) a way to put a line through a group of points

2) a method for testing the validity of relationships

3) a flexible methodology for measuring how things influence one another

4) a scientific approach to forecasting and prediction

2) Discuss the limitations of regression.

Answer: The limitations of regression are:

1) regression assumptions are usually violated in some way

2) sometimes, there are too many predictor variables and they are highly correlated

3) it is hard to know “what should predict what”

4) almost nothing is really linear

5) regression requires a certain type of data and not all data are like that

6) regression can be overused

3) Explain the problems of non-normality, heteroscedasticity, and autocorrelation and how

researchers can detect each.

Answer: 1) Non-normality occurs when the errors are not normally distributed. It is detected

by inspection of a histogram (frequency distribution) of residuals (errors).

2) Heteroscedasticity is when the residuals do not have a constant variance, creating a greater

spread of data points around some parts of the regression line than others. The best way to see

if heteroscedasticity is present is to closely inspect plots of the residuals against each of the

independent variables. The White and Breusch-Pagan tests are means of detecting it through

a computer program.

3) Autocorrelation occurs when there are meaningful patterns in the error. To detect this, one

checks whether knowing the exact value of the error for one data point predicts anything at

all about the error value at another point.

4) Describe how transformations can be used to apply linear regression to nonlinear

relationships.

Answer: Transformations are used on variables that are not linearly related to arrive at a

linear relationship between the dependent variable and the transformed variables. If Y is not

linearly related to X, researchers might try logarithmic, exponential, or geometric

relationships. For example:

5) Describe the different types of analysis for the different data types of dependent variable.

Answer: · For continuous or interval data, multiple linear regression is used.

· For binary data, binary regression is used (and/or logistic/logit or probit).

· For ordinal data, ordinal regression is used (and/or ordered logit or ordered probit).

· For nominal data, multinomial regression is used (and/or multinomial logit or discrete

choice).

· For rank-ordered data, rank-ordered or “exploded” regression is used.

· For count data, Poisson or count regression is used.

Test Bank for Modern Marketing Research: Concepts, Methods, and Cases

Fred M. Feinberg, Thomas Kinnear, James R. Taylor

9781133188964, 9781133191025, 9780759391710