Chapter 9: Multiple Regression: Modeling Multivariate Relationships
TRUE/FALSE
1) Regression is the dominant method of data analysis throughout the natural and social
sciences.
Answer: True
Regression is the most commonly used method of data analysis.
2) All regression models, including simple linear, binary, ordinal, multinomial logit, rank-ordered, and count, can be viewed as special cases of the general formulation called the
General Linear Model.
Answer: True
One can treat all types of regression as if they were really one grand method, the General
Linear Model. Some advanced statistical programs require that all regression models be
viewed as special cases of this general formulation.
3) Although regression can verify a relationship between variables, it cannot quantify the
nature of that relationship.
Answer: False
One of the primary advantages of regression is that it can verify and quantify relationships.
4) Of the two main methods underlying nearly all of mathematical reasoning, statistics
(particularly regression) is used when dealing with quantities that are certain.
Answer: False
Calculus is used when dealing with certain quantities, and statistics (particularly regression)
when dealing with uncertain ones.
5) One limitation to regression is that, due to latent variables, it is hard to know what variable
should predict what.
Answer: True
A latent variable gives rise to two (or more) others that lack an otherwise causal relationship.
This leads to mistakenly basing predictions on the wrong variable.
6) One of the limitations of regression is that it can be used only for linear relationships.
Answer: False
Regression can be used to estimate nonlinear relationships as well.
7) Regression, as a general method, is so versatile, through transformations and special cases,
that it can never be overused.
Answer: False
Regression can be overused, so researchers should use direct, visual “reality checks” that
allow gross errors to be detected.

8) Regression presumes the following theoretical model for the population:
$Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k + e$
where Y is the dependent variable, the Xs are the independent variables, the bs are the
coefficients of the independent variables, and e is the error.
Answer: True
This is the general theoretical model for regression.
9) In regression analysis, the F-test tells you if all of the variables taken together help explain
the variation in the dependent variable.
Answer: True
The F-test measures all of the variables together, while the t-test measures each variable
independently.
10) The t-test is the first thing that should be checked in a regression output. If it is not
significant, then the entire model is not providing sufficient explanatory power.
Answer: False
It is the F-test that should be checked, not the t-test of each independent variable.
11) Standardized residuals are useful in seeing whether there are any strong outliers.
Answer: True
Data points with a standardized residual greater than 3 in magnitude may be outliers and
should be examined.
12) A histogram will show if the data points are normally distributed.
Answer: True
A histogram is a frequency distribution, which will show whether or not the data are normally
distributed.
13) Autocorrelation occurs when the error does not have a constant variance.
Answer: False
Autocorrelation is when the error displays obvious patterns. The variance varying for
different combinations of the independent variables is heteroscedasticity.
14) The problem with heteroscedasticity is that researchers tend to be underconfident, i.e.
they must make statements as if the data is worse than it actually is.
Answer: False
The problem is just the reverse; researchers tend to state more certainty of the results than
they should when heteroscedasticity is present.
15) One way to correct for heteroscedasticity is to transform either the independent variables
or the dependent variable using logarithms.

Answer: True
Transformation by logarithms will sometimes correct the problem.
16) The Breusch-Pagan test detects whether data is normally distributed.
Answer: False
The Breusch-Pagan test detects whether heteroscedasticity is strongly present.
17) Substantial autocorrelation within data can be corrected for with myriad techniques, such
as transformations and dummy variables. It should therefore not call into question the
regression model itself.
Answer: False
Autocorrelation is a very serious problem that can indicate that the model is simply wrong.
18) Each independent coefficient (b) in a regression model will show the impact on the
dependent variable of a one-unit change in the corresponding independent variable, if all of
the other variables are held constant.
Answer: True
If a particular independent variable is changed by one unit, and all others remain the same,
regression will show how much of an impact that one unit has on the dependent variable. The
amount of that impact is signified by the value of that variable’s coefficient (b).
19) Nonlinear relationships can be examined with ordinary linear regression through
transformation functions such as logs, exponents, squares, square roots, and polynomials.
Answer: True
The purpose of these transformations is to allow linear regression to account for more
complex nonlinear relationships.
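As an illustration of the idea, the sketch below fits an exponential curve with ordinary linear regression by regressing log(Y) on X. The functional form Y = a * exp(b * X), the data values, and the helper name `fit_line` are assumptions chosen for the example, not taken from the text.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical noise-free data generated from Y = 2 * exp(0.5 * X).
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * math.exp(0.5 * x) for x in xs]

# On the log scale the relationship is linear: log(Y) = log(a) + b * X.
intercept, slope = fit_line(xs, [math.log(y) for y in ys])
a_hat = math.exp(intercept)  # back-transform the intercept to recover a
b_hat = slope                # the slope on the log scale is b

print(round(a_hat, 3), round(b_hat, 3))
```

With real (noisy) data the recovered values would only approximate the underlying parameters.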
20) [Equation not shown] is the form of a geometric relationship.
Answer: False
[Equation not shown] is the form of an exponential relationship. The form of a geometric
relationship is [equation not shown].
21) Polynomial regression is a robust method that should be among the first transformations
to try on nonlinear data.
Answer: False
Polynomial regressions, using various powers of X, must be applied very carefully, because
powers of X are highly multicollinear. It is impossible to change X by one unit without also
changing X^2, X^3, and so forth.
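A quick way to see how strongly powers of X move together is to compute their correlations. The sketch below does this for a hypothetical predictor taking the values 1 through 10; the helper name `pearson` and the data are assumptions for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical predictor values.
x = list(range(1, 11))

r_x_x2 = pearson(x, [v ** 2 for v in x])  # correlation of X with X^2
r_x_x3 = pearson(x, [v ** 3 for v in x])  # correlation of X with X^3

print(round(r_x_x2, 3), round(r_x_x3, 3))
```

Both correlations come out above 0.9 for these values, which is the kind of multicollinearity the answer warns about.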
22) If the dependent variable is nominal data, then you would use Poisson regression for the
regression analysis.

Answer: False
For nominal dependent data, multinomial regression would be used. Poisson regression is
used for count (non-continuous) data.
23) The value of r^2 will always increase when adding additional variables to a multiple linear
regression equation.
Answer: True
Because each new variable will pick up at least some chance variation in the sample, the r^2
will always increase (or, at worst, stay the same).
24) For multiple linear regression, researchers want to examine the r^2 instead of the adjusted
r^2 because the r^2 is not as easily fooled by additional independent variables.
Answer: False
It is the opposite; researchers want to examine the adjusted r^2 because it is not as easily
fooled by additional independent variables.
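The contrast can be made concrete with the usual adjusted r^2 formula, 1 - (1 - r^2)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The sketch below is a minimal illustration; the sample size and r^2 values are hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted r^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical case: a weak sixth predictor nudges r^2 from 0.500 to 0.501 (n = 50).
before = adjusted_r2(0.500, n=50, k=5)
after = adjusted_r2(0.501, n=50, k=6)

# r^2 went up, but the adjusted value goes down because of the extra variable.
print(round(before, 4), round(after, 4))
```

Here the plain r^2 rises slightly while the adjusted r^2 falls, signaling that the added variable did not earn its place.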
25) In binary regression, it would be inappropriate to put a line through the observations in a
data plot, because the values of the independent variables can only be 0 or 1.
Answer: False
For binary regression, the dependent variable has only two values; one therefore cannot draw
a straight line through the data points.
MULTIPLE CHOICE
1) All of the following statements about regression are true except:
a. Regression is a way to put a line through a group of data points.
b. Regression is a method for testing the validity of relationships.
c. Regression is a method of verifying causal relationships.
d. Regression is a flexible methodology for measuring how things influence one another.
Answer: C
Regression can show relationships and how variables influence each other, but it cannot fully
verify causal relationships.
2) All of the following are limitations of regression except
a. regression assumptions are usually violated
b. almost nothing is really linear
c. latent variables may be present
d. insufficient predictor variables
Answer: D

Another limitation of regression is that there are often too many highly correlated predictor
variables.
3) Predicting one dependent variable based on many independent variables is called
a. simple linear regression
b. bivariate regression
c. multiple regression
d. multinomial regression
Answer: C
Multiple regression means there are multiple independent variables in the regression.
4) For linear regression, the assumption is made that the error term is
a. normally distributed with a mean of 1
b. normally distributed with a mean of 0
c. the sum of the standard deviations of all of the β values
d. the sum-of-squares explained by the regression model
Answer: B
For regression analysis, researchers assume that the error term is normally distributed around
a mean of 0.
5) Due to assumptions in linear regression about the distribution and mean of the error, the
only population parameter that will need to be estimated to understand the extent of error is
the
a. autocorrelation of the error
b. standard deviation of the error
c. coefficient of determination
d. coefficient of variation
Answer: B
Because the error is assumed to be normally distributed with a mean of zero, the only error
term needed in the regression equation is the error standard deviation.
6) To determine if a regression model is useful, the
a. fit (or prediction) should be bigger, on average, than “error”
b. coefficients (the β’s) should be clearly different from zero
c. p value should be greater than the F-test value
d. all of the above

e. both a and b
Answer: E
If the fit is greater than the error and the coefficients clearly different from zero, the
regression model will be useful.
7) To determine if the fit of a regression equation is larger than the error, the
_______________ is used.
a. F-test
b. t-test
c. chi-square
d. significance level
Answer: A
For overall model fit, the F-test statistic is used.
8) To determine if the regression coefficients (β’s) in a regression model are significantly
different from zero, the _______________ is used.
a. F-test
b. t-test
c. chi-square
d. significance level
Answer: B
The t-test is used to determine if individual independent variables are significant predictors.
9) Violating the underlying assumptions of regression can lead to all of the following
problems except
a. non-normality
b. standardized residuals
c. heteroscedasticity
d. autocorrelation
Answer: B
Regression can have a number of problems, such as non-normality, heteroscedasticity, and
autocorrelation.
10) Standardized residuals are used to determine if there are any potential outliers in the data.
Data points that have standardized residuals higher than _______________ in magnitude
should be examined, unless the data set is very large.
a. 1

b. 3
c. 5
d. 10
Answer: B
Standardized residuals are useful in seeing whether there are any strong outliers; it is
reasonable to examine any data point with a standardized residual greater than 3 in
magnitude, unless the data set is very large.
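The screening rule can be sketched as follows. This simplified version centers the residuals and divides by their sample standard deviation; statistical packages may use a leverage-adjusted definition instead. The function name `flag_outliers` and the residual values are hypothetical.

```python
import statistics

def flag_outliers(residuals, threshold=3.0):
    """Return indices whose standardized residual exceeds the threshold in magnitude."""
    mean = statistics.mean(residuals)
    sd = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs((r - mean) / sd) > threshold]

# Hypothetical residuals; the last observation is far from the rest.
residuals = [0.5, -1.0, 0.8, -0.3, 1.2, -0.7, 0.1, -0.9, 0.4, 0.6,
             -0.2, 0.9, -1.1, 0.3, -0.5, 0.7, -0.4, 0.2, -0.6, 12.0]

print(flag_outliers(residuals))
```

Only the final observation is flagged here; it would then be examined rather than deleted automatically.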
11) _______________ occurs when the error is not normally distributed.
a. Non-normality
b. Heteroscedasticity
c. Autocorrelation
d. Multicollinearity
Answer: A
Non-normality refers to the error term not being normally distributed.
12) _______________ occurs when the error does not have a constant variance.
a. Non-normality
b. Heteroscedasticity
c. Autocorrelation
d. Multicollinearity
Answer: B
Heteroscedasticity refers to the error not having a constant variance.
13) _______________ occurs when a number of potential predictors in a regression model
are highly correlated.
a. Non-normality
b. Multicollinearity
c. Heteroscedasticity
d. Autocorrelation
Answer: B
Multicollinearity refers to a situation where a number of predictor variables are highly
correlated.
14) The best way to see if heteroscedasticity is present is to
a. plot the residuals against the dependent variable and examine them

b. plot the residuals against each of the independent variables and examine them
c. use the Kolmogorov-Smirnov or Shapiro-Wilk test supplied in most computer statistical
programs
d. apply a logarithmic transformation
Answer: B
By plotting the residuals against each independent variable and examining the plot,
researchers can determine if heteroscedasticity is a problem.
15) If the error for one data point can help predict the value of the error in nearby data points,
then the problem is
a. non-normality
b. heteroscedasticity
c. autocorrelation
d. multicollinearity
Answer: C
Autocorrelation occurs when there are patterns in the errors (residuals) or in the location of
data when plotted on a graph.
16) The test for autocorrelation is the
a. Durbin-Watson
b. Kolmogorov-Smirnov
c. Shapiro-Wilk
d. Anderson-Darling
Answer: A
The Durbin-Watson test examines adjacent residuals (errors) to see if there is a pattern.
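For reference, the Durbin-Watson statistic is the sum of squared differences between adjacent residuals divided by the sum of squared residuals. Values near 2 suggest little first-order autocorrelation, values near 0 suggest positive autocorrelation, and values near 4 suggest negative autocorrelation. The residual series below are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic computed from residuals in their observed order."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals that drift steadily (neighbors resemble each other).
trending = [-3, -2, -1, 0, 1, 2, 3]
# Hypothetical residuals that flip sign at every step.
alternating = [1, -1, 1, -1, 1, -1]

print(round(durbin_watson(trending), 3), round(durbin_watson(alternating), 3))
```

The drifting series gives a statistic well below 2 and the alternating series one well above 2, matching the interpretation above.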
17) Identifying nonlinear relationships between variables is accomplished by transforming
one or more variables and then estimating the fit with
a. linear regression
b. curvilinear regression
c. multicollinearity analysis
d. a close examination of the residuals plot
Answer: A
After the data has been transformed, linear regression can be used to examine fit, even though
the relationship is not linear.

18) Which list of common variables below would be examples of continuous data?
a. price, time, volume, length, sales
b. gender, coupon used, college degree, promotion (yes/no)
c. education level, age group, survey scale response (e.g., 1 to 7)
d. ethnic group, product category, SKU, payment method
Answer: A
Price, time, volume, sales, market share are all examples of continuous variables.
19) Which list of common variables below would be examples of ordinal data?
a. price, time, volume, length, sales
b. gender, coupon used, college degree, promotion (yes/no)
c. education level, age group, survey scale response (e.g., 1 to 7)
d. ethnic group, product category, SKU, payment method
Answer: C
These are ordinal data because they contain a specific order, but equal distance between
groups cannot be assumed.
20) If the dependent variable in a regression model is continuous interval data, the analysis
should employ
a. multiple linear regression
b. binary regression
c. Poisson or count regression
d. multinomial regression
e. ordinal regression
Answer: A
For continuous interval dependent variables, multiple linear regression would be used.
21) In multiple regression, the intercept is the value of Y when
a. all of the independent variables are 0
b. the dependent variable is 0
c. all of the significant independent variables are set to 1
d. all of the coefficients are set to 1
Answer: A
The intercept is the point where the regression line crosses the Y-axis, that is, where all X variables are 0.

22) In binary regression, probability data for predicting binary variables is transformed using
a. an exponential transformation
b. a geometric transformation
c. the logarithm of the odds
d. dummy variables
Answer: C
By using the “log-odds” transformation, it is possible to use logistic regression for binary
dependent variables.
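The log-odds (logit) transformation and its inverse can be sketched in a few lines. The function names are illustrative choices, not taken from the text.

```python
import math

def log_odds(p):
    """Map a probability strictly between 0 and 1 to the log of its odds."""
    return math.log(p / (1 - p))

def probability(z):
    """Inverse transformation: map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-z))

# A probability of 0.5 means even odds, so the log-odds is 0.
print(log_odds(0.5))
# Round trip: transforming 0.8 and inverting returns (approximately) 0.8.
print(probability(log_odds(0.8)))
```

The transformation stretches probabilities, which are confined to the interval from 0 to 1, onto the whole real line, so that a linear combination of predictors can be used on the right-hand side.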
23) The error term in a binary regression using a logit model is assumed to be a
a. normal distribution
b. Gumbel distribution
c. bivariate distribution
d. Poisson distribution
Answer: B
The error in a binary regression analysis is assumed to follow a Gumbel distribution. To assume a
normal distribution, a probit model would need to be used.
24) If the dependent variable is an ordinal scale, then the proper regression model to use is
the
a. linear regression model with geometric transformation of the dependent variable
b. ordered probit model
c. ordered logit model
d. any of the above
e. either b or c
Answer: E
Either the ordered probit model or the ordered logit model can be used.
25) If the dependent variable is a nominal scale, then the proper regression model to use is
a. multinomial regression
b. probit model
c. ordered logit model
d. linear regression after a chi square transformation
e. count regression

Answer: A
Multinomial regression is used when the dependent variable is nominal data.
26) The model used most often in marketing that includes item-specific information is
referred to as the
a. Gumbel Logit model (GLM)
b. Multinomial Logit model (MNL)
c. Guadagni & Little Logit model (GLLM)
d. Kaiser Probit model (KPM)
Answer: B
Within marketing research, this model is called the Multinomial Logit model.
27) _______________ takes both binary and interval data and produces probabilities of an
outcome, such as purchase probabilities and market share projections, as functions of the
marketing mix.
a. The weighted least squares model
b. Logistic regression
c. The Breusch-Pagan model
d. The discrete choice model
Answer: B
Logit models are very effective in producing probabilities of different outcomes.
28) When using rank-ordered data, the regression model that will be used is the
a. multinomial logit regression model
b. discrete choice model
c. exploded logit regression model
d. ordered probit model
Answer: C
For rank-ordered data, the correct approach is the rank-ordered or exploded logit regression
model.
29) If the dependent variable is count data, the regression model that will be used is the
a. multinomial logit regression model
b. discrete choice model
c. exploded logit regression model
d. Poisson regression model

Answer: D
For count data, the Poisson or count regression model is used.
30) In using Poisson regression, the researcher also must know
a. both the observed count and the number of opportunities
b. the relative percentages of each outcome
c. the distribution of the population
d. both the variance and the population mean
Answer: A
In addition to the count, the Poisson or count regression model also needs the number of
opportunities.
Use the regression output below to answer the following questions.

31) Which statistics should one look at to determine which of the independent variables are
most significant in the regression?
a. r^2 and adj r^2
b. sum of squares and F-test
c. F-test and p
d. t-test and p
Answer: D

The t-test measures the significance of the effect of the individual independent variables, and
the p value (for that same variable) indicates how strong the evidence is in favor of the
coefficient’s not being zero in the population based on one’s sample data.
32) Which variable(s) is/are (a) significant, strong predictor(s) of Weight in the regression
above?
a. Age and Height
b. Gender only
c. Height only
d. Age, Height, Gender, MBA and Year
e. Height and Gender
Answer: E
Gender and Height are significant and strong predictors, because the t-tests are fairly large
and the p-values < 0.00005. Age is significant, but not strongly so, with p = 0.0186.
33) According to the regression output above, which of the following statements most
accurately summarizes what can be said, for the entire population, about the weight of
individuals with an MBA (MBA = 1) versus the weight of individuals without an MBA
(MBA = 0)?
a. They weigh on average 3.122 pounds less than individuals who do not have an MBA.
b. They weigh on average 3.122 pounds more than individuals who do not have an MBA.
c. Their weights are roughly the same.
d. The variable is not a significant predictor, so, even though its coefficient estimate is -3.122,
we cannot assume it has an impact on weight.
Answer: D
The p-value for MBA is 0.3, which is not significant.
34) According to the regression output above, if Age is increased by one year, what will the
impact be on Weight?
a. 0.660 lb. increase
b. 0.279 lb. increase
c. 2.363 lb. increase
d. cannot be determined
Answer: A
According to this model, Weight increases 0.66 pounds for every year of Age. This comes
from the b value (coefficient) for Age.

35) Based on the binary regression output above, which of the independent variable(s) are
significant predictors of Gender?
a. Age and Weight
b. Age and Height
c. Height and Weight
d. Age, Height and Weight
e. all of the variables are significant
f. none of the variables is significant
Answer: D
Age, Height, and Weight, because the Wald statistic is high, as evidenced by a very small p-value (no more than .01) for each of these variables.
36) The regression output above indicates that a 1 inch increase in Height increases the
chance a respondent is male by
a. 39.1 percent
b. 8.2 percent
c. 22.669 percent
d. 47.8 percent
Answer: D
It is 47.8%, because the Exp(β) value is 47.8% greater than 1.

37) The regression output above indicates that an additional year in age increases the chance
a respondent is male by
a. 11 percent
b. 17.1 percent
c. 0.61 percent
d. 15.8 percent
Answer: B
It is 17.1%, because the Exp(b) value is 17.1% greater than 1.
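The arithmetic behind the last two answers can be sketched as follows. Exp(b) is the factor by which the odds are multiplied for a one-unit increase in the variable, so the percentage change is (Exp(b) - 1) × 100. The Exp(b) values 1.478 and 1.171 are inferred from the answer explanations, since the output table itself is not reproduced here.

```python
def percent_change_in_odds(exp_b):
    """Percentage change in the odds implied by an Exp(b) value."""
    return (exp_b - 1) * 100

# Exp(b) values inferred from the answer explanations (assumed, table not shown).
height_change = percent_change_in_odds(1.478)
age_change = percent_change_in_odds(1.171)

print(round(height_change, 1), round(age_change, 1))
```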
38) Based on the output above, the LOGIT model correctly predicted the female gender
a. 87 percent of the time
b. 91.3 percent of the time
c. 89.5 percent of the time
d. cannot be determined
Answer: A
It was correct 161 times, incorrect 24 times, resulting in 87% accuracy.
39) Which independent variable is, taken by itself, the best predictor of Gender?
a. Age
b. Height
c. Weight
d. cannot be determined
Answer: D
It cannot be determined, since we only have the results of a regression containing all these
variables at once. If we ask which seems the most significant in this particular regression, that
would be Weight, based on its large Wald statistic (these can be compared, since there is 1 df
for every predictor variable).
40) What does the Constant signify in this regression?
a. the probability of Gender = 1 when no other variables are changed
b. the average age of males in the sample
c. the average age of females in the sample
d. It does not signify anything useful here.
Answer: D

It would signify the log-odds of the probability that a person is male if the person has no
height, no weight, no MBA, and attended school in the year zero. It is therefore meaningless
in this context and should not be interpreted, even though it must be included in predictions
stemming from the regression.
SHORT ANSWER
Use the regression output below to answer the following questions.

1) Write the regression equation for the regression output above.
Answer: Weight = -210.603 + 0.660(Age) + 17.449 (Gender) + 4.999 (Height) – 3.122
(MBA) – 0.111 (Year) + ε
[The MBA and Year terms need to appear, even though they are not significant, because
leaving them out might change other coefficients. The regression should be run again without
them, but that is not part of the formal answer.]
2) Given that Gender was coded female = 0 and male = 1, on the average, how much more do
males weigh than females, “correcting for” (i.e., including the effects of) all the other
variables in the regression?
Answer: The beta value for Gender is 17.449, indicating that on average men weigh 17.449
more pounds than women, when we systematically correct for differences in their ages,
heights, MBA, and year variables.
[We cannot say how much men and women differ in weight overall without more information
(but this is not what the question asks).]
3) For each additional inch of Height, how much additional Weight would be added?

Answer: 4.999 pounds, the beta value for Height.
[Note that this value is the same for both genders, people of all ages, etc., as assumptions of
this regression model.]
4) Based on the regression printout, what would be the predicted (mean) weight of a female,
20 years old, 68 inches tall, without an MBA, and in Year 1?
Answer: 142.4
calculation:
Weight = -210.603 + (0.66 × 20) + (0 × 17.449) + (68 × 4.999) – (3.122 × 0) – (0.111 × 1) = 142.418
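The same calculation can be checked with a short script. The coefficients are those in the regression equation from short answer question 1; the function name `predict_weight` and the dictionary layout are illustrative choices.

```python
# Coefficients taken from the regression equation in short answer question 1.
coefficients = {
    "Intercept": -210.603,
    "Age": 0.660,
    "Gender": 17.449,   # female = 0, male = 1
    "Height": 4.999,
    "MBA": -3.122,
    "Year": -0.111,
}

def predict_weight(age, gender, height, mba, year):
    """Predicted (mean) weight in pounds for the given values of the predictors."""
    return (coefficients["Intercept"]
            + coefficients["Age"] * age
            + coefficients["Gender"] * gender
            + coefficients["Height"] * height
            + coefficients["MBA"] * mba
            + coefficients["Year"] * year)

# Female (0), 20 years old, 68 inches tall, no MBA (0), Year 1.
predicted = predict_weight(age=20, gender=0, height=68, mba=0, year=1)
print(round(predicted, 1))  # 142.4
```

The non-significant MBA and Year terms are kept in the prediction because they were part of the fitted model.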
ESSAY
1) Describe what regression is and what it is used for.
Answer: Regression is:
1) a way to put a line through a group of points
2) a method for testing the validity of relationships
3) a flexible methodology for measuring how things influence one another
4) a scientific approach to forecasting and prediction
2) Discuss the limitations of regression.
Answer: The limitations of regression are:
1) regression assumptions are usually violated in some way
2) sometimes, there are too many predictor variables and they are highly correlated
3) it is hard to know “what should predict what”
4) almost nothing is really linear
5) regression requires a certain type of data and not all data are like that
6) regression can be overused
3) Explain the problems of non-normality, heteroscedasticity, and autocorrelation and how
researchers can detect each.
Answer: 1) Non-normality occurs when the errors are not normally distributed. It is detected
by inspection of a histogram (frequency distribution) of residuals (errors).
2) Heteroscedasticity is when the residuals do not have a constant variance, creating a greater
spread of data points around some parts of the regression line than others. The best way to see
if heteroscedasticity is present is to closely inspect plots of the residuals against each of the
independent variables. The White and Breusch-Pagan tests are means of detecting it through
a computer program.

3) Autocorrelation occurs when there are meaningful patterns in the error. To detect this, one
checks whether knowing the exact value of the error for one data point predicts anything at
all about the error value at another point.
4) Describe how transformations can be used to apply linear regression to nonlinear
relationships.
Answer: Transformations are used on variables that are not linearly related to arrive at a
linear relationship between the dependent variable and the transformed variables. If Y is not
linearly related to X, researchers might try logarithmic, exponential, or geometric
relationships. For example:

5) Describe the different types of analysis for the different data types of dependent variable.
Answer: · For continuous or interval data, multiple linear regression is used.
· For binary data, binary regression is used (and/or logistic/logit or probit).
· For ordinal data, ordinal regression is used (and/or ordered logit or ordered probit).
· For nominal data, multinomial regression is used (and/or multinomial logit or discrete
choice).
· For rank-ordered data, rank-ordered or “exploded” regression is used.
· For count data, Poisson or count regression is used.

Test Bank for Modern Marketing Research: Concepts, Methods, and Cases
Fred M. Feinberg, Thomas Kinnear, James R. Taylor
9781133188964, 9781133191025, 9780759391710
