Description

Please work on all the questions and provide the rationale or show the work when necessary.

Q2. Please fill in each blank by selecting the answer from the Dropdown:

a. When you have 1 million observations and there are only 2 predictors, it is usually better to use a [ Select ] [“flexible”, “inflexible”] statistical learning method.

b. When the variance of the error term is very large, it is usually better to use a [ Select ] [“flexible”, “inflexible”] statistical learning method.

c. When the relationship between the DV and IVs is linear, it is usually better to use a [ Select ] [“flexible”, “inflexible”] statistical learning method.

Q3. Select True/False for each of the following:

a. A fitted value at an observation point for a linear regression model is a linear combination of the observed response values. [ Select ] [“True”, “False”]

b. In a simple linear regression, the least squares regression line may not go through the point $(\bar{x}, \bar{y})$. [ Select ] [“True”, “False”]

c. For a simple linear regression, R^2 is the squared correlation between the DV and the IV. [ Select ] [“True”, “False”]

d. Bootstrap is a resampling method with replacement. [ Select ] [“False”, “True”]
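
Statements (a) through (c) can be explored numerically. The following is a minimal sketch on a small made-up data set: the values in `xs` and `ys`, and the helper name `hat_weights`, are illustrative assumptions and not part of the question. It fits a simple least squares line by hand and prints three checks.

```python
from math import sqrt
from statistics import fmean

# Toy data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = fmean(xs), fmean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Closed-form least squares estimates for simple linear regression.
slope = sxy / sxx
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in xs]


def hat_weights(i):
    """Weights h_ij such that fitted[i] == sum_j h_ij * ys[j]."""
    n = len(xs)
    return [1 / n + (xs[i] - x_bar) * (x_j - x_bar) / sxx for x_j in xs]


# (a) Rebuild each fitted value as a weighted sum of the observed responses.
fitted_from_weights = [
    sum(w * y for w, y in zip(hat_weights(i), ys)) for i in range(len(xs))
]

# (b) Evaluate the fitted line at x_bar and compare with y_bar.
line_at_x_bar = intercept + slope * x_bar

# (c) Compare R^2 with the squared sample correlation.
r_squared = 1 - sum((y - f) ** 2 for y, f in zip(ys, fitted)) / syy
correlation = sxy / sqrt(sxx * syy)

print("fitted values match weighted sums:",
      all(abs(a - b) < 1e-9 for a, b in zip(fitted, fitted_from_weights)))
print("line at x_bar equals y_bar:", abs(line_at_x_bar - y_bar) < 1e-9)
print("R^2 equals correlation^2:", abs(r_squared - correlation ** 2) < 1e-9)
```

The weights depend only on the predictor values, which is why the fitted values are linear in the responses.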

Q4. Let $x_1, x_2, \ldots, x_n$ be the original sample. Suppose that we obtain a bootstrap sample of size n from this original sample.

(i) The probability that the 2nd bootstrap observation is $x_2$ is ___________

(ii) The probability that the 3rd bootstrap observation is $x_2$ is ___________

(iii) The probability that $x_2$ is not in the bootstrap sample is ___________

(iv) The probability that $x_2$ is in the bootstrap sample is ___________

(v) The probability that the bootstrap sample is $\{x_2, x_2, \ldots, x_2\}$ (all bootstrap observations are $x_2$) is ___________
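
The quantities in Q4 all follow from one fact: each bootstrap draw picks any given original observation with probability 1/n, independently of the other draws. The sketch below evaluates them exactly for a chosen n; the function name `bootstrap_probabilities` and the value n = 5 are assumptions made for illustration only.

```python
from fractions import Fraction


def bootstrap_probabilities(n: int) -> dict:
    """Exact probabilities for one fixed observation in a size-n bootstrap sample."""
    p_single = Fraction(1, n)        # a given draw equals the fixed observation
    p_absent = (1 - p_single) ** n   # all n independent draws miss it
    return {
        "single_draw": p_single,
        "absent": p_absent,
        "present": 1 - p_absent,
        "all_same": p_single ** n,   # every draw equals the fixed observation
    }


probs = bootstrap_probabilities(5)  # n = 5 is an arbitrary illustrative choice
for name, value in probs.items():
    print(f"{name}: {value} (about {float(value):.4f})")
```

Because draws are independent, the position of the draw (2nd, 3rd, and so on) does not change the single-draw probability.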

Q5. Suppose that you wish to invest a fixed sum of money in two financial assets that yield returns of X and Y, respectively, where X and Y are random quantities. You invest a fraction of your money in X and the remainder in Y. In general, you would like to [ Select ] [“minimize”, “maximize”] the expected return. Since there is variability associated with the returns on these two assets, you may need to [ Select ] [“maximize”, “minimize”] the variance of the investment.
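
To make the variance part of Q5 concrete, suppose a fraction alpha is invested in X and 1 - alpha in Y. The variance of the combined return is then alpha^2 Var(X) + (1 - alpha)^2 Var(Y) + 2 alpha (1 - alpha) Cov(X, Y). The sketch below evaluates that expression and the fraction at which its derivative is zero; the function names and the numeric inputs are hypothetical and chosen only for illustration.

```python
def portfolio_variance(alpha, var_x, var_y, cov_xy):
    """Variance of alpha * X + (1 - alpha) * Y."""
    return (alpha ** 2 * var_x
            + (1 - alpha) ** 2 * var_y
            + 2 * alpha * (1 - alpha) * cov_xy)


def stationary_alpha(var_x, var_y, cov_xy):
    """Fraction in X where the derivative of the variance with respect to alpha is zero."""
    return (var_y - cov_xy) / (var_x + var_y - 2 * cov_xy)


# Hypothetical inputs, not taken from the question.
var_x, var_y, cov_xy = 1.0, 1.25, 0.5
alpha = stationary_alpha(var_x, var_y, cov_xy)
print(f"alpha = {alpha:.2f}")
print(f"variance at alpha = {portfolio_variance(alpha, var_x, var_y, cov_xy):.4f}")
```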

Q6. Identify the predictor variable and the response variable in each of the following situations:

(a) A training director wishes to study the relationship between the duration of training for new recruits and their performance in a skilled job.

Predictor variable:

Response variable:

(b) A market analyst wishes to relate the expenditures incurred in promoting a product in test markets to the subsequent amount of product sales.

Predictor variable:

Response variable:

(c) The aim of a study is to relate the carbon monoxide level in blood samples from smokers with the average number of cigarettes they smoke per day.

Predictor variable:

Response variable:

Q7. Suppose you have a simple linear regression model as below:

$Y = 3 - X + \varepsilon$

where $\varepsilon$ is a normal random variable with mean 0 and standard deviation 2.

(a) Identify the values of the parameters $\beta_0$, $\beta_1$, and $\sigma$ in the statistical model:

$\beta_0$ = _________

$\beta_1$ = _________

$\sigma$ = _________

(b) What is the expected value of Y when X = 5?
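
In a simple linear regression with a mean-zero error term, the expected response at a given X is the intercept plus the slope times X; the error's standard deviation affects the spread but not the mean. The sketch below illustrates this with a small simulation. The function names and the parameter values (intercept 1.0, slope 2.0, standard deviation 0.5, X = 4.0) are hypothetical and deliberately differ from the model in the question.

```python
import random
from statistics import fmean


def expected_response(beta0, beta1, x):
    """E[Y | X = x] for Y = beta0 + beta1 * X + error, with E[error] = 0."""
    return beta0 + beta1 * x


def simulated_mean(beta0, beta1, sigma, x, draws=100_000, seed=0):
    """Average of simulated responses at a fixed x, using normal errors."""
    rng = random.Random(seed)
    return fmean(beta0 + beta1 * x + rng.gauss(0.0, sigma) for _ in range(draws))


# Hypothetical parameter values, used only to illustrate the idea.
theoretical = expected_response(1.0, 2.0, 4.0)
empirical = simulated_mean(1.0, 2.0, 0.5, 4.0)
print(f"theoretical mean: {theoretical}")
print(f"simulated mean:   {empirical:.3f}")
```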

Q8. Which of the following scenarios is NOT a classification problem?

( ) We are considering launching a new product and wish to know the required marketing budget to generate the expected amount of sales, based on 20 similar products previously launched.

( ) We want to predict whether an email is a spam and should be delivered to the Junk folder.

( ) We want to identify the handwritten single-digit number from an image.

( ) We are considering launching a new product and wish to know whether it will be a success or a failure, based on 20 similar products previously launched.

Q9. Identify the sample size n and the number of predictors p in the following scenario:

We conducted a survey, to which 286 participants responded, to understand how burnout is related to gender, age, education level, fatigue, income, family status, amount of exercise, and health condition.

Sample size n = _______

# of predictors p = ___________

Q10. Suppose we collect data for a group of students in a statistics class with variables X1 = hours studied, X2 = undergrad GPA, and Y = receive an A. We fit a logistic regression and produce the estimated coefficients $\hat{\beta}_0 = -5$, $\hat{\beta}_1 = 0.06$, $\hat{\beta}_2 = 0.95$.

(a) Estimate the probability that a student who studies for 20 hours and has an undergrad GPA of 4.0 gets an A in the class. (keep four decimal places)

the probability = ________

(b) How many hours would the student with undergrad GPA of 4.0 need to study to have a 90% chance of getting an A in the class? (keep two decimal places)

______ hours would be needed.
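
Both parts of Q10 use the logistic model, in which the log-odds of getting an A are a linear function of the predictors. The sketch below assumes the three estimates listed in the question are, in order, the intercept, the coefficient on hours studied, and the coefficient on GPA; the function names are illustrative. Part (a) applies the logistic function, and part (b) inverts it by solving the log-odds equation for hours.

```python
from math import exp, log

# Assumed mapping of the listed estimates: intercept, hours, GPA.
B0, B1, B2 = -5.0, 0.06, 0.95


def probability_of_a(hours, gpa):
    """Logistic model: p = 1 / (1 + exp(-(B0 + B1 * hours + B2 * gpa)))."""
    log_odds = B0 + B1 * hours + B2 * gpa
    return 1.0 / (1.0 + exp(-log_odds))


def hours_for_probability(target, gpa):
    """Solve log(target / (1 - target)) = B0 + B1 * hours + B2 * gpa for hours."""
    return (log(target / (1.0 - target)) - B0 - B2 * gpa) / B1


p = probability_of_a(hours=20, gpa=4.0)
h = hours_for_probability(target=0.90, gpa=4.0)
print(f"(a) probability = {p:.4f}")
print(f"(b) hours needed = {h:.2f}")
```

Feeding the result of part (b) back into `probability_of_a` is a quick way to confirm that the inversion is consistent.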

Q11. Match the curves below:

Orange Curve B

[ Choose ] [“Bias Squared”, “Variance”, “Bayes Error/Irreducible Error”, “Test Error”]

Blue Curve D

[ Choose ] [“Bias Squared”, “Variance”, “Bayes Error/Irreducible Error”, “Test Error”]

Red Curve A

[ Choose ] [“Bias Squared”, “Variance”, “Bayes Error/Irreducible Error”, “Test Error”]

Purple Curve C

[ Choose ] [“Bias Squared”, “Variance”, “Bayes Error/Irreducible Error”, “Test Error”]

Q12. Below is a table of outputs from running a linear regression:

Based on the outputs, answer the following questions by filling the blanks.

(a) What is the estimated coefficient for the predictor “radio”?

(b) How do you interpret the estimate in (a)?

(c) Which predictor is not significant when the other predictors are included in the model?

Q13. Below is the output from running a linear regression model:

What percentage of the variation in the response variable is explained by this regression model?

____________

Q14. Below is partial output from running a linear regression model:

To improve the model, we may take away one of the predictors from the model. Which predictor should be removed to improve the model? _____________

Q15. From the boxplots below:

what can you conclude? (choose the best answer)

( ) Both “Balance” and “Income” impact “Default” significantly

( ) Can’t tell

( ) “Balance” does not impact “Default” significantly

( ) Neither “Balance” nor “Income” impacts “Default” significantly

( ) “Balance” impacts “Default” significantly

Q16. Below is the confusion matrix from a classification model:

                   Predicted positive   Predicted negative
Actual positive            70                    2
Actual negative             8                   20

(a) What is the overall accuracy of the prediction (in %, with two decimal places)?

(b) What is the overall error rate (in %, with two decimal places)?

(c) What is the specificity (in %, with two decimal places)?

(d) What is the sensitivity (in %, with two decimal places)?
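
The four rates in Q16 are ratios of confusion-matrix cells. The sketch below assumes the matrix is read with rows as actual classes and columns as predicted classes, so that 70 is the true positive count, 2 the false negative count, 8 the false positive count, and 20 the true negative count. The function name `classification_metrics` is illustrative.

```python
def classification_metrics(tp, fn, fp, tn):
    """Standard rates computed from the four cells of a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "specificity": tn / (tn + fp),   # true negative rate
        "sensitivity": tp / (tp + fn),   # true positive rate
    }


# Cell assignment assumes rows are actual classes and columns are predictions.
metrics = classification_metrics(tp=70, fn=2, fp=8, tn=20)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```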

Q17. When you run multiple Logistic Regression models, which of the following is not a good measure for model selection/assessment?

( ) Sensitivity

( ) Accuracy

( ) Specificity

( ) Error rate

( ) Split percentage for training and testing

Q18. Suppose we have a data set with five predictors,

X1 = GPA,

X2 = IQ,

X3 = Gender (1 for Female and 0 for Male),

X4 = Interaction between GPA and IQ, and

X5 = Interaction between GPA and Gender.

The response is starting salary after graduation (in thousands of dollars). Suppose we use least squares to fit the model,

(keep one decimal place)

(a) Predict the salary of a female with IQ of 120 and a GPA of 4.0 _______________

(b) Predict the salary of a male with IQ of 120 and a GPA of 4.0 ________________

