Description

Can you please only do part2?

CORRELATION & LINEAR REGRESSION
LECTURE 3
How to estimate relationships?
DEP. VAR.
Interval/Ratio → Correlation, Regression models
Ordinal (Two Levels) → Logistic regression models
Ordinal (Multiple Levels) → Conditional & Multinomial logit models
SPECIAL CASE → Conjoint analysis
LEARNING OBJECTIVES
• Correlation analysis
• Simple Regression Models
• Multivariate Regression Models
Shine Sonic Toothbrush Co.
• Annual sales for 40 territories (in '000s of units)
• Advertising (TV Spots / Month)
• Number of salespeople
• Wholesaler efficiency index
• See “ShineSonic.xlsm”
Sonic Shine: Descriptive statistics
Statistic                   Sales (in 1,000’s)   ADV      Sales People   Wholesaler Index
Nbr. of observations        40                   40       40             40
Minimum                     220.500              4.000    3.000          1.000
Maximum                     667.000              19.000   8.000          4.000
1st Quartile                315.225              7.750    3.750          2.000
Median                      398.400              10.000   5.000          3.000
3rd Quartile                507.575              13.250   6.000          4.000
Mean                        411.288              10.900   5.000          2.825
Variance (n-1)              15339.821            18.554   2.718          0.969
Standard deviation (n-1)    123.854              4.307    1.649          0.984
Skewness (Pearson)          0.426                0.373    0.348          -0.298
Kurtosis (Pearson)          -0.895               -0.940   -0.999         -0.972
Sonic Shine: ADV & SALES
• We want to understand the relationship between ADV & sales.
• Correlation analysis
• Simple linear regression model
CORRELATIONS …
• r_XY → correlation between X and Y
• Represent the relationship among variables: how the value of one variable varies in relation to variation in another variable.
• BUT correlations don’t imply causation.
CORRELATIONS …
• Measure strength of the relationship between two variables.
• Linear Correlation Coefficient (r): Measures the linear relationship between two interval- or ratio-scaled variables (X and Y).
• Pearson Product-Moment Coefficient
CORRELATIONS …
• Range between -1.00 and +1.00
• Positive correlation: Variables changing in the same direction
• Negative correlation: Variables changing in opposite directions
• The absolute value of the coefficient indicates strength (-.70 implies a stronger association than +.25).
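The properties above (range, sign, strength) can be illustrated with a minimal sketch of the Pearson coefficient in plain Python. The function name `pearson_r` and the five data points are hypothetical and are not taken from ShineSonic.xlsm.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical territories: TV spots per month and sales (in 1000's)
adv = [4, 7, 10, 13, 19]
sales = [230.0, 310.0, 400.0, 520.0, 660.0]

r_pos = pearson_r(adv, sales)                # same direction: positive r
r_neg = pearson_r(adv, [-s for s in sales])  # opposite direction: negative r
print(round(r_pos, 3), round(r_neg, 3))
```

Flipping the sign of one variable flips the sign of r but leaves its absolute value, and therefore the strength of the association, unchanged.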
RELATIONSHIP STRENGTH
CORRELATION     INTERPRETATION
.8 to 1.0       Very strong relationship
.6 to .8        Strong relationship
.4 to .6        Moderate relationship
.2 to .4        Weak relationship
0 to .2         Very weak or no relationship
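A small sketch of how the table above might be applied in code. The table does not say which band a boundary value such as .8 belongs to, so assigning it to the stronger band is an assumption here, and the function name `relationship_strength` is hypothetical.

```python
def relationship_strength(r):
    """Map a correlation coefficient to the interpretation bands of the table.

    Assumption: a value exactly on a boundary (e.g. 0.8) is placed in the
    stronger band; the table itself leaves this open.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = abs(r)  # strength depends on the absolute value only
    if size >= 0.8:
        return "Very strong relationship"
    if size >= 0.6:
        return "Strong relationship"
    if size >= 0.4:
        return "Moderate relationship"
    if size >= 0.2:
        return "Weak relationship"
    return "Very weak or no relationship"


# 0.880 is the Sales/ADV correlation reported in this lecture;
# -0.70 and +0.25 are the two values compared on the previous slide.
for r in (0.880, -0.70, 0.25):
    print(r, "->", relationship_strength(r))
```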
CORRELATIONS …
• Do not run correlations on nominal variables.
• Use the Pearson correlation for interval and ratio variables.
• Use the Spearman correlation if at least one of the variables is ordinal.
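For the ordinal case, the Spearman coefficient can be sketched as a Pearson correlation computed on ranks, with tied values sharing their average rank. The helper names and the six data points below are hypothetical; a statistics package would normally be used instead.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: an ordinal wholesaler rating (1 = Poor ... 4 = Excellent)
# against sales in 1000's
rating = [1, 2, 2, 3, 4, 4]
sales = [250.0, 300.0, 280.0, 410.0, 500.0, 650.0]
rho = spearman_rho(rating, sales)
print(round(rho, 3))
```

Because only the ordering of the values enters the calculation, the result does not depend on the spacing between rating levels, which is what makes it suitable for ordinal scales.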
CORRELATION: Shine Sonic Toothbrush
Correlation matrix (Pearson):
Variables                        Sales (in 1000’s)   ADV
Sales (in 1000’s)                1                   0.880
Advertising (TV Spots/ Month)    0.880               1
Values in bold are different from 0 with a significance level alpha=0.05
p-values (Pearson):
Variables                        Sales (in 1000’s)   ADV
Sales (in 1000’s)                0                   < 0.0001
Advertising (TV Spots/ Month)    < 0.0001            0
REGRESSION ANALYSIS
• Regression analysis is a technique for quantifying the relationship between one or more independent (predictor) variables and a dependent variable.
• We typically use it to:
  • Predict the DV based on values of the IVs.
  • Understand how the IVs influence the DV.
SIMPLE REGRESSION MODEL
• We want to quantify the relationship between sales (dependent variable) and advertising (independent variable).
• We make an assumption of a linear association.
• Sales = f (ADV)
SIMPLE REGRESSION MODEL
We assume:
• Linear ADV & Sales relationship.
• Random error: Normal distribution
St = α1 + β1At + εt
St = Sales (territory t)
At = ADV (territory t)
We want to find the line that best matches the data. Linear regression analysis finds that line, which is defined by α1 & β1.
SIMPLE REGRESSION MODEL
Test of: H0: α = β = 0; Ha: At least one ≠ 0
Test of: H0: α = 0; Ha: α ≠ 0
Test of: H0: β = 0; Ha: β ≠ 0
How do we find the best line?
• What if we want to predict college GPA based on high school GPA?
• The regression line reflects our best estimate as to what score on the Y variable would be predicted by the X variable.
• Also known as the “line of best fit.”
PREDICTION ERROR
SIMPLE REGRESSION MODEL
• We partition the SST (total sum of squares) into SSR (sum of squares for the regression model) & SSE (sum of squares for error).
SST = \sum_{j=1}^{n} (Y_j - \bar{Y})^2
SSE = \sum_{j=1}^{n} (Y_j - \hat{Y}_j)^2
SSR = \sum_{j=1}^{n} (\hat{Y}_j - \bar{Y})^2
• We select the line defined by α and β that minimizes the sum of the squared errors.
REGRESSION ANALYSIS
F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}, where k is the number of independent variables and n is the number of observations.
• We compare the ratio between the MSR and the MSE.
• If this ratio exceeds a critical F value, we can conclude that our IV can be used to predict (to some extent) the DV.
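A minimal sketch of the steps described above for one predictor: fit the least-squares line, split SST into SSR and SSE, and form the ratio of MSR to MSE. The closed-form slope and intercept are the standard least-squares solution; the function names and the six data points are hypothetical.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = alpha + beta * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    return alpha, beta


def anova_simple(x, y):
    """Partition SST into SSR + SSE and compute F = MSR / MSE for one IV."""
    alpha, beta = fit_simple_ols(x, y)
    n = len(y)
    k = 1  # one independent variable
    mean_y = sum(y) / n
    fitted = [alpha + beta * xi for xi in x]
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sum((fi - mean_y) ** 2 for fi in fitted)
    msr = ssr / k
    mse = sse / (n - k - 1)
    return {"alpha": alpha, "beta": beta, "sst": sst,
            "ssr": ssr, "sse": sse, "f": msr / mse}


# Hypothetical territories: TV spots per month and sales (in 1000's)
adv = [4, 6, 8, 10, 12, 14]
sales = [230.0, 300.0, 330.0, 420.0, 450.0, 540.0]

result = anova_simple(adv, sales)
print({name: round(value, 3) for name, value in result.items()})
```

With an intercept in the model, SSR and SSE add up to SST, which is the partition the slide refers to.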
How do we determine the critical F?
• The critical F value comes from an F (DFN, DFD) distribution, where
• Numerator Degrees of Freedom (DFN) = Number of independent variables
• Denominator Degrees of Freedom (DFD) = Number of observations − Number of independent variables − 1
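As a worked check of these definitions, the sketch below recomputes the degrees of freedom, mean squares and F ratio from the sums of squares that the lecture later reports for the three-predictor Sonic Shine model (model SS 527209.081, error SS 71043.943, 40 observations). The function name `f_test_summary` is hypothetical. Looking up the critical value is not shown, since it needs an F table or a statistics package.

```python
def f_test_summary(ssr, sse, n_obs, n_predictors):
    """Degrees of freedom, mean squares and F ratio for a regression ANOVA."""
    dfn = n_predictors                # numerator degrees of freedom
    dfd = n_obs - n_predictors - 1    # denominator degrees of freedom
    msr = ssr / dfn
    mse = sse / dfd
    return {"dfn": dfn, "dfd": dfd, "msr": msr, "mse": mse, "f": msr / mse}


# Sums of squares as reported in the Sonic Shine ANOVA table of this lecture
sonic = f_test_summary(ssr=527209.081, sse=71043.943, n_obs=40, n_predictors=3)
print(sonic["dfn"], sonic["dfd"], round(sonic["f"], 3))
```

The resulting F ratio would then be compared with the critical value of an F(3, 36) distribution at the chosen significance level.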
REGRESSION: DON’T FORGET!
• The independent and dependent variables (IV & DV) are evaluated on interval or ratio scales.
• The assumptions underlying regression models are satisfied. Please find a detailed discussion of the assumptions in the Tests for Association homework template.
MULTIPLE REGRESSION ANALYSIS
Y_j = a + b_1 X_{1j} + b_2 X_{2j} + \dots + b_k X_{kj} + e_j
Observations: j = 1, …, n
Predictors: i = 1, …, k
MULTIPLE REGRESSION ANALYSIS
• Quantify the impact of various simultaneous influences upon one DV.
• Often essential, because omitted variables might bias our estimates, even when we only have interest in the effect of one independent variable.
• b_i measures the amount by which the dependent variable Y changes when the independent variable X_i changes by one unit, with all other independent variables held constant.
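The "one unit, everything else constant" reading can be illustrated with the fitted Sonic Shine equation that this lecture reports further on (Sales in 1000's = 31.150 + 12.968*TV spots + 41.246*salespeople + 11.524*efficiency index). The territory values used below (10 spots, 5 salespeople, index 3) are hypothetical, as are the function and dictionary names.

```python
# Coefficients as reported in the lecture's "Equation of the model" output
COEF = {
    "intercept": 31.150,
    "adv": 12.968,
    "salespeople": 41.246,
    "efficiency": 11.524,
}


def predict_sales(adv, salespeople, efficiency):
    """Predicted sales (in 1000's of units) from the reported equation."""
    return (COEF["intercept"]
            + COEF["adv"] * adv
            + COEF["salespeople"] * salespeople
            + COEF["efficiency"] * efficiency)


# Hypothetical territory, then the same territory with one extra TV spot
base = predict_sales(adv=10, salespeople=5, efficiency=3)
plus_one_spot = predict_sales(adv=11, salespeople=5, efficiency=3)

# Only ADV changed, so the difference equals the ADV coefficient
effect_of_one_spot = plus_one_spot - base
print(round(base, 3), round(effect_of_one_spot, 3))
```

Because the other two inputs are held fixed, the difference between the two predictions is exactly the ADV coefficient, whatever values the other predictors take.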
MULTIPLE REGRESSION: Sonic Shine
St = α + β1At + β2SPt + β3Wt + εt
St = Sales (territory t)
At = Number of TV spots (territory t)
SPt = Number of salespeople (territory t)
Wt = Wholesaler efficiency index (territory t):
4 = Excellent, 3 = Good, 2 = Average, 1 = Poor
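A model of this form is usually estimated by ordinary least squares. The sketch below solves the normal equations in plain Python to show the mechanics; it is not the procedure that produced the lecture's output. The eight territories and the "true" coefficients are made up, and the sales values are generated without noise so that the fit should recover those coefficients.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system (perfectly collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def fit_ols(rows, y):
    """Return [intercept, b1, ..., bk] from the normal equations X'X b = X'y."""
    design = [[1.0] + [float(v) for v in row] for row in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical territories: (TV spots, salespeople, wholesaler index)
territories = [(4, 3, 1), (7, 4, 2), (10, 5, 4), (13, 6, 3),
               (19, 8, 2), (9, 3, 4), (15, 7, 1), (6, 5, 3)]
# Made-up coefficients: intercept, then one slope per predictor
true_coef = [30.0, 13.0, 41.0, 11.5]
sales = [true_coef[0] + sum(b * x for b, x in zip(true_coef[1:], t))
         for t in territories]

estimates = fit_ols(territories, sales)
print([round(e, 6) for e in estimates])
```

In practice a statistics package would be used, since it also reports standard errors, t statistics and confidence bounds.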
REGRESSION: VARIABLE SELECTION
We include a variable when:
• It’s a decision variable (ADV, salespeople).
• It controls for important factors outside management’s control.
• The model remains parsimonious.
MULTICOLLINEARITY DIAGNOSTICS
• Tolerance (≥ .10)
• Variance Inflation Factor (VIF) (< 5)
• Bivariate correlation (r ≤ .70)
• Otherwise, combine or remove variables!
MULTICOLLINEARITY: Sonic Shine
Correlation matrix:
                     ADV      Sales People   Efficiency   Sales (in 1000's)
ADV                  1        0.776          0.032        0.880
Sales People         0.776    1              -0.190       0.882
Efficiency           0.032    -0.190         1            0.002
Sales (in 1000's)    0.880    0.882          0.002        1
Multicollinearity statistics:
             ADV      Sales People   Efficiency
Tolerance    0.364    0.351          0.883
VIF          2.747    2.847          1.132
SIGNIFICANCE TESTS: Sonic Shine
Regression of variable Sales (in 1000's):
Goodness of fit statistics (Sales (in 1000's)):
Observations    40.000
DF              36.000
R²              0.881
Adjusted R²     0.871
• What is the probability that at least one of the estimated coefficients is different from zero?
• H0: β1 = β2 = β3 = 0
• H1: At least one β is ≠ 0
• Does an independent variable have a “significant” effect on the dependent variable? OR What is the probability that an estimated coefficient is ≠ 0?
Analysis of variance (Sales (in 1000's)):
Source             DF    Sum of squares   Mean squares   F         Pr > F
Model              3     527209.081       175736.360     89.051    < 0.0001
Error              36    71043.943        1973.443
Corrected Total    39    598253.024
Computed against model Y=Mean(Y)
Model parameters (Sales (in 1000's)):
Source          Value     Standard error   t        Pr > |t|    Lower bound (95%)   Upper bound (95%)
Intercept       31.150    34.175           0.911    0.368       -38.160             100.461
ADV             12.968    2.737            4.738    < 0.0001    7.417               18.520
Sales People    41.246    7.280            5.666    < 0.0001    26.481              56.010
Efficiency      11.524    7.691            1.498    0.143       -4.074              27.123
Equation of the model (Sales (in 1000's)):
Sales (in 1000's) = 31.150 + 12.968*Number of TV Spots + 41.246*Number of Sales People + 11.524*Wholesaler Efficiency Index
Standardized coefficients (Sales (in 1000's)):
Source          Value    Standard error   t        Pr > |t|    Lower bound (95%)   Upper bound (95%)
ADV             0.451    0.095            4.738    < 0.0001    0.258               0.644
Sales People    0.549    0.097            5.666    < 0.0001    0.352               0.746
Efficiency      0.092    0.061            1.498    0.143       -0.032              0.216
GOODNESS-OF-FIT STATISTICS
• The R-squared statistic (also known as the coefficient of determination) measures the percentage of variation in the dependent variable explained by all independent variables included in the model.
• The value of this statistic never decreases, and typically increases, when new independent variables are added to the model, even if the added variables have no significant effect.
GOODNESS-OF-FIT STATISTICS (Cont’d)
• The adjusted R-squared statistic adjusts for the number of independent variables relative to sample size; we use it for comparing competing models.
• Its value decreases if a newly added predictor contributes too little explanatory power; it goes up only if the new variable improves the fit by more than the lost degree of freedom costs.
How to interpret the coefficients?
• One extra TV spot / month leads to a 12,968-unit increase in sales, keeping everything else constant.
• One more salesperson leads to a 41,246-unit increase in sales, keeping everything else constant.
Model parameters (Sales (in 1000's)):
Source          Value
Intercept       31.150
ADV             12.968
Sales People    41.246
Efficiency      11.524
How to compare coefficients?
• We consult the standardized coefficients.
• The number of salespeople has the largest impact on sales.
Standardized coefficients (Sales (in 1000's)):
Source          Value
ADV             0.451
Sales People    0.549
Efficiency      0.092

                                              TV (Q3)    R (Q3)    TV (Q8-R)
Cost (Per GRP)                                300        25        300
Increase in Sales (Per GRP), Per store
Overall Revenue Effect (mf = 3), Per store    0.00       0.00      0.00
Overall Profit (Margin = 30%), Per store      0.00       0.00      0.00
Total extra profit (For all 100 stores)       0.00       0.00      0.00
Net Effect, Overall                           -300.00    -25.00    -300.00
Q2 PREDICTION: Minimum, Mean, Maximum
Q3: TV GRPs, Radio GRPs
TV=40, R=80: 40, 80
Reference: P. 6; From model; P. 4, Par. 4; P. 2, Par. 1; P. 5, Par. 5

Week  Sales (euros)  TV     Radio  Fuel Volume  Fuel Price  Temperature  Precipitation (mm)  Visits (1 or 2)  Holiday  TV x H  R x H  T x H
26    24,864         74.5   66.5   61,825       104.24      27.9         0.9                 7                1        74.5    66.5   27.9
27    23,809         74.5   66.5   62,617       103.97      27.7         1.3                 7                1        74.5    66.5   27.7
28    24,476         90     75     60,227       107.48      29.1         4.8                 5.9              1        90      75     29.1
29    25,279         90     75     63,273       111.75      30.0         3.1                 5.9              1        90      75     30
30    26,263         90     75     65,196       109.08      29.3         0.0                 5.9              1        90      75     29.3
31    24,299         90     75     64,789       105.36      28.1         3.6                 5.9              1        90      75     28.1
32    25,671         15     8.5    65,901       107.31      26.9         4.8                 3.2              1        15      8.5    26.9
33    24,489         0      0      65,474       107.99      28.9         5.6                 3.2              1        0       0      28.9
34    24,416         0      0      65,706       109.30      27.0         8.5                 3.2              1        0       0      27
35    23,555         0      0      61,824       107.59      24.2         5.7                 3.2              1        0       0      24.2
36    22,377         0      0      61,810       105.41      24.3         10.6                3.2              1        0       0      24.3
37    18,969         37.5   68.5   59,697       107.35      23.1         19.2                4.3              0        0       0      0
38    22,924         37.5   68.5   64,729       106.17      21.0         18.3                4.3              0        0       0      0
39    20,449         37.5   68.5   64,570       107.63      20.3         25.8                4.3              0        0       0      0
40    21,171         37.5   68.5   64,005       108.31      20.9         26.8                4.3              0        0       0      0
41    19,729         0      0      64,163       109.49      19.1         30.3                7.5              0        0       0      0
42    20,244         0      0      60,563       109.60      19.6         21.5                7.5              0        0       0      0
43    21,246         0      0      63,694       111.12      18.9         20.2                7.5              1        0       0      18.9
44    19,603         0      0      60,719       108.36      14.2         24.6                7.5              1        0       0      14.2
45    22,432         0      0      63,290       108.33      13.1         19.2                6.7              1        0       0      13.1
46    19,522         37.5   68.5   59,567       106.54      13.2         12.1                6.7              0        0       0      0
47    21,426         37.5   68.5   61,359       104.57      12.9         11.3                6.7              0        0       0      0
48    19,982         0      0      63,303       105.61      11.1         11.2                10.3             0        0       0      0
49    23,529         37.5   68.5   62,780       107.66      10.8         10.3                6.7              0        0       0      0
50    19,067         0      0      61,374       103.56      10.3         11.6                10.3             0        0       0      0
51    19,787         37.5   68.5   58,166       101.72      9.2          9.0                 6.7              1        37.5    68.5   9.2
52    19,798         0      0      59,797       101.46      9.1          14.3                10.3             1        0       0      9.1
1     20,199         0      0      57,557       104.07      9.9          12.9                7.6              1        0       0      9.9
2     19,859         0      0      59,428       102.23      9.6          15.3                7.6              0        0       0      0
3     19,136         0      0      62,544       103.60      11.9         14.4                7.6              0        0       0      0
4     21,582         0      0      63,209       105.37      10.4         5.1                 7.6              0        0       0      0
5     22,383         0      0      64,085       109.20      11.5         12.3                5.4              0        0       0      0
6     20,632         0      0      60,173       107.42      11.8         11.5                5.4              0        0       0      0
7     20,431         0      0      64,855       106.92      10.4         7.3                 6.4              0        0       0      0
8     21,460         0      0      60,232       106.83      12.0         7.4                 5.4              0        0       0      0
9     22,800         0      0      61,780       108.62      12.5         10.1                5.4              1        0       0      12.5
10    20,701         0      0      63,111       107.10      11.9         12.3                6.4              1        0       0      11.9
11    20,151         0      0      62,750       110.74      12.6         16.0                6.4              0        0       0      0
12    22,896         0      0      64,700       112.11      14.8         8.4                 6.4              0        0       0      0
13    20,995         225    205    64,302       112.71      14.3         10.7                12.5             0        0       0      0
14    23,216         220    205    64,829       112.71      15.9         11.3                12.5             0        0       0      0
15    22,716         90     240    64,289       116.40      16.5         14.9                12.5             1        90      240    16.5
16    21,320         0      0      62,447       113.21      17.7         9.4                 12.5             1        0       0      17.7
17    24,332         0      0      63,716       114.66      16.0         5.0                 6                1        0       0      16
18    25,320         225    205    64,721       114.66      16.3         12.9                6                1        225     205    16.3
19    25,833         0      125    66,107       114.97      18.5         13.4                5                1        0       125    18.5
20    27,415         225    205    66,548       112.90      21.6         8.5                 3                1        225     205    21.6
21    24,861         0      260    62,541       111.70      21.2         6.5                 6                0        0       0      0
22    22,824         0      260    63,578       114.10      20.4         7.5                 6                0        0       0      0
23    22,824         0      250    63,623       114.38      21.8         6.6                 6                0        0       0      0
24    24,829         220    205    63,225       112.50      22.6         4.3                 5                0        0       0      0
25    25,628         90     240    66,714       115.99      25.9         3.2                 5                0        0       0      0
26    27,039         0      125    67,005       118.97      25.0         0.0                 5                0        0       0      0
27    27,908         220    205    63,642       116.61      26.1         1.6                 3                1        220     205    26.1
28    27,872         95     240    65,753       120.22      28.0         3.9                 3                1        95      240    28
29    27,279         0      135    65,423       119.38      29.2         3.5                 3                1        0       135    29.2
30    27,285         145    130    65,712       119.21      30.7         0.0                 0                1        145     130    30.7
31    28,451         225    205    65,262       118.00      29.4         3.2                 0                1        225     205    29.4
32    27,195         95     195    64,165       119.55      29.3         4.3                 0                1        95      195    29.3
33    27,137         70     240    64,637       123.13      26.2         5.2                 0                1        70      240    26.2
34    26,176         0      130    65,000       122.81      27.7         10.1                0                1        0       130    27.7
35    24,950         0      50     68,549       120.42      27.2         5.3                 9                1        0       50     27.2
36    22,387         0      260    61,754       133.67      25.1         13.4                9                0        0       0      0
37    22,187         0      260    63,510       130.72      23.1         20.9                9                0        0       0      0
38    23,048         21     250    65,064       125.77      23.5         17.1                9                0        0       0      0
39    23,210         10     0      62,857       126.51      21.4         25.7                7                0        0       0      0
40    20,102         0      0      59,950       126.30      20.7         25.7                7                0        0       0      0
41    19,975         0      260    61,582       128.11      19.4         30.0                7                0        0       0      0
42    21,364         0      250    61,756       124.85      19.2         20.2                7                0        0       0      0
43    22,354         0      260    63,033       119.66      19.8         20.8                7                1        0       260    19.8
44    20,098         5      0      60,116       119.83      18.0         24.4                4                1        5       0      18
45    23,497         0      0      58,929       118.47      14.1         19.1                4                1        0       0      14.1
46    21,910         0      0      57,984       116.09      14.1         11.1                4                0        0       0      0
47    21,323         4      0      56,259       115.90      14.0         11.7                4                0        0       0      0
48    19,648         0      0      59,248       116.46      12.9         10.5                7                0        0       0      0
49    23,403         0      0      61,171       115.50      12.4         10.6                7                0        0       0      0
50    23,238         0      0      62,731       116.19      11.1         12.4                7                0        0       0      0
51    20,356         0      0      58,129       119.30      10.1         10.9                2                0        0       0      0
52    22,242         0      0      56,720       121.07      9.9          15.4                2                1        0       0      9.9
1     21,854         0      0      63,368       120.18      9.8          12.4                7                1        0       0      9.8
2     20,213         0      0      62,142       122.13      10.3         15.5                2                0        0       0      0
3     20,392         0      0      61,986       124.31      10.6         14.5                2                0        0       0      0
4     23,403         0      0      61,741       120.94      10.0         6.9                 2                0        0       0      0
5     23,188         0      0      62,706       121.03      11.7         11.3                2                0        0       0      0
6     22,629         0      0      62,368       120.33      11.2         11.3                2                0        0       0      0
7     21,741         0      0      63,136       119.38      10.9         6.2                 2                0        0       0      0
8     24,207         0      0      62,903       118.78      10.5         7.2                 2                1        0       0      10.5
9     25,778         0      0      65,109       119.85      12.6         9.9                 3                1        0       0      12.6
10    23,919         0      0      61,967       120.87      11.1         13.3                3                1        0       0      11.1
11    22,182         0      0      61,591       122.95      12.8         15.4                3                0        0       0      0
12    23,291         0      0      63,247       123.51      12.6         7.5                 3                0        0       0      0
13    24,011         180    237    62,519       125.02      13.5         7.3                 9                0        0       0      0
14    23,979         160    208    63,128       125.30      14.9         12.1                9                0        0       0      0
15    23,896         150    208    64,331       129.63      15.7         14.4                9                1        150     208    15.7
16    26,873         0      0      65,812       131.02      15.0         9.0                 9                1        0       0      15
17    27,535         180    237    64,723       131.33      17.3         5.8                 6                1        180     237    17.3
18    23,688         160    208    65,052       130.95      16.8         12.7                6                1        160     208    16.8
19    23,922         150    208    63,214       129.06      17.3         14.6                6                1        150     208    17.3
20    24,421         0      0      63,379       129.28      19.7         5.9                 6                0        0       0      0
21    26,197         0      0      63,623       127.19      21.1         6.1                 6                0        0       0      0
22    26,765         180    237    62,661       129.19      21.8         7.3                 4                0        0       0      0

Variable Name: What this variable means?
Week: Week number (of the year)
Sales: Convenience store sales (in euros), per store, on average across all 100 stores in the Marseille area, for that week
TV: Number of TV GRPs that week (1 GRP = 1% of target market saw/heard the ad once)
Radio: Number of Radio GRPs that week (1 GRP = 1% of target market saw/heard the ad once)
Fuel Volume: Fuel volume sales (in liters), per station, on average across all 100 stations in the Marseille area, for that week (all fuel types)
Fuel Price: Average price of fuel (in cents) in the Marseille area, for that week (across all fuel types)
Temperature: Average high temperature (in C) recorded in the Marseille area, for that week
Precipitation (mm): Total precipitation (in mm) in the Marseille area, for that week
Visits (1 or 2): The percentage of survey respondents reporting 1 to 2 visits to a EurePet store in previous week (vs. "0 times" or "3 or more times")
Holiday: This is a "dummy variable"; Holiday = 1 if there was a national or school holiday that week; Holiday = 0 if there was no holiday that week
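The TV x H, R x H and T x H columns in the data appear to be TV, Radio and Temperature each multiplied by the Holiday dummy, so they are zero in non-holiday weeks. A minimal sketch of building such interaction terms follows. The four sample weeks use values copied from the data (weeks 26, 37, 43 and 51 of the first year); the function and key names are hypothetical.

```python
def add_holiday_interactions(row):
    """Return a copy of a weekly record with holiday interaction terms added."""
    out = dict(row)
    out["tv_x_h"] = row["tv"] * row["holiday"]
    out["r_x_h"] = row["radio"] * row["holiday"]
    out["t_x_h"] = row["temperature"] * row["holiday"]
    return out


# Four weeks copied from the data (TV and Radio in GRPs, temperature in C)
weeks = [
    {"week": 26, "tv": 74.5, "radio": 66.5, "temperature": 27.9, "holiday": 1},
    {"week": 37, "tv": 37.5, "radio": 68.5, "temperature": 23.1, "holiday": 0},
    {"week": 43, "tv": 0.0, "radio": 0.0, "temperature": 18.9, "holiday": 1},
    {"week": 51, "tv": 37.5, "radio": 68.5, "temperature": 9.2, "holiday": 1},
]

with_interactions = [add_holiday_interactions(w) for w in weeks]
for w in with_interactions:
    print(w["week"], w["tv_x_h"], w["r_x_h"], w["t_x_h"])
```

Including such a product alongside the original variable in a regression lets the estimated effect of that variable differ between holiday and non-holiday weeks.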