CORRELATION & LINEAR REGRESSION
LECTURE 3
How to estimate relationships?
Dep. var. | Model
Interval/Ratio | Correlation, regression models
Ordinal (two levels) | Logistic regression models
Ordinal (multiple levels) | Conditional & multinomial logit models
Special case | Conjoint analysis
LEARNING OBJECTIVES
• Correlation analysis
• Simple regression models
• Multivariate regression models
Shine Sonic Toothbrush Co.
Data for 40 territories (see “ShineSonic.xlsm”):
• Annual sales (in ‘000s units)
• Advertising (TV spots / month)
• Number of salespeople
• Wholesaler efficiency index
Sonic Shine: Descriptive statistics
Statistic | Sales (in 1,000’s) | ADV | Sales People | Wholesaler Index
Nbr. of observations | 40 | 40 | 40 | 40
Minimum | 220.500 | 4.000 | 3.000 | 1.000
Maximum | 667.000 | 19.000 | 8.000 | 4.000
1st Quartile | 315.225 | 7.750 | 3.750 | 2.000
Median | 398.400 | 10.000 | 5.000 | 3.000
3rd Quartile | 507.575 | 13.250 | 6.000 | 4.000
Mean | 411.288 | 10.900 | 5.000 | 2.825
Variance (n-1) | 15339.821 | 18.554 | 2.718 | 0.969
Standard deviation (n-1) | 123.854 | 4.307 | 1.649 | 0.984
Skewness (Pearson) | 0.426 | 0.373 | 0.348 | -0.298
Kurtosis (Pearson) | -0.895 | -0.940 | -0.999 | -0.972
Sonic Shine: ADV & SALES
We want to understand the relationship between ADV and sales, using:
• Correlation analysis
• A simple linear regression model
CORRELATIONS …
$r_{XY}$ → correlation between X and Y
Correlations represent the relationship among variables: how the value of one variable varies in relation to variation in another variable.
BUT correlation does not imply causation.
CORRELATIONS …
• Correlations measure the strength of the relationship between two variables.
• Linear correlation coefficient (r): measures the linear relationship between two interval- or ratio-scaled variables (X and Y).
• Also known as the Pearson product-moment correlation coefficient.
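As a sketch of what the Pearson coefficient computes (the covariance of X and Y divided by the product of their standard deviations), here is a plain-Python version. The function name `pearson_r` and the five sample territories are illustrative assumptions, not values from ShineSonic.xlsm.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and squared deviations from the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    # r = covariance / (sd_x * sd_y); the 1/(n-1) factors cancel out
    return sxy / math.sqrt(sxx * syy)

# Hypothetical territories (NOT the ShineSonic.xlsm data)
adv = [4, 8, 10, 13, 19]            # TV spots per month
sales = [230, 320, 400, 500, 660]   # sales in 1,000s
print(round(pearson_r(adv, sales), 3))
```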
CORRELATIONS …
Correlations range from -1.00 to +1.00.
Positive correlation: the variables change in the same direction.
Negative correlation: the variables change in opposite directions.
The absolute value of the coefficient indicates strength (-.70 implies a stronger association than +.25).
RELATIONSHIP STRENGTH
Correlation (absolute value) | Interpretation
.8 to 1.0 | Very strong relationship
.6 to .8 | Strong relationship
.4 to .6 | Moderate relationship
.2 to .4 | Weak relationship
0 to .2 | Very weak or no relationship
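A small sketch of how the strength labels apply: the sign of r gives the direction, so the label is chosen from |r|. The function name is hypothetical, and because the table's ranges share their endpoints, assigning a boundary value such as exactly .6 to the higher band is an assumption.

```python
def describe_strength(r):
    """Map a correlation coefficient to the verbal strength labels.

    Uses |r| because the sign indicates direction, not strength.
    Boundary values go to the higher band (an assumption: the
    table's ranges overlap at their endpoints).
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and +1")
    strength = abs(r)
    if strength >= 0.8:
        return "Very strong relationship"
    if strength >= 0.6:
        return "Strong relationship"
    if strength >= 0.4:
        return "Moderate relationship"
    if strength >= 0.2:
        return "Weak relationship"
    return "Very weak or no relationship"

print(describe_strength(-0.70), "|", describe_strength(0.25))
```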
CORRELATIONS …
Do not run correlations on nominal variables.
Use the Pearson correlation for interval and ratio variables.
Use the Spearman correlation if one or both variables are ordinal.
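The Spearman correlation is the Pearson correlation computed on ranks instead of raw values, which is why it suits ordinal data such as the 1–4 wholesaler efficiency index. A minimal sketch, assuming tied values receive the average of their ranks (the usual convention); the function names and the sample data are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical: ordinal efficiency index (with ties) vs. sales
efficiency = [1, 2, 2, 3, 4]
sales = [300, 350, 340, 420, 500]
print(round(spearman_rho(efficiency, sales), 3))
```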
CORRELATION: Shine Sonic Toothbrush
Correlation matrix (Pearson):
Variables | Sales (in 1000’s) | Advertising (TV Spots/Month)
Sales (in 1000’s) | 1 | 0.880
ADV | 0.880 | 1
The correlation between sales and ADV (0.880) is different from 0 at significance level alpha = 0.05.
p-values (Pearson):
Variables | Sales (in 1000’s) | Advertising (TV Spots/Month)
Sales (in 1000’s) | 0 | < 0.0001
ADV | < 0.0001 | 0
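The p-value above comes from a test of H0: ρ = 0. A common form of that test (not spelled out on the slide) is t = r·√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. This sketch plugs in r = 0.880 and n = 40; the function name is hypothetical.

```python
import math

def correlation_t_stat(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("need at least 3 observations")
    if abs(r) >= 1:
        raise ValueError("|r| must be < 1 for a finite t statistic")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Sales vs. ADV in the Shine Sonic data: r = 0.880, n = 40 territories
t = correlation_t_stat(0.880, 40)
print(f"t = {t:.2f} on {40 - 2} df")
```

A value around 11.4 lies far beyond the two-sided 5% critical value of roughly 2.02 for 38 df, which is consistent with the reported p < 0.0001.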
REGRESSION ANALYSIS
Regression analysis is a technique for quantifying the relationship between one or more independent (predictor) variables and a dependent variable.
We typically use it to:
• Predict the DV based on values of the IVs.
• Understand how the IVs influence the DV.
SIMPLE REGRESSION MODEL
We want to quantify the relationship between sales (dependent variable) and advertising (independent variable). We assume a linear association.
Sales = f (ADV)
SIMPLE REGRESSION MODEL
We assume a linear relationship between ADV and sales, with a normally distributed random error:
$$S_t = \alpha_1 + \beta_1 A_t + \varepsilon_t$$
where $S_t$ = sales in territory t, $A_t$ = ADV in territory t, and $\varepsilon_t$ = random error.
We want to find the line that best matches the data. Linear regression analysis finds that line, which is defined by $\alpha_1$ and $\beta_1$.
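For one predictor, the least-squares line has a closed form: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the intercept makes the line pass through the point of means. A minimal sketch (the function name is hypothetical):

```python
def fit_simple_ols(x, y):
    """Least-squares intercept (alpha) and slope (beta) for y = alpha + beta * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    beta = sxy / sxx                  # slope
    alpha = mean_y - beta * mean_x    # line passes through (mean_x, mean_y)
    return alpha, beta

# Hypothetical territories (NOT the ShineSonic.xlsm data)
alpha, beta = fit_simple_ols([4, 8, 10, 13, 19], [230, 320, 400, 500, 660])
print(f"Sales = {alpha:.2f} + {beta:.2f} * ADV")
```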
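SIMPLE REGRESSION MODEL
Test of the overall model (F-test):
H0: β1 = 0 (the model predicts no better than the mean of Y)
Ha: β1 ≠ 0
Test of the intercept estimate $\hat{\alpha}_1$ (t-test):
H0: α1 = 0
Ha: α1 ≠ 0
Test of the slope estimate $\hat{\beta}_1$ (t-test):
H0: β1 = 0
Ha: β1 ≠ 0
With a single IV, the F-test and the slope t-test are equivalent (F = t²).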
How do we find the best line?
What if we want to predict college GPA based on high school GPA?
The regression line, also known as the “line of best fit,” reflects our best estimate of the score on the Y variable predicted by a given score on the X variable.
PREDICTION ERROR
SIMPLE REGRESSION MODEL
We partition the SST (total sum of squares) into SSR (sum of squares for the regression model) & SSE (sum of squares for error):
$$SST = \sum_{j=1}^{n} \left(Y_j - \bar{Y}\right)^2$$
$$SSR = \sum_{j=1}^{n} \left(\hat{Y}_j - \bar{Y}\right)^2$$
$$SSE = \sum_{j=1}^{n} \left(Y_j - \hat{Y}_j\right)^2$$
so that SST = SSR + SSE.
We select the line defined by α and β that minimizes the sum of the squared errors (SSE).
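A quick numerical check of the partition: after a least-squares fit with an intercept, SST equals SSR + SSE, and R² = SSR/SST. This is a sketch on hypothetical data; `sum_of_squares` is an illustrative name, and the identity does not hold for lines fitted some other way.

```python
def sum_of_squares(y, y_hat):
    """Return (SST, SSR, SSE) for observed values y and fitted values y_hat."""
    mean_y = sum(y) / len(y)
    sst = sum((yj - mean_y) ** 2 for yj in y)
    ssr = sum((fj - mean_y) ** 2 for fj in y_hat)
    sse = sum((yj - fj) ** 2 for yj, fj in zip(y, y_hat))
    return sst, ssr, sse

# Hypothetical data; fit the least-squares line first
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
mx, my = sum(x) / len(x), sum(y) / len(y)
beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
alpha = my - beta * mx
y_hat = [alpha + beta * xi for xi in x]

sst, ssr, sse = sum_of_squares(y, y_hat)
print(f"SST={sst:.4f}  SSR+SSE={ssr + sse:.4f}  R^2={ssr / sst:.4f}")
```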
REGRESSION ANALYSIS
$$F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)} > F_{crit}$$
We compare the ratio between the MSR (mean square for the regression) and the MSE (mean square error).
If this ratio exceeds a critical F value, we can conclude that our IVs can be used to predict (to some extent) the DV.
How do we determine the critical F?
The critical F value comes from an F (DFN, DFD) distribution, where
• Numerator Degrees of Freedom (DFN) = number of independent variables (k)
• Denominator Degrees of Freedom (DFD) = number of observations (n) − number of independent variables (k) − 1
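The F ratio and its degrees of freedom can be reproduced from the sums of squares. This sketch uses SSR and SSE from the Sonic Shine ANOVA table (n = 40 territories, k = 3 IVs); the function name is hypothetical.

```python
def f_statistic(ssr, sse, n, k):
    """Overall F = (SSR / k) / (SSE / (n - k - 1)), with its degrees of freedom."""
    dfn = k              # numerator df: number of independent variables
    dfd = n - k - 1      # denominator df: observations - IVs - 1
    msr = ssr / dfn      # mean square for the regression
    mse = sse / dfd      # mean square error
    return msr / mse, dfn, dfd

f, dfn, dfd = f_statistic(527209.081, 71043.943, n=40, k=3)
print(f"F({dfn}, {dfd}) = {f:.3f}")
```

The result, about 89.05, is far above the 5% critical value of F(3, 36), which is below 3.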
REGRESSION: DON’T FORGET!
• The independent and dependent variables (IV & DV) are measured on interval or ratio scales.
• The assumptions underlying regression models are satisfied. A detailed discussion of the assumptions is in the Tests for Association homework template.
MULTIPLE REGRESSION ANALYSIS
$$Y_j = a + b_1 X_{1j} + b_2 X_{2j} + \dots + b_k X_{kj} + e_j$$
Observations: j = 1, …, n
Predictors: i = 1, …, k
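With several predictors there is no one-line formula, but the least-squares coefficients solve the normal equations (X′X)b = X′y, where X includes a column of 1s for the intercept a. A teaching sketch in plain Python using Gaussian elimination; `fit_ols` is a hypothetical name, and real analyses would normally rely on a numerically stabler library solver.

```python
def fit_ols(X, y):
    """Least-squares coefficients [a, b1, ..., bk] for y = a + b1*x1 + ... + bk*xk.

    X is a list of rows (one per observation j), each holding the k predictor
    values. Solves the normal equations (X'X) b = X'y by Gaussian elimination
    with partial pivoting.
    """
    rows = [[1.0] + list(r) for r in X]  # prepend 1s for the intercept a
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * yj for r, yj in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity?)")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back-substitution
    b = [0.0] * p
    for i in range(p - 1, -1, -1):
        b[i] = (A[i][p] - sum(A[i][c] * b[c] for c in range(i + 1, p))) / A[i][i]
    return b

# Hypothetical data generated exactly from y = 1 + 2*x1 - 1*x2
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [1 + 2 * x1 - x2 for x1, x2 in X]
print(fit_ols(X, y))  # approximately [1.0, 2.0, -1.0]
```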
MULTIPLE REGRESSION ANALYSIS
• Quantify the impact of several simultaneous influences on one DV.
• Often essential, because omitted variables might bias our estimates, even when we are interested in the effect of only one independent variable.
• Each coefficient $b_i$ measures the amount by which the dependent variable Y changes when the independent variable $X_i$ changes by one unit, holding all other independent variables constant.
MULTIPLE REGRESSION: Sonic Shine
$$S_t = \alpha + \beta_1 A_t + \beta_2 SP_t + \beta_3 W_t + \varepsilon_t$$
$S_t$ = Sales (territory t)
$A_t$ = Number of TV spots (territory t)
$SP_t$ = Number of salespeople (territory t)
$W_t$ = Wholesaler efficiency index (territory t): 4 = Excellent, 3 = Good, 2 = Average, 1 = Poor
REGRESSION: VARIABLE SELECTION
We include a variable when:
• It’s a decision variable (ADV, salespeople), or
• It controls for important factors outside management’s control,
while keeping the model parsimonious.
MULTICOLLINEARITY DIAGNOSTICS
Tolerance (≥ .10)
Variance Inflation Factor, VIF = 1 / Tolerance (< 5)
Bivariate correlation (|r| ≤ .70)
Otherwise, combine or remove variables!
MULTICOLLINEARITY: Sonic Shine
Correlation matrix:
Variables | ADV | Sales People | Efficiency | Sales (in 1000's)
ADV | 1 | 0.776 | 0.032 | 0.880
Sales People | 0.776 | 1 | -0.190 | 0.882
Efficiency | 0.032 | -0.190 | 1 | 0.002
Sales (in 1000's) | 0.880 | 0.882 | 0.002 | 1
Multicollinearity statistics:
Statistic | ADV | Sales People | Efficiency
Tolerance | 0.364 | 0.351 | 0.883
VIF | 2.747 | 2.847 | 1.132
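VIF is the reciprocal of tolerance (tolerance = 1 − R² from regressing one IV on the other IVs), so the two rules of thumb can be checked together. This sketch applies them to the tolerances in the table; small differences from the reported VIFs (e.g. 2.849 vs. 2.847) come from the tolerances being rounded. The function names are hypothetical, and the bivariate-correlation rule (|r| ≤ .70) is a separate check against the correlation matrix that this code does not perform.

```python
def vif_from_tolerance(tolerance):
    """VIF_i = 1 / Tolerance_i, where Tolerance_i = 1 - R^2_i."""
    return 1.0 / tolerance

def flag_multicollinearity(tolerance, vif_limit=5.0, tol_limit=0.10):
    """True if a predictor fails the tolerance or the VIF rule of thumb."""
    return tolerance < tol_limit or vif_from_tolerance(tolerance) >= vif_limit

tolerances = {"ADV": 0.364, "Sales People": 0.351, "Efficiency": 0.883}
for name, tol in tolerances.items():
    print(f"{name}: VIF = {vif_from_tolerance(tol):.3f}, "
          f"flagged = {flag_multicollinearity(tol)}")
```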
SIGNIFICANCE TESTS: Sonic Shine
Regression of variable Sales (in 1000's):
Goodness of fit statistics (Sales (in 1000's)):
Observations | DF | R² | Adjusted R²
40.000 | 36.000 | 0.881 | 0.871
Overall F-test: is at least one of the slope coefficients different from zero?
H0: β1 = β2 = β3 = 0
H1: At least one β is ≠ 0
t-tests: does an independent variable have a “significant” effect on the dependent variable, i.e., is its coefficient different from zero?
Analysis of variance (Sales (in 1000's)):
Source | DF | Sum of squares | Mean squares | F | Pr > F
Model | 3 | 527209.081 | 175736.360 | 89.051 | < 0.0001
Error | 36 | 71043.943 | 1973.443 | |
Corrected Total | 39 | 598253.024 | | |
Computed against model Y=Mean(Y)
Model parameters (Sales (in 1000's)):
Source | Value | Standard error | t | p-value | Lower bound (95%) | Upper bound (95%)
Intercept | 31.150 | 34.175 | 0.911 | 0.368 | -38.160 | 100.461
ADV | 12.968 | 2.737 | 4.738 | < 0.0001 | 7.417 | 18.520
Sales People | 41.246 | 7.280 | 5.666 | < 0.0001 | 26.481 | 56.010
Efficiency | 11.524 | 7.691 | 1.498 | 0.143 | -4.074 | 27.123
Equation of the model (Sales (in 1000's)):
Sales (in 1000's) = 31.150+12.968*Number of TV Spots + 41.246*Number of Sales People + 11.524*Wholesaler Efficiency Index
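Each t value in the parameter table is the estimate divided by its standard error, and each 95% bound is the estimate ± t_crit × standard error with n − k − 1 = 36 df. This sketch reproduces the table; the critical value 2.028 is an assumption read from a t table (scipy's `scipy.stats.t.ppf(0.975, 36)` would give it more precisely), and the function name is hypothetical.

```python
def coefficient_inference(value, std_error, t_crit):
    """t statistic and two-sided confidence interval for one coefficient."""
    t = value / std_error               # t = estimate / standard error
    half_width = t_crit * std_error     # margin of error
    return t, (value - half_width, value + half_width)

# (Value, Standard error) pairs copied from the model-parameters table
params = {
    "Intercept": (31.150, 34.175),
    "ADV": (12.968, 2.737),
    "Sales People": (41.246, 7.280),
    "Efficiency": (11.524, 7.691),
}
T_CRIT = 2.028  # assumed: approx. 97.5th percentile of t with 36 df

results = {name: coefficient_inference(v, se, T_CRIT)
           for name, (v, se) in params.items()}
for name, (t, (lo, hi)) in results.items():
    print(f"{name}: t = {t:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

An interval that contains 0 (Intercept, Efficiency) corresponds to a coefficient that is not significant at the 5% level.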
Standardized coefficients (Sales (in 1000's)):
Source | Value | Standard error | t | p-value | Lower bound (95%) | Upper bound (95%)
ADV | 0.451 | 0.095 | 4.738 | < 0.0001 | 0.258 | 0.644
Sales People | 0.549 | 0.097 | 5.666 | < 0.0001 | 0.352 | 0.746
Efficiency | 0.092 | 0.061 | 1.498 | 0.143 | -0.032 | 0.216
GOODNESS-OF-FIT STATISTICS
The R-squared statistic (also known as the coefficient of determination) measures the proportion of variation in the dependent variable explained by all independent variables included in the model.
Its value never decreases when new independent variables are added to the model, even if the added variables have no significant effect.
GOODNESS-OF-FIT STATISTICS (Cont’d)
The adjusted R-squared statistic adjusts for the number of independent variables relative to sample size; we use it to compare competing models.
Unlike R-squared, it can decrease when a predictor is added: a newly added independent variable raises adjusted R-squared only if its t statistic exceeds 1 in absolute value, so predictors that add little explanatory power lower it.
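Both statistics follow from the ANOVA sums of squares: R² = SSR/SST and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1). This sketch reproduces the reported 0.881 and 0.871 for n = 40 and k = 3; the function names are hypothetical.

```python
def r_squared(ssr, sst):
    """Share of the variation in Y explained by the model."""
    return ssr / sst

def adjusted_r_squared(r2, n, k):
    """Penalize R^2 for the number of predictors k relative to sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Sums of squares from the Sonic Shine ANOVA table
r2 = r_squared(527209.081, 598253.024)
adj = adjusted_r_squared(r2, n=40, k=3)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
```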
How to interpret the coefficients?
One extra TV spot per month leads to a 12,968-unit increase in annual sales, keeping everything else constant.
One more salesperson leads to a 41,246-unit increase in annual sales, keeping everything else constant.
Model parameters (Sales (in 1000's)):
Source | Value
Intercept | 31.150
ADV | 12.968
Sales People | 41.246
Efficiency | 11.524
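The "keeping everything else constant" reading can be checked with the fitted equation: raising one predictor by one unit while fixing the others changes predicted sales by exactly that predictor's coefficient. This sketch uses a hypothetical territory at the median of each predictor from the descriptive statistics (10 spots, 5 salespeople, index 3); the function name is illustrative.

```python
def predict_sales(tv_spots, sales_people, efficiency):
    """Predicted annual sales (in 1,000 units) from the fitted Sonic Shine equation."""
    return 31.150 + 12.968 * tv_spots + 41.246 * sales_people + 11.524 * efficiency

base = predict_sales(10, 5, 3)
extra_spot = predict_sales(11, 5, 3) - base    # one more TV spot, all else fixed
extra_person = predict_sales(10, 6, 3) - base  # one more salesperson, all else fixed
print(f"base = {base:.3f}; +1 spot: {extra_spot:.3f}; +1 salesperson: {extra_person:.3f}")
```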
How to compare coefficients?
We consult the standardized coefficients, which express each effect in standard-deviation units.
The number of salespeople has the largest impact on sales.
Standardized coefficients (Sales (in 1000's)):
Source | Value
ADV | 0.451
Sales People | 0.549
Efficiency | 0.092
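A standardized coefficient rescales the raw coefficient by the predictor's and the DV's standard deviations: beta_i = b_i × s(X_i) / s(Y). Using the coefficients from the model-parameters table and the standard deviations from the descriptive-statistics table, this sketch reproduces the standardized values; the function name is hypothetical.

```python
def standardized_coefficient(b, sd_x, sd_y):
    """Change in Y (in SDs of Y) per one-SD change in X."""
    return b * sd_x / sd_y

SD_SALES = 123.854
coefficients = {  # (unstandardized b, SD of the predictor)
    "ADV": (12.968, 4.307),
    "Sales People": (41.246, 1.649),
    "Efficiency": (11.524, 0.984),
}
betas = {name: standardized_coefficient(b, sd, SD_SALES)
         for name, (b, sd) in coefficients.items()}
for name, beta in betas.items():
    print(f"{name}: {beta:.3f}")
```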
Row | TV (Q3) | R (Q3) | TV (Q8-R)
Cost (Per GRP) | 300 | 25 | 300
Increase in Sales (Per GRP), Per store | | |
Overall Revenue Effect (mf = 3), Per store | 0.00 | 0.00 | 0.00
Overall Profit (Margin = 30%), Per store | 0.00 | 0.00 | 0.00
Total extra profit (For all 100 stores) | 0.00 | 0.00 | 0.00
Net Effect, Overall | -300.00 | -25.00 | -300.00
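The template rows appear to chain into one another. A sketch under that assumption: revenue effect per store = sales increase per GRP × mf, profit per store = revenue effect × 30% margin, total extra profit = profit × 100 stores, and net effect = total extra profit − cost per GRP. How mf is meant to be applied is an assumption, and the function name is hypothetical. With the sales-increase row still blank (treated as 0), it reproduces the template's net effects of −300 and −25.

```python
def net_effect_per_grp(sales_increase_per_store, cost_per_grp,
                       mf=3, margin=0.30, n_stores=100):
    """Chain the template rows for one extra GRP (assumed interpretation)."""
    revenue = sales_increase_per_store * mf   # Overall Revenue Effect, per store
    profit = revenue * margin                 # Overall Profit, per store
    total_profit = profit * n_stores          # Total extra profit, all stores
    return total_profit - cost_per_grp        # Net Effect, Overall

print(net_effect_per_grp(0, 300), net_effect_per_grp(0, 25))
```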
Q2: PREDICTION
Minimum | Mean | Maximum
Q3:
Scenario | TV GRPs | Radio GRPs
TV=40, R=80 | 40 | 80
Reference: P. 6; From model; P. 4, Par. 4; P. 2, Par. 1; P. 5, Par. 5
Week | Sales (euros) | TV | Radio | Fuel Volume | Fuel Price | Temperature | Precipitation (mm) | Visits (1 or 2) | Holiday | TV x H | RxH | TxH
26 | 24,864 | 74.5 | 66.5 | 61,825 | 104.24 | 27.9 | 0.9 | 7 | 1 | 74.5 | 66.5 | 27.9
27 | 23,809 | 74.5 | 66.5 | 62,617 | 103.97 | 27.7 | 1.3 | 7 | 1 | 74.5 | 66.5 | 27.7
28 | 24,476 | 90 | 75 | 60,227 | 107.48 | 29.1 | 4.8 | 5.9 | 1 | 90 | 75 | 29.1
29 | 25,279 | 90 | 75 | 63,273 | 111.75 | 30.0 | 3.1 | 5.9 | 1 | 90 | 75 | 30
30 | 26,263 | 90 | 75 | 65,196 | 109.08 | 29.3 | 0.0 | 5.9 | 1 | 90 | 75 | 29.3
31 | 24,299 | 90 | 75 | 64,789 | 105.36 | 28.1 | 3.6 | 5.9 | 1 | 90 | 75 | 28.1
32 | 25,671 | 15 | 8.5 | 65,901 | 107.31 | 26.9 | 4.8 | 3.2 | 1 | 15 | 8.5 | 26.9
33 | 24,489 | 0 | 0 | 65,474 | 107.99 | 28.9 | 5.6 | 3.2 | 1 | 0 | 0 | 28.9
34 | 24,416 | 0 | 0 | 65,706 | 109.30 | 27.0 | 8.5 | 3.2 | 1 | 0 | 0 | 27
35 | 23,555 | 0 | 0 | 61,824 | 107.59 | 24.2 | 5.7 | 3.2 | 1 | 0 | 0 | 24.2
36 | 22,377 | 0 | 0 | 61,810 | 105.41 | 24.3 | 10.6 | 3.2 | 1 | 0 | 0 | 24.3
37 | 18,969 | 37.5 | 68.5 | 59,697 | 107.35 | 23.1 | 19.2 | 4.3 | 0 | 0 | 0 | 0
38 | 22,924 | 37.5 | 68.5 | 64,729 | 106.17 | 21.0 | 18.3 | 4.3 | 0 | 0 | 0 | 0
39 | 20,449 | 37.5 | 68.5 | 64,570 | 107.63 | 20.3 | 25.8 | 4.3 | 0 | 0 | 0 | 0
40 | 21,171 | 37.5 | 68.5 | 64,005 | 108.31 | 20.9 | 26.8 | 4.3 | 0 | 0 | 0 | 0
41 | 19,729 | 0 | 0 | 64,163 | 109.49 | 19.1 | 30.3 | 7.5 | 0 | 0 | 0 | 0
42 | 20,244 | 0 | 0 | 60,563 | 109.60 | 19.6 | 21.5 | 7.5 | 0 | 0 | 0 | 0
43 | 21,246 | 0 | 0 | 63,694 | 111.12 | 18.9 | 20.2 | 7.5 | 1 | 0 | 0 | 18.9
44 | 19,603 | 0 | 0 | 60,719 | 108.36 | 14.2 | 24.6 | 7.5 | 1 | 0 | 0 | 14.2
45 | 22,432 | 0 | 0 | 63,290 | 108.33 | 13.1 | 19.2 | 6.7 | 1 | 0 | 0 | 13.1
46 | 19,522 | 37.5 | 68.5 | 59,567 | 106.54 | 13.2 | 12.1 | 6.7 | 0 | 0 | 0 | 0
47 | 21,426 | 37.5 | 68.5 | 61,359 | 104.57 | 12.9 | 11.3 | 6.7 | 0 | 0 | 0 | 0
48 | 19,982 | 0 | 0 | 63,303 | 105.61 | 11.1 | 11.2 | 10.3 | 0 | 0 | 0 | 0
49 | 23,529 | 37.5 | 68.5 | 62,780 | 107.66 | 10.8 | 10.3 | 6.7 | 0 | 0 | 0 | 0
50 | 19,067 | 0 | 0 | 61,374 | 103.56 | 10.3 | 11.6 | 10.3 | 0 | 0 | 0 | 0
51 | 19,787 | 37.5 | 68.5 | 58,166 | 101.72 | 9.2 | 9.0 | 6.7 | 1 | 37.5 | 68.5 | 9.2
52 | 19,798 | 0 | 0 | 59,797 | 101.46 | 9.1 | 14.3 | 10.3 | 1 | 0 | 0 | 9.1
1 | 20,199 | 0 | 0 | 57,557 | 104.07 | 9.9 | 12.9 | 7.6 | 1 | 0 | 0 | 9.9
2 | 19,859 | 0 | 0 | 59,428 | 102.23 | 9.6 | 15.3 | 7.6 | 0 | 0 | 0 | 0
3 | 19,136 | 0 | 0 | 62,544 | 103.60 | 11.9 | 14.4 | 7.6 | 0 | 0 | 0 | 0
4 | 21,582 | 0 | 0 | 63,209 | 105.37 | 10.4 | 5.1 | 7.6 | 0 | 0 | 0 | 0
5 | 22,383 | 0 | 0 | 64,085 | 109.20 | 11.5 | 12.3 | 5.4 | 0 | 0 | 0 | 0
6 | 20,632 | 0 | 0 | 60,173 | 107.42 | 11.8 | 11.5 | 5.4 | 0 | 0 | 0 | 0
7 | 20,431 | 0 | 0 | 64,855 | 106.92 | 10.4 | 7.3 | 6.4 | 0 | 0 | 0 | 0
8 | 21,460 | 0 | 0 | 60,232 | 106.83 | 12.0 | 7.4 | 5.4 | 0 | 0 | 0 | 0
9 | 22,800 | 0 | 0 | 61,780 | 108.62 | 12.5 | 10.1 | 5.4 | 1 | 0 | 0 | 12.5
10 | 20,701 | 0 | 0 | 63,111 | 107.10 | 11.9 | 12.3 | 6.4 | 1 | 0 | 0 | 11.9
11 | 20,151 | 0 | 0 | 62,750 | 110.74 | 12.6 | 16.0 | 6.4 | 0 | 0 | 0 | 0
12 | 22,896 | 0 | 0 | 64,700 | 112.11 | 14.8 | 8.4 | 6.4 | 0 | 0 | 0 | 0
13 | 20,995 | 225 | 205 | 64,302 | 112.71 | 14.3 | 10.7 | 12.5 | 0 | 0 | 0 | 0
14 | 23,216 | 220 | 205 | 64,829 | 112.71 | 15.9 | 11.3 | 12.5 | 0 | 0 | 0 | 0
15 | 22,716 | 90 | 240 | 64,289 | 116.40 | 16.5 | 14.9 | 12.5 | 1 | 90 | 240 | 16.5
16 | 21,320 | 0 | 0 | 62,447 | 113.21 | 17.7 | 9.4 | 12.5 | 1 | 0 | 0 | 17.7
17 | 24,332 | 0 | 0 | 63,716 | 114.66 | 16.0 | 5.0 | 6 | 1 | 0 | 0 | 16
18 | 25,320 | 225 | 205 | 64,721 | 114.66 | 16.3 | 12.9 | 6 | 1 | 225 | 205 | 16.3
19 | 25,833 | 0 | 125 | 66,107 | 114.97 | 18.5 | 13.4 | 5 | 1 | 0 | 125 | 18.5
20 | 27,415 | 225 | 205 | 66,548 | 112.90 | 21.6 | 8.5 | 3 | 1 | 225 | 205 | 21.6
21 | 24,861 | 0 | 260 | 62,541 | 111.70 | 21.2 | 6.5 | 6 | 0 | 0 | 0 | 0
22 | 22,824 | 0 | 260 | 63,578 | 114.10 | 20.4 | 7.5 | 6 | 0 | 0 | 0 | 0
23 | 22,824 | 0 | 250 | 63,623 | 114.38 | 21.8 | 6.6 | 6 | 0 | 0 | 0 | 0
24 | 24,829 | 220 | 205 | 63,225 | 112.50 | 22.6 | 4.3 | 5 | 0 | 0 | 0 | 0
25 | 25,628 | 90 | 240 | 66,714 | 115.99 | 25.9 | 3.2 | 5 | 0 | 0 | 0 | 0
26 | 27,039 | 0 | 125 | 67,005 | 118.97 | 25.0 | 0.0 | 5 | 0 | 0 | 0 | 0
27 | 27,908 | 220 | 205 | 63,642 | 116.61 | 26.1 | 1.6 | 3 | 1 | 220 | 205 | 26.1
28 | 27,872 | 95 | 240 | 65,753 | 120.22 | 28.0 | 3.9 | 3 | 1 | 95 | 240 | 28
29 | 27,279 | 0 | 135 | 65,423 | 119.38 | 29.2 | 3.5 | 3 | 1 | 0 | 135 | 29.2
30 | 27,285 | 145 | 130 | 65,712 | 119.21 | 30.7 | 0.0 | 0 | 1 | 145 | 130 | 30.7
31 | 28,451 | 225 | 205 | 65,262 | 118.00 | 29.4 | 3.2 | 0 | 1 | 225 | 205 | 29.4
32 | 27,195 | 95 | 195 | 64,165 | 119.55 | 29.3 | 4.3 | 0 | 1 | 95 | 195 | 29.3
33 | 27,137 | 70 | 240 | 64,637 | 123.13 | 26.2 | 5.2 | 0 | 1 | 70 | 240 | 26.2
34 | 26,176 | 0 | 130 | 65,000 | 122.81 | 27.7 | 10.1 | 0 | 1 | 0 | 130 | 27.7
35 | 24,950 | 0 | 50 | 68,549 | 120.42 | 27.2 | 5.3 | 9 | 1 | 0 | 50 | 27.2
36 | 22,387 | 0 | 260 | 61,754 | 133.67 | 25.1 | 13.4 | 9 | 0 | 0 | 0 | 0
37 | 22,187 | 0 | 260 | 63,510 | 130.72 | 23.1 | 20.9 | 9 | 0 | 0 | 0 | 0
38 | 23,048 | 21 | 250 | 65,064 | 125.77 | 23.5 | 17.1 | 9 | 0 | 0 | 0 | 0
39 | 23,210 | 10 | 0 | 62,857 | 126.51 | 21.4 | 25.7 | 7 | 0 | 0 | 0 | 0
40 | 20,102 | 0 | 0 | 59,950 | 126.30 | 20.7 | 25.7 | 7 | 0 | 0 | 0 | 0
41 | 19,975 | 0 | 260 | 61,582 | 128.11 | 19.4 | 30.0 | 7 | 0 | 0 | 0 | 0
42 | 21,364 | 0 | 250 | 61,756 | 124.85 | 19.2 | 20.2 | 7 | 0 | 0 | 0 | 0
43 | 22,354 | 0 | 260 | 63,033 | 119.66 | 19.8 | 20.8 | 7 | 1 | 0 | 260 | 19.8
44 | 20,098 | 5 | 0 | 60,116 | 119.83 | 18.0 | 24.4 | 4 | 1 | 5 | 0 | 18
45 | 23,497 | 0 | 0 | 58,929 | 118.47 | 14.1 | 19.1 | 4 | 1 | 0 | 0 | 14.1
46 | 21,910 | 0 | 0 | 57,984 | 116.09 | 14.1 | 11.1 | 4 | 0 | 0 | 0 | 0
47 | 21,323 | 4 | 0 | 56,259 | 115.90 | 14.0 | 11.7 | 4 | 0 | 0 | 0 | 0
48 | 19,648 | 0 | 0 | 59,248 | 116.46 | 12.9 | 10.5 | 7 | 0 | 0 | 0 | 0
49 | 23,403 | 0 | 0 | 61,171 | 115.50 | 12.4 | 10.6 | 7 | 0 | 0 | 0 | 0
50 | 23,238 | 0 | 0 | 62,731 | 116.19 | 11.1 | 12.4 | 7 | 0 | 0 | 0 | 0
51 | 20,356 | 0 | 0 | 58,129 | 119.30 | 10.1 | 10.9 | 2 | 0 | 0 | 0 | 0
52 | 22,242 | 0 | 0 | 56,720 | 121.07 | 9.9 | 15.4 | 2 | 1 | 0 | 0 | 9.9
1 | 21,854 | 0 | 0 | 63,368 | 120.18 | 9.8 | 12.4 | 7 | 1 | 0 | 0 | 9.8
2 | 20,213 | 0 | 0 | 62,142 | 122.13 | 10.3 | 15.5 | 2 | 0 | 0 | 0 | 0
3 | 20,392 | 0 | 0 | 61,986 | 124.31 | 10.6 | 14.5 | 2 | 0 | 0 | 0 | 0
4 | 23,403 | 0 | 0 | 61,741 | 120.94 | 10.0 | 6.9 | 2 | 0 | 0 | 0 | 0
5 | 23,188 | 0 | 0 | 62,706 | 121.03 | 11.7 | 11.3 | 2 | 0 | 0 | 0 | 0
6 | 22,629 | 0 | 0 | 62,368 | 120.33 | 11.2 | 11.3 | 2 | 0 | 0 | 0 | 0
7 | 21,741 | 0 | 0 | 63,136 | 119.38 | 10.9 | 6.2 | 2 | 0 | 0 | 0 | 0
8 | 24,207 | 0 | 0 | 62,903 | 118.78 | 10.5 | 7.2 | 2 | 1 | 0 | 0 | 10.5
9 | 25,778 | 0 | 0 | 65,109 | 119.85 | 12.6 | 9.9 | 3 | 1 | 0 | 0 | 12.6
10 | 23,919 | 0 | 0 | 61,967 | 120.87 | 11.1 | 13.3 | 3 | 1 | 0 | 0 | 11.1
11 | 22,182 | 0 | 0 | 61,591 | 122.95 | 12.8 | 15.4 | 3 | 0 | 0 | 0 | 0
12 | 23,291 | 0 | 0 | 63,247 | 123.51 | 12.6 | 7.5 | 3 | 0 | 0 | 0 | 0
13 | 24,011 | 180 | 237 | 62,519 | 125.02 | 13.5 | 7.3 | 9 | 0 | 0 | 0 | 0
14 | 23,979 | 160 | 208 | 63,128 | 125.30 | 14.9 | 12.1 | 9 | 0 | 0 | 0 | 0
15 | 23,896 | 150 | 208 | 64,331 | 129.63 | 15.7 | 14.4 | 9 | 1 | 150 | 208 | 15.7
16 | 26,873 | 0 | 0 | 65,812 | 131.02 | 15.0 | 9.0 | 9 | 1 | 0 | 0 | 15
17 | 27,535 | 180 | 237 | 64,723 | 131.33 | 17.3 | 5.8 | 6 | 1 | 180 | 237 | 17.3
18 | 23,688 | 160 | 208 | 65,052 | 130.95 | 16.8 | 12.7 | 6 | 1 | 160 | 208 | 16.8
19 | 23,922 | 150 | 208 | 63,214 | 129.06 | 17.3 | 14.6 | 6 | 1 | 150 | 208 | 17.3
20 | 24,421 | 0 | 0 | 63,379 | 129.28 | 19.7 | 5.9 | 6 | 0 | 0 | 0 | 0
21 | 26,197 | 0 | 0 | 63,623 | 127.19 | 21.1 | 6.1 | 6 | 0 | 0 | 0 | 0
22 | 26,765 | 180 | 237 | 62,661 | 129.19 | 21.8 | 7.3 | 4 | 0 | 0 | 0 | 0
Variable Name | What does this variable mean?
Week | Week number (of the year)
Sales | Convenience store sales (in euros), per store, on average across all 100 stores in the Marseille area, for that week
TV | Number of TV GRPs that week (1 GRP = 1% of the target market saw/heard the ad once)
Radio | Number of Radio GRPs that week (1 GRP = 1% of the target market saw/heard the ad once)
Fuel Volume | Fuel volume sales (in liters), per station, on average across all 100 stations in the Marseille area, for that week (all fuel types)
Fuel Price | Average price of fuel (in cents) in the Marseille area, for that week (across all fuel types)
Temperature | Average high temperature (in °C) recorded in the Marseille area, for that week
Precipitation (mm) | Total precipitation (in mm) in the Marseille area, for that week
Visits (1 or 2) | The percentage of survey respondents reporting 1 to 2 visits to a EurePet store in the previous week (vs. "0 times" or "3 or more times")
Holiday | A "dummy variable": Holiday = 1 if there was a national or school holiday that week; Holiday = 0 if there was no holiday that week
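In every row of the weekly table, the last three columns (TV x H, RxH, TxH) equal TV, Radio, and Temperature multiplied by the Holiday dummy, so they are zero in non-holiday weeks. Such product terms let a regression estimate whether the effect of TV, radio, or temperature differs in holiday weeks. A minimal sketch of building them; the function name is hypothetical.

```python
def interaction_columns(tv, radio, temperature, holiday):
    """Product terms (TV x H, R x H, T x H) for one week; holiday is a 0/1 dummy."""
    if holiday not in (0, 1):
        raise ValueError("holiday must be a 0/1 dummy")
    return tv * holiday, radio * holiday, temperature * holiday

# Week 26 (holiday week) and week 37 (non-holiday week) from the table
print(interaction_columns(74.5, 66.5, 27.9, 1))
print(interaction_columns(37.5, 68.5, 23.1, 0))
```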