Multiple Linear Regression Models – Part 2
Residual Diagnostics, Unusual observations
STAT3022
Applied linear models
Regression Diagnostics
Background
Recall the MLR model
$y = X\beta + \varepsilon$, $E(y) = X\beta$, $\operatorname{Var}(y) = \operatorname{Var}(\varepsilon) = \sigma^2 I_n$
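Assuming the design matrix X is full-ranked, the OLS estimator is $b = (X^\top X)^{-1} X^\top y$, the fitted values are $\hat{y} = Xb = Hy$ with hat matrix $H = X(X^\top X)^{-1} X^\top$, and the residual vector is $e = y - \hat{y} = (I_n - H)y$.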
As with model diagnostics for SLR, diagnostics for MLR are based on the residuals, which depend critically on the hat matrix H.
First, H is symmetric, i.e. $H^\top = H$. As a result, the matrix $I_n - H$ is also symmetric.
Second, $HX = X$. As a result, $(I_n - H)X = X - X = 0$.
Third, $H^2 = H$, so we say H is idempotent. As a result, the matrix $I_n - H$ is also idempotent.
Finally, as proved in Tutorial 4, $\operatorname{trace}(H) = \sum_{i=1}^{n} h_{ii} = p$.
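The four properties above can be checked numerically. The sketch below is a minimal illustration and assumes NumPy is available. It uses a small simulated design matrix, and the data, the seed and the helper name `hat_matrix` are all hypothetical. It forms $H = X(X^\top X)^{-1}X^\top$ and tests each property up to floating-point tolerance.

```python
import numpy as np

def hat_matrix(X):
    """Return H = X (X^T X)^{-1} X^T for a full-rank design matrix X (hypothetical helper)."""
    # Solving (X^T X) A = X^T avoids forming the inverse explicitly.
    return X @ np.linalg.solve(X.T @ X, X.T)

# Hypothetical design: n = 8 observations, intercept plus two simulated covariates (p = 3).
rng = np.random.default_rng(0)
n = 8
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
p = X.shape[1]
H = hat_matrix(X)
I = np.eye(n)

print(np.allclose(H, H.T))                    # H symmetric
print(np.allclose(H @ X, X))                  # HX = X
print(np.allclose((I - H) @ X, 0))            # (I_n - H) X = 0
print(np.allclose(H @ H, H))                  # H idempotent
print(np.allclose((I - H) @ (I - H), I - H))  # I_n - H idempotent
print(np.isclose(np.trace(H), p))             # trace(H) = p
```

Each check should print True for any full-rank X. Solving the linear system rather than calling an explicit inverse is a common numerical habit. It is not required by the theory.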
Residual vector
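First, let’s compute its expectation, using $(I_n - H)X = 0$:
$E(e) = E\{(I_n - H)y\} = (I_n - H)E(y) = (I_n - H)X\beta = 0.$
Second, let’s compute the variance-covariance matrix, using the symmetry and then the idempotence of $I_n - H$:
$\operatorname{Var}(e) = \operatorname{Var}\{(I_n - H)y\} = (I_n - H)\operatorname{Var}(y)(I_n - H)^\top$
$= (I_n - H)\sigma^2 I_n (I_n - H) = \sigma^2 (I_n - H)(I_n - H)$
$= \sigma^2 (I_n - H),$
i.e. $\operatorname{Var}(e_i) = \sigma^2(1 - h_{ii})$ and $\operatorname{Cov}(e_i, e_j) = -\sigma^2 h_{ij}$ for $i \neq j$.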
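These computations tell us two things. (1) Each residual $e_i$ has a variance no larger than that of the true error $\varepsilon_i$, and strictly smaller whenever $h_{ii} > 0$. (2) The residuals are in general correlated with one another, even though the errors are not.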
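A quick Monte Carlo check can make $E(e) = 0$ and $\operatorname{Var}(e) = \sigma^2(I_n - H)$ concrete. This is a minimal sketch assuming NumPy. The design, the true $\beta$, $\sigma$, the seed and the number of replications are hypothetical choices for illustration. It simulates many response vectors from the model, computes $e = (I_n - H)y$ for each one, and compares the empirical mean and covariance of the residuals with the theory.

```python
import numpy as np

# Hypothetical setup: fixed design X, true beta and sigma chosen for illustration.
rng = np.random.default_rng(1)
n, sigma = 6, 2.0
X = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
beta = np.array([1.0, 0.5])
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H  # I_n - H

# Simulate many response vectors y = X beta + eps and record e = (I_n - H) y.
reps = 200_000
eps = rng.normal(scale=sigma, size=(reps, n))
Y = X @ beta + eps          # each row is one simulated y
E = Y @ M.T                 # each row is e = M y

emp_mean = E.mean(axis=0)
emp_cov = np.cov(E, rowvar=False)
theory_cov = sigma**2 * M   # sigma^2 (I_n - H)

print(np.round(emp_mean, 2))             # close to 0
print(np.round(np.diag(emp_cov), 2))     # empirical Var(e_i)
print(np.round(np.diag(theory_cov), 2))  # sigma^2 (1 - h_ii), each below sigma^2 = 4
```

The off-diagonal entries of `emp_cov` are non-zero and match $-\sigma^2 h_{ij}$, which illustrates point (2) above.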
Residual plots
We can use the same residual plots as in simple linear regression for model diagnostics. Specifically:
To check the constant-variance assumption: plot the residuals $e_i$ against the fitted values $\hat{y}_i$, or against each covariate. No news is good news: a patternless band of points of even width around zero supports the assumption.
To check the normality assumption: use a normal quantile-quantile (Q-Q) plot, or a normality test.
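The sketch below shows how the inputs to both plots can be computed. It assumes NumPy and the Python standard library's `statistics.NormalDist`, and the data are simulated purely for illustration. The helper names `ols_fit` and `normal_qq_points` are hypothetical. The plotting positions $(i - 0.5)/n$ are one common convention, not the only one.

```python
import numpy as np
from statistics import NormalDist

def ols_fit(X, y):
    """OLS via least squares; returns coefficients, fitted values and residuals."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b
    return b, fitted, y - fitted

def normal_qq_points(resid):
    """Theoretical N(0, 1) quantiles and sorted residuals for a normal Q-Q plot.
    Plotting positions (i - 0.5)/n are an assumed convention."""
    n = len(resid)
    probs = (np.arange(1, n + 1) - 0.5) / n
    theo = np.array([NormalDist().inv_cdf(q) for q in probs])
    return theo, np.sort(resid)

# Simulated data for illustration only: two covariates, normal errors.
rng = np.random.default_rng(2)
n = 100
X = np.column_stack([np.ones(n), rng.uniform(0, 10, n), rng.uniform(0, 5, n)])
y = X @ np.array([3.0, 1.5, -2.0]) + rng.normal(scale=1.0, size=n)

b, fitted, resid = ols_fit(X, y)
theo, sample = normal_qq_points(resid)

# With matplotlib installed, the two diagnostic plots would be, for example:
#   plt.scatter(fitted, resid); plt.axhline(0)  # residuals vs fitted: look for a patternless band
#   plt.scatter(theo, sample)                   # Q-Q plot: points close to a straight line
print(np.corrcoef(theo, sample)[0, 1])  # near 1 when the residuals look normal
```

A plot of `resid` against each column of `X` gives the residual-versus-covariate version of the first check.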
[Figure: residual plot illustrating a reasonable constant-variance pattern]
Multicollinearity
A distinctive characteristic of MLR compared with SLR is that it has more than one predictor. As such, the intercorrelation between predictors plays an important role in both the estimated coefficients and the inference for the MLR.
Such intercorrelation is known as multicollinearity (multi:
many; collinear: linear dependence).
We will study three cases:
- When all predictors are uncorrelated.
- When some predictors are perfectly correlated.
- When predictors are correlated, but not perfectly correlated.
In this section, we denote by $r_{jk}$ the sample correlation between two predictors $X_j$ and $X_k$.
Uncorrelated predictors
Consider the models
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$   (1)
$y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$   (2)
$y_i = \beta_0 + \beta_2 x_{i2} + \varepsilon_i$   (3)
If $r_{12} = 0$ (i.e. $X_1$ and $X_2$ are uncorrelated), then:
- The OLS estimates of $\beta_1$ in model (1) and model (2) are exactly the same.
- The OLS estimates of $\beta_2$ in model (1) and model (3) are exactly the same.
- $SSR(X_1, X_2) = SSR(X_1) + SSR(X_2)$.
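The three claims can be verified on a small example. The sketch below assumes NumPy and uses a hypothetical balanced 2x2 design, replicated twice, for which $r_{12} = 0$ exactly. The response is simulated. These are not the Kutner et al. data, and the helper name `fit` is illustrative.

```python
import numpy as np

def fit(cols, y):
    """OLS with an intercept; returns (coefficients, regression sum of squares SSR)."""
    X = np.column_stack([np.ones(len(y))] + cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr = np.sum((X @ b - y.mean()) ** 2)
    return b, ssr

# Hypothetical balanced 2x2 design, replicated twice: x1 and x2 have r12 = 0 exactly.
x1 = np.array([1, 1, 1, 1, 3, 3, 3, 3], dtype=float)
x2 = np.array([10, 10, 20, 20, 10, 10, 20, 20], dtype=float)
rng = np.random.default_rng(3)
y = 5 + 2 * x1 + 0.3 * x2 + rng.normal(size=8)   # simulated response

print(np.corrcoef(x1, x2)[0, 1])   # 0 up to floating-point rounding

b_full, ssr_full = fit([x1, x2], y)  # model (1)
b_x1, ssr_x1 = fit([x1], y)          # model (2)
b_x2, ssr_x2 = fit([x2], y)          # model (3)

print(b_full[1], b_x1[1])            # same estimate of beta_1
print(b_full[2], b_x2[1])            # same estimate of beta_2
print(ssr_full, ssr_x1 + ssr_x2)     # SSR(X1, X2) = SSR(X1) + SSR(X2)
```

The intercept estimates differ between the three models. Only the slope estimates coincide, which matches the statements above.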
An example: Kutner et al. (Table 7.6)
Example: effect of work crew size (X1) and level of bonus pay
(X2) on crew productivity (Y). X1 and X2 are uncorrelated.
Uncorrelated predictors
In general, if all $p - 1$ predictors are mutually uncorrelated:
- The effect of one predictor on the response does not depend on which other predictors are in the model.
- Hence, we can obtain the effect of a predictor $X_j$ on the response $Y$ simply by fitting the SLR of $Y$ on $X_j$.
We do not go into the math behind this conclusion, but intuitively, when all the predictors are uncorrelated, they have “separate” effects on the response.
You will see this case again when we talk about experimental
designs.
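As a small preview of the experimental-design case, the sketch below compares a joint MLR fit with separate SLR fits. It assumes NumPy and a hypothetical $2^3$ factorial design in coded units ($\pm 1$), whose three columns are mutually uncorrelated, so $p - 1 = 3$. The response is simulated and the helper `slr_slope` is illustrative.

```python
import itertools
import numpy as np

def slr_slope(x, y):
    """Slope of the simple linear regression of y on x: S_xy / S_xx."""
    xc = x - x.mean()
    return np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)

# Hypothetical 2^3 factorial design in coded units: columns mutually uncorrelated.
Z = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
n = Z.shape[0]
rng = np.random.default_rng(4)
y = 10 + Z @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=0.5, size=n)  # simulated

# Joint MLR fit with an intercept.
X = np.column_stack([np.ones(n), Z])
b_mlr, *_ = np.linalg.lstsq(X, y, rcond=None)

# Separate SLR of y on each predictor.
b_slr = np.array([slr_slope(Z[:, j], y) for j in range(Z.shape[1])])

print(np.corrcoef(Z, rowvar=False))  # identity matrix: all r_jk = 0
print(b_mlr[1:])
print(b_slr)                          # identical to the MLR slopes
```

With correlated predictors the two sets of slopes would generally differ. That is the subject of the next cases.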
Perfectly correlated predictors
The second (extreme) case is when one or more predictors are perfectly correlated with other predictors.
Essentially, this means that one predictor can be written as a linear combination of some other predictor variables. In this case, the design matrix X is not full-ranked, i.e. $\operatorname{rank}(X) < p$.
Recall the normal equations for OLS:
$X^\top X b = X^\top y,$
and $\operatorname{rank}(X^\top X) = \operatorname{rank}(X)$. Hence, in this case, the matrix $X^\top X$ is also not full-ranked (so it cannot be inverted), and the normal equations have infinitely many solutions for b.
Although there are infinitely many solutions for b, all of them give the same fitted values (and hence the same residuals).
Therefore, while the individual coefficients in b have no meaningful interpretation, the model can still provide a good fit to the data.
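The sketch below illustrates the rank-deficient case. It assumes NumPy, and the data are simulated and hypothetical. The third predictor is constructed as $x_3 = x_1 + 2x_2$, so $Xc = 0$ for $c = (0, 1, 2, -1)^\top$. Adding any multiple of $c$ to one solution of the normal equations gives another solution. `np.linalg.lstsq` is used here only as one convenient way to get a first solution, the minimum-norm one. The explicit rank tolerance is an assumption chosen to absorb floating-point rounding.

```python
import numpy as np

# Hypothetical data: x3 is an exact linear combination of x1 and x2.
rng = np.random.default_rng(5)
n = 10
x1, x2 = rng.normal(size=n), rng.normal(size=n)
x3 = x1 + 2 * x2
X = np.column_stack([np.ones(n), x1, x2, x3])   # p = 4 columns
y = 1 + x1 - x2 + rng.normal(scale=0.3, size=n)

# Explicit tolerance so that rounding error in x3 is not counted as extra rank.
print(np.linalg.matrix_rank(X, tol=1e-8), X.shape[1])   # rank(X) = 3 < p = 4
print(np.linalg.matrix_rank(X.T @ X, tol=1e-8))         # rank(X^T X) = rank(X) = 3

# One solution of the normal equations: the minimum-norm least-squares solution.
b1, *_ = np.linalg.lstsq(X, y, rcond=None)
# x1 + 2*x2 - x3 = 0, so X c = 0 for c = (0, 1, 2, -1); b1 + t*c also solves them.
c = np.array([0.0, 1.0, 2.0, -1.0])
b2 = b1 + 5.0 * c

print(np.allclose(X.T @ X @ b1, X.T @ y), np.allclose(X.T @ X @ b2, X.T @ y))
print(b1, b2)                          # very different coefficient vectors...
print(np.allclose(X @ b1, X @ b2))     # ...but identical fitted values
```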
Highly correlated predictors
Although the cases above are extreme, in practice it is very common to find that many predictors are highly correlated. Ultimately, high correlation among variables is an inherent characteristic of the population of interest.
Example: in a regression of food expenditures on income, savings, age of the head of household, educational level, etc., the predictors are all correlated with one another.
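Mathematically, although the design matrix X and the matrix $X^\top X$ still have full rank, $X^\top X$ is close to singular, so the inverse $V = (X^\top X)^{-1}$ becomes unstable and some of its diagonal entries become very large.
Recall that $\operatorname{Var}(\hat{\beta}) = \sigma^2 V$, so multicollinearity inflates the variance of the OLS estimator.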
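For a model with an intercept and two predictors, the relevant diagonal entry of V has a closed form: $V_{11} = 1 / \{S_{11}(1 - r_{12}^2)\}$, where $S_{11} = \sum_i (x_{i1} - \bar{x}_1)^2$. So relative to uncorrelated predictors, $\operatorname{Var}(\hat{\beta}_1)$ is multiplied by $1/(1 - r_{12}^2)$, which blows up as $|r_{12}| \to 1$. The sketch below checks this numerically and assumes NumPy. The construction of correlated predictors, the sample size, the seed and the helper name `var_b1_factor` are hypothetical.

```python
import numpy as np

def var_b1_factor(r, n=50, seed=6):
    """Return (sample r12, V[1,1] * S_11) for simulated predictors with correlation
    roughly r, where V = (X^T X)^{-1} (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    z1, z2 = rng.normal(size=n), rng.normal(size=n)
    x1 = z1
    x2 = r * z1 + np.sqrt(1 - r**2) * z2   # assumed construction of correlated predictors
    X = np.column_stack([np.ones(n), x1, x2])
    V = np.linalg.inv(X.T @ X)
    r12 = np.corrcoef(x1, x2)[0, 1]
    S11 = np.sum((x1 - x1.mean()) ** 2)
    return r12, V[1, 1] * S11

# Var(b_1) = sigma^2 * V[1,1] = (sigma^2 / S_11) * factor, with factor = 1 / (1 - r12^2).
results = {r: var_b1_factor(r) for r in [0.0, 0.5, 0.9, 0.99, 0.999]}
for r, (r12, factor) in results.items():
    print(f"r = {r}: sample r12 = {r12:.4f}, factor = {factor:.2f}, "
          f"1/(1 - r12^2) = {1 / (1 - r12**2):.2f}")
```

The factor stays near 1 for weakly correlated predictors and grows by orders of magnitude as $r_{12}$ approaches 1. For two predictors this ratio is what is commonly called the variance inflation factor.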