On Algorithms: STAT3022 Linear Models



Multiple Linear Regression Models – Part 2
Residual Diagnostics, Unusual observations
STAT3022
Applied linear models
Regression Diagnostics
Background
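Recall the MLR model
$y = X\beta + \varepsilon$, $E(y) = X\beta$, $\mathrm{Var}(y) = \mathrm{Var}(\varepsilon) = \sigma^2 I_n$.
Assuming the design matrix $X$ is full-ranked, the OLS estimator is $\hat\beta = (X^\top X)^{-1} X^\top y$, the fitted values are $\hat y = Hy$ with hat matrix $H = X (X^\top X)^{-1} X^\top$, and the residual vector is $e = y - \hat y = (I_n - H)y$.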
Similar to model diagnostics for SLR, diagnostics for MLR are based
on the residuals, which depend critically on the hat matrix $H$.
- $H$ is symmetric, i.e. $H^\top = H$. As a result, the matrix $I_n - H$ is also symmetric.
- Next, $HX = X$. As a result, $(I_n - H)X = X - X = 0$.
- Third, $H^2 = H$, so we say $H$ is idempotent. As a result, the matrix $I_n - H$ is also idempotent.
- Finally, as proved in Tutorial 4, $\mathrm{trace}(H) = \sum_{i=1}^{n} h_{ii} = p$.
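The four properties of the hat matrix can be checked numerically. The following is a minimal sketch using NumPy on a simulated design matrix; the sizes `n` and `p`, the seed, and the simulated predictors are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the sketch is reproducible
n, p = 20, 3                     # hypothetical sizes: 20 observations, p = 3 columns in X

# Design matrix: an intercept column plus two simulated predictors.
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

# Hat matrix H = X (X'X)^{-1} X'; solve() avoids forming the inverse explicitly.
H = X @ np.linalg.solve(X.T @ X, X.T)
I_n = np.eye(n)
M = I_n - H                      # the matrix I_n - H that maps y to the residuals

is_symmetric = np.allclose(H, H.T)          # H' = H
reproduces_X = np.allclose(H @ X, X)        # HX = X
is_idempotent = np.allclose(H @ H, H)       # H^2 = H
M_is_idempotent = np.allclose(M @ M, M)     # (I_n - H)^2 = I_n - H
trace_equals_p = np.isclose(np.trace(H), p) # trace(H) = p

print(is_symmetric, reproduces_X, is_idempotent, M_is_idempotent, trace_equals_p)
```

All checks use floating-point tolerances (`allclose`, `isclose`) because $H$ is computed numerically rather than symbolically.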
Residual vector
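- First, let's compute its expectation:
  $E(e) = E\{(I_n - H)y\} = (I_n - H)E(y) = (I_n - H)X\beta = 0.$
- Second, let's compute the variance-covariance matrix:
  $\mathrm{Var}(e) = \mathrm{Var}\{(I_n - H)y\} = (I_n - H)\,\mathrm{Var}(y)\,(I_n - H)^\top = (I_n - H)\,\sigma^2 I_n\,(I_n - H) = \sigma^2 (I_n - H)(I_n - H) = \sigma^2 (I_n - H),$
  i.e. $\mathrm{Var}(e_i) = \sigma^2 (1 - h_{ii})$ and $\mathrm{Cov}(e_i, e_j) = -\sigma^2 h_{ij}$ for $i \neq j$.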
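These computations tell us that (1) each residual $e_i$ has a
smaller variance than the true error $\varepsilon_i$ (because $h_{ii} \ge 0$, and $h_{ii} \ge 1/n$ when the model has an intercept), and (2) the residuals are
correlated with one another.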
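To make these two conclusions concrete, here is a small sketch that builds $\sigma^2(I_n - H)$ for a simulated design and compares it with the empirical covariance of residuals over many simulated error vectors. The design, the value of `sigma2`, the seed, and the number of replications are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 15, 4.0              # hypothetical sample size and error variance

X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
H = X @ np.linalg.solve(X.T @ X, X.T)
M = np.eye(n) - H

var_e = sigma2 * M               # theoretical Var(e) = sigma^2 (I_n - H)
resid_var = np.diag(var_e)       # Var(e_i) = sigma^2 (1 - h_ii)

# Monte Carlo check. Since (I_n - H) X beta = 0, the residuals equal (I_n - H) eps,
# so the value of beta does not matter here.
reps = 20000
eps = rng.normal(scale=np.sqrt(sigma2), size=(reps, n))
E = eps @ M                      # each row is one simulated residual vector (M is symmetric)
emp_cov = E.T @ E / reps         # empirical covariance (the residuals have mean zero)
max_abs_err = np.abs(emp_cov - var_e).max()

print("largest residual variance:", resid_var.max(), "vs sigma^2 =", sigma2)
print("max |empirical - theoretical| covariance entry:", max_abs_err)
```

Every diagonal entry of `var_e` sits below `sigma2`, and the off-diagonal entries are generally nonzero, which is the correlation among residuals noted above.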
Residual plots
We can use residual plots similar to those for simple
linear regression for model diagnostics. Specifically,
- To check the constant variance assumption: plot the residuals $e_i$ against the fitted values $\hat y_i$, or against each covariate. A plot with no visible pattern is what we hope for: no news is good news.
- To check the normality assumption: use a normal quantile-quantile plot or a normality test.
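The quantities behind these two plots can be computed as follows. This is only a sketch on simulated data: the true coefficients, the noise level, and the crude "compare the spread in the two halves" summary are assumptions for illustration, and a plotting library of your choice would be used to draw the actual figures.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n), rng.uniform(0, 5, size=n)])
beta = np.array([1.0, 2.0, -1.0])              # hypothetical true coefficients
y = X @ beta + rng.normal(scale=1.5, size=n)   # normal errors with constant variance

b, *_ = np.linalg.lstsq(X, y, rcond=None)      # OLS fit
fitted = X @ b
resid = y - fitted

# Residuals vs. fitted values: the plot shows the points (fitted_i, resid_i).
# A crude numerical summary: residual spread for large vs. small fitted values.
order = np.argsort(fitted)
lower, upper = resid[order[: n // 2]], resid[order[n // 2 :]]
spread_ratio = upper.std() / lower.std()       # near 1 under constant variance

# Normal QQ plot: sorted residuals against standard normal quantiles.
probs = (np.arange(1, n + 1) - 0.5) / n
theo_q = np.array([NormalDist().inv_cdf(q) for q in probs])
sample_q = np.sort(resid)
qq_corr = np.corrcoef(theo_q, sample_q)[0, 1]  # near 1 when the QQ plot is close to a line

print("spread ratio:", spread_ratio)
print("QQ correlation:", qq_corr)
```

Plotting `fitted` against `resid` and `theo_q` against `sample_q` gives the two diagnostic plots; the two printed summaries are not formal tests, only quick numerical companions to the pictures.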
[Figure: a reasonable constant-variance residual plot]
A distinct characteristic of MLR compared to SLR is that it
has more than one predictor. As such, the intercorrelation
between predictors plays an important role in the estimated
coefficients as well as in inference for the MLR model.
Such intercorrelation is known as multicollinearity (multi:
many; collinear: linear dependence).
We will study three cases:

  1. When all predictors are uncorrelated.
  2. When all predictors are perfectly correlated.
  3. When all predictors are correlated but not perfectly correlated.
In this section, we denote $r_{jk}$ as the sample correlation
between two predictors $X_j$ and $X_k$.
Uncorrelated predictors
Consider the models
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$ (1)
$y_i = \beta_0 + \beta_1 x_{i1} + \varepsilon_i$ (2)
$y_i = \beta_0 + \beta_2 x_{i2} + \varepsilon_i$ (3)
If $r_{12} = 0$ (i.e. $X_1$ and $X_2$ are uncorrelated), then
- The OLS estimates for $\beta_1$ in model (1) and model (2) are exactly the same.
- The OLS estimates for $\beta_2$ in model (1) and model (3) are exactly the same.
- $SSR(X_1, X_2) = SSR(X_1) + SSR(X_2)$.
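These three facts can be verified on a small balanced design in which the two predictors have sample correlation exactly zero. The data below are made up for illustration (they are not the Kutner et al. table), and `ols` is a hypothetical helper defined only for this sketch.

```python
import numpy as np

# Balanced two-level design: x1 and x2 have sample correlation exactly 0.
x1 = np.array([-1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0])
x2 = np.array([-1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0])
y = np.array([10.0, 14.0, 13.0, 18.0, 11.0, 15.0, 12.0, 19.0])  # hypothetical responses

def ols(predictors, y):
    """Fit OLS with an intercept; return the coefficients and the regression sum of squares."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ b
    ssr = np.sum((fitted - y.mean()) ** 2)
    return b, ssr

r12 = np.corrcoef(x1, x2)[0, 1]
b_full, ssr_full = ols([x1, x2], y)   # model (1)
b_x1, ssr_x1 = ols([x1], y)           # model (2)
b_x2, ssr_x2 = ols([x2], y)           # model (3)

print("r12 =", r12)
print("b1 in model (1) vs model (2):", b_full[1], b_x1[1])
print("b2 in model (1) vs model (3):", b_full[2], b_x2[1])
print("SSR(X1, X2) vs SSR(X1) + SSR(X2):", ssr_full, ssr_x1 + ssr_x2)
```

Changing any single entry of `x2` so that `r12` is no longer zero breaks all three equalities, which is a quick way to see that they rely on the predictors being uncorrelated.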
An example: Kutner et al. (Table 7.6)
Example: effect of work crew size ($X_1$) and level of bonus pay
($X_2$) on crew productivity ($Y$). $X_1$ and $X_2$ are uncorrelated.
Uncorrelated predictors
In general, if all $p - 1$ predictors are mutually uncorrelated:
- The effect of one predictor on the response does not depend on whether the other predictors are in the model.
- Hence, we can get the effect of one predictor $X_j$ on the response $Y$ just by fitting an SLR of $Y$ on $X_j$.
We do not go into the math of this conclusion, but intuitively,
when all the predictors are uncorrelated, they have "separate"
effects on the response.
You will see this case again when we talk about experimental
designs.
Perfectly correlated predictors
The second (extreme) case is when one or more predictors are
perfectly correlated with one another.
Essentially, that just means one predictor can be written as a
linear combination of some other predictor variables. In this
case, the design matrix $X$ is not full-ranked, i.e. $\mathrm{rank}(X) < p$.
Recall the normal equations for OLS:
$X^\top X b = X^\top y$,
and $\mathrm{rank}(X^\top X) = \mathrm{rank}(X)$. Hence, in this case, the matrix
$X^\top X$ is also not full-ranked, and there are infinitely
many solutions for $b$.
Though there are infinitely many solutions for $b$, all
solutions give the same fitted values (and residuals).
Therefore, while the individual components of $b$ have no unique interpretation, the model
can still provide a good fit for the data.
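The rank deficiency and the "many solutions, same fit" behaviour can be illustrated with a sketch in which one simulated predictor is exactly twice another. The data, the seed, and the choice of the second solution (`b_alt`, obtained by moving along a null-space direction of $X$) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 12
x1 = rng.normal(size=n)
x2 = 2.0 * x1                         # x2 is an exact linear function of x1
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 3.0 * x1 + rng.normal(scale=0.5, size=n)   # hypothetical response

rank_X = np.linalg.matrix_rank(X)            # 2, although X has p = 3 columns
rank_XtX = np.linalg.matrix_rank(X.T @ X)    # same rank as X

# One solution of the normal equations: the minimum-norm least-squares solution.
b_min, *_ = np.linalg.lstsq(X, y, rcond=None)

# Another solution: add any multiple of a vector v with X v = 0.
null_dir = np.array([0.0, 2.0, -1.0])        # X @ null_dir = 2*x1 - x2 = 0
b_alt = b_min + 5.0 * null_dir

same_fit = np.allclose(X @ b_min, X @ b_alt)
both_solve = (np.allclose(X.T @ X @ b_min, X.T @ y)
              and np.allclose(X.T @ X @ b_alt, X.T @ y))

print("rank(X), rank(X'X):", rank_X, rank_XtX)
print("b_min:", b_min)
print("b_alt:", b_alt)
print("same fitted values:", same_fit, "| both solve the normal equations:", both_solve)
```

The two coefficient vectors differ substantially in their second and third components, yet the fitted values and hence the residuals coincide.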
Highly correlated predictors
Although the two cases above are extreme, in reality it is very
common to find that many predictors are highly correlated. After all,
highly correlated variables are an inherent characteristic of the
population of interest.
Example: in a regression of food expenditures on income, savings,
age of head of household, educational level, etc., all the
predictors are correlated with one another.
Mathematically, although the design matrix $X$ and the matrix
$X^\top X$ still have full rank, the inversion $V = (X^\top X)^{-1}$
becomes unstable.
Recall that $\mathrm{Var}(\hat\beta) = \sigma^2 V$, so multicollinearity inflates the
variance of the OLS estimator.
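A short sketch shows how the relevant diagonal entry of $V$ grows as two predictors become more correlated. Here the predictors are constructed (an assumption of this sketch) to be centred, of unit length, and to have sample correlation exactly $r$; in that special two-predictor setting the slope entry of $V$ equals $1/(1 - r^2)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50

# Two centred, unit-length, exactly orthogonal base vectors (QR on centred noise).
Z = rng.normal(size=(n, 2))
Z -= Z.mean(axis=0)
Q, _ = np.linalg.qr(Z)          # columns are orthonormal and still have mean zero
z1, z2 = Q[:, 0], Q[:, 1]

results = {}
for r in [0.0, 0.5, 0.9, 0.99]:
    x1 = z1
    x2 = r * z1 + np.sqrt(1.0 - r**2) * z2    # sample correlation with x1 is exactly r
    X = np.column_stack([np.ones(n), x1, x2])
    V = np.linalg.inv(X.T @ X)                # Var(beta_hat) = sigma^2 * V
    results[r] = V[1, 1]                      # entry of V for the coefficient of x1
    print(f"r = {r:4.2f}   V[1, 1] = {V[1, 1]:8.3f}   1 / (1 - r^2) = {1 / (1 - r**2):8.3f}")
```

With $\sigma^2$ held fixed, the variance of the slope estimate is proportional to `V[1, 1]`, so moving from uncorrelated predictors to $r = 0.99$ multiplies it by roughly 50 in this construction.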