MATH3821 Statistical Modelling and Computing
T2 2022
Assignment One
In your report, please include all R commands and outputs directly
relevant to your answers. The assignment is worth a total of 52 marks, of
which 5 are awarded for overall presentation.
Consider the Advertising data set, Advertising.csv. This data set
includes advertising costs and sales income.
- [2 marks] Use an appropriate R function to import Advertising.csv
into R. What variables does this file contain?
- [2 marks] If a regression model can be constructed to model the
relationship between the variables, which variable should be the response
variable? Justify your response with one sentence.
- [4 marks] Using pairwise plots, do you think a linear model can
be used here? In your answer, include the code and output of the
graphical tools you used, and comment on whether there are apparent
relationships between the predictor and response variables, and which
variable(s) you expect to be significant as predictor(s).
- [9 marks] Write down the expression for the linear model, using the
notation Y for the response vector, X for the design matrix, and β for the
vector of coefficients. Clearly define all variables you use. Write down
the expression for the log-likelihood function (assuming normality), and
derive the MLE for β.
- [6 marks] Show that the MLE β̂ is unbiased and derive the variance
of β̂.
- [13 marks] Write an R function called Leastfit which takes as input
a response variable Y and covariates X. The function will compute the
least squares fit with your response variable from Part (2) as the
response and the remaining variables as predictors. The function should
output a list containing the components $parameter, the least squares
estimate from Part (4); $var, the variance of each β̂ as given in Part (5);
$fitted, the fitted values; and $res, the residuals of the regression.
Test your function using the Advertising data.
- [4 marks] Produce a diagnostic plot of residuals against fitted values,
clearly labelling the x and y axes. Your plot should also include a title.
Superimpose on this plot a horizontal red line at zero to mark where the
residuals equal 0.
- [5 marks] Using R’s lm() function, repeat the regression analysis
above. Do you think any of the predictors can be removed? State any
relevant test used, including the formulation of the hypotheses, the
calculation of the test statistic, and the corresponding conclusions.
- [2 marks] Plot the set of diagnostic graphs for the model fitted with R’s lm() function.
Based on the plots, which linear model assumptions do you think may
have been violated?
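
As a rough illustration of the computations the parts above ask for, the following R sketch fits a linear model by solving the normal equations, plots residuals against fitted values, and cross-checks the result against lm(). It is only a sketch under stated assumptions: the function name ls_sketch is hypothetical (it is not the required Leastfit), the data are simulated rather than read from Advertising.csv, an intercept column is added automatically, and the coefficient variances use the residual variance estimate with denominator n - p.

```r
# Hypothetical helper (not the required Leastfit): least squares via the
# normal equations. X is assumed to hold the covariates WITHOUT an
# intercept column; one is added here.
ls_sketch <- function(Y, X) {
  X <- cbind(Intercept = 1, as.matrix(X))    # design matrix with intercept
  n <- nrow(X)
  p <- ncol(X)
  XtX_inv  <- solve(crossprod(X))            # (X'X)^{-1}
  beta_hat <- XtX_inv %*% crossprod(X, Y)    # (X'X)^{-1} X'Y
  fitted   <- X %*% beta_hat                 # fitted values X beta_hat
  res      <- Y - fitted                     # residuals
  sigma2_hat <- sum(res^2) / (n - p)         # assumed estimator of sigma^2
  list(parameter = drop(beta_hat),
       var       = diag(sigma2_hat * XtX_inv),
       fitted    = drop(fitted),
       res       = drop(res))
}

# Simulated stand-in data (assumption: NOT the Advertising data)
set.seed(1)
n <- 200
X <- cbind(x1 = runif(n, 0, 300), x2 = runif(n, 0, 50))
Y <- 3 + 0.05 * X[, "x1"] + 0.2 * X[, "x2"] + rnorm(n)

fit <- ls_sketch(Y, X)

# Cross-check against lm(); the two should agree up to numerical tolerance
ref <- lm(Y ~ X)
all.equal(unname(fit$parameter), unname(coef(ref)))
all.equal(unname(fit$var), unname(diag(vcov(ref))))

# Residuals against fitted values, with a red reference line at zero
plot(fit$fitted, fit$res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted values (simulated data)")
abline(h = 0, col = "red")
```

Solving the normal equations directly keeps the code close to the algebra of the derivation; lm() itself uses a QR decomposition, which is numerically more stable when predictors are strongly correlated.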