About Deep Learning: MATH3821 Statistical Modelling and Computing

MATH3821 Statistical Modelling and Computing
T2 2022
Assignment One
In your report, please include all R commands and outputs directly
relevant to your answers. The assignment is worth 52 marks in total (5
marks will be given for overall presentation).
Consider the Advertising data set, Advertising.csv. This data set
includes advertising costs and sales income.

  1. [2 marks] Use an appropriate R function to import Advertising.csv
    into R. What variables does this file contain?
  2. [2 marks] If a regression model can be constructed to model the
    relationship between the variables, which variable should be the
    response variable? Justify your response with one sentence.
  3. [4 marks] Using pairwise plots, do you think a linear model can
    be used here? In your answer, include the code and output of the
    graphical tools you used, and comment on whether there are apparent
    relationships between the predictor and response variables, and which
    variable/s do you expect to be significant as predictor/s.
  4. [9 marks] Write down the expression for the linear model, using the
    notation Y for the response vector, X for the design matrix, and β
    for the vector of coefficients. Clearly define all variables you use.
    Write down the expression for the log-likelihood function (assuming
    normality), and derive the MLE for β.
  5. [6 marks] Show that the MLE β̂ is unbiased and derive the variance
    of β̂.
  6. [13 marks] Write an R function called Leastfit which takes as
    input a response variable Y and covariates X. The function will
    compute the least squares fit with your response variable from Part (2)
    as the response and the remaining variables as predictors. The function
    should output a list containing $parameter, the least squares estimate
    of Part (4); $var, the variance of each β̂ as given in Part (5);
    $fitted, the fitted values; and $res, the residuals of the regression.
    Test your function using the Advertising data.
  7. [4 marks] Produce a diagnostic plot of residuals against fitted values,
    clearly labelling the x and y axes. Your plot should also include a
    title. Superimpose on this plot a horizontal red line at zero to
    indicate the location of 0 in the residuals.
  8. [5 marks] Using R’s lm() function, repeat the regression analysis
    above. Do you think any of the predictors can be removed? State any
    relevant test used, including the formulation of the hypotheses and the
    calculation of the test statistic, and the corresponding conclusions.
  9. [2 marks] Plot the set of diagnostic graphs using R’s lm() function.
    Based on the plots, which linear model assumptions do you think may
    have been violated?
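
As a rough illustration of the kind of function Part 6 describes, the sketch below computes a least squares fit from the normal equations, β̂ = (XᵀX)⁻¹XᵀY, with estimated variances taken from the diagonal of σ̂²(XᵀX)⁻¹. It is one possible approach under stated assumptions, not a model answer: it assumes X holds the covariates only, so an intercept column is added inside the function, and it estimates σ² by dividing the residual sum of squares by n − p (the MLE would divide by n). Because the columns of Advertising.csv are not listed here, the example runs on simulated stand-in data whose names (X_sim, Y_sim, x1, x2) are purely hypothetical.

```r
# Sketch of a least squares fit via the normal equations.
# Assumption: X contains covariates only; the intercept column is added here.
Leastfit <- function(Y, X) {
  X <- cbind(Intercept = 1, as.matrix(X))  # design matrix with intercept
  Y <- as.numeric(Y)
  n <- nrow(X)
  p <- ncol(X)

  XtX_inv <- solve(crossprod(X))                 # (X'X)^{-1}
  beta_hat <- drop(XtX_inv %*% crossprod(X, Y))  # (X'X)^{-1} X'Y

  fitted <- drop(X %*% beta_hat)
  res <- Y - fitted

  # Assumption: sigma^2 is estimated with divisor n - p (unbiased),
  # rather than the divisor n that the MLE of sigma^2 would use.
  sigma2_hat <- sum(res^2) / (n - p)
  var_beta <- sigma2_hat * diag(XtX_inv)

  list(parameter = beta_hat, var = var_beta, fitted = fitted, res = res)
}

# Simulated stand-in data (hypothetical; NOT the Advertising data)
set.seed(1)
n <- 100
X_sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
Y_sim <- 2 + 1.5 * X_sim$x1 - 0.5 * X_sim$x2 + rnorm(n, sd = 0.3)

fit <- Leastfit(Y_sim, X_sim)
ref <- lm(Y_sim ~ x1 + x2, data = X_sim)

# Cross-check the hand-rolled estimates against lm()
all.equal(unname(fit$parameter), unname(coef(ref)))
all.equal(unname(fit$var), unname(diag(vcov(ref))))
```

Comparing against lm() on the same data is a convenient sanity check. Explicitly inverting XᵀX is adequate for a small, well-conditioned problem like this one; lm() itself relies on a QR decomposition, which is numerically more stable.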