On Algorithms: Walkthrough of Problem Set STATS-3860B

Assignment 3
STATS 3860B/9155B
Winter 2023
This assignment is due Monday, April 10th, at 11:55 pm.
You must write your R code and answers using Rmarkdown generating a
single pdf file.
Submissions must be made via Gradescope. You must carefully assign each
question part to its corresponding page (or pages) on your pdf file. Question parts
with no pages assigned to them will receive zero marks.
Each student must submit their own work. Scholastic offences are taken seriously,
and students are directed to read the appropriate policy, specifically, the definition of
what constitutes a Scholastic Offence, at the following Web site:
http://www.uwo.ca/univsec/pdf/academic_policies/appeals/scholastic_discipline_undergrad.pdf
Question 1
The denim dataset concerns the amount of material wasted in cutting at a jeans manufacturer,
broken down by its five suppliers. The code below first removes two outliers from the dataset.
library(faraway)
data(denim)

# removing 2 outliers
denim <- denim[-which(denim$waste == max(denim$waste)), ]
denim <- denim[-which(denim$waste == max(denim$waste)), ]
dim(denim); head(denim)

## [1] 93  2
##   waste supplier
## 1   1.2        1
## 2  16.4        2
## 3  12.1        3
## 4  11.5        4
## 5  24.0        5
## 6  10.1        1

a) Plot the data and comment.
b) Fit the linear fixed effects model. Is the supplier significant?
c) Analyze the data with supplier as a random effect. What is the estimated standard
deviation of the supplier random effect?
d) Regarding the model fitted in c), test the significance of the supplier term. Compare
with the results in b).
e) Compute confidence intervals for the random effects standard deviations. Compare with
the results in d).
Question 2
Refer to Exercise 2 on page 251 of the textbook, which uses the dataset hprice. Work on parts a) to g).
library(faraway)
str(hprice)

## 'data.frame':    324 obs. of  8 variables:
##  $ narsp  : num  4.22 4.27 4.33 4.36 4.39 ...
##  $ ypc    : int  13585 14296 15413 16490 17634 18210 17958 18659 19360 15354 ...
##  $ perypc : num  6.47 5.23 7.81 6.99 6.94 ...
##  $ regtest: int  20 20 20 20 20 20 20 20 20 18 ...
##  $ rcdum  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ajwtr  : Factor w/ 2 levels "0","1": 1 1 1 1 1 1 1 1 1 1 ...
##  $ msa    : Factor w/ 36 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 2 ...
##  $ time   : int  1 2 3 4 5 6 7 8 9 1 ...


Question 3
The data set prostate (library faraway) is from a study of 97 men with prostate cancer who
were due to receive a radical prostatectomy. Consider lweight as the response and only age
as a predictor.
a) Plot the data and fit a curve using kernel methods with a cross-validated choice of
bandwidth. Plot the fit on top of the data. What is the effect of the outlier? Does
the fit look linear?
b) Compute the smoothing spline with the default amount of smoothing (chosen by cross-validation)
along with a 95% confidence band. Do you think a linear fit is plausible for these data?
Question 4
The mcycle dataset contains the head acceleration (y) over time (x) after impact when
simulating a motorcycle accident.
library(MASS)
data(mcycle)
str(mcycle)

## 'data.frame':    133 obs. of  2 variables:
##  $ times: num  2.4 2.6 3.2 3.6 4 6.2 6.6 6.8 7.8 8.2 ...
##  $ accel: num  0 -1.3 -2.7 0 -2.7 -2.7 -2.7 -1.3 -2.7 -2.7 ...

a) Fit a regression spline to these data with 40 basis functions. Plot the data along with
the fitted curve. Comment on the estimated function obtained by this fit.
b) Compute the percentage of the variance in y explained by the regression spline fit
obtained in a), adjusting for the degrees of freedom (that is, taking into account the
number of parameters).
c) Fit a smoothing spline, choosing λ by cross-validation. Plot the data along with the
fitted curve and compare it with the fitted curve in a).
d) Compute the percentage of the variance in y explained by the smoothing spline fit
obtained in c), adjusting for the degrees of freedom. Compare with the result obtained
in part b).
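
For parts b) and d), one common way to adjust the explained variance for the degrees of freedom is the adjusted R-squared, sketched below. This is an assumption about what the assignment intends, not something it states. The symbols are introduced here only for illustration: n is the number of observations, p is the number of fitted parameters (including the intercept), and the hatted values are the fitted values.

```latex
R^2_{\mathrm{adj}}
  = 1 - \frac{\mathrm{RSS}/(n - p)}{\mathrm{TSS}/(n - 1)},
\qquad
\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,
\qquad
\mathrm{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2
```

For the smoothing spline in d) there is no integer parameter count, so under the same assumption p would be replaced by the effective degrees of freedom of the fit (the trace of the smoother matrix). Multiplying the result by 100 gives a percentage.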
