About Algorithms: STA-471

STA 471 Final Exam
Due: 5/15/2019
When compiling your answers to the following questions, follow all guidelines for homework
assignments listed in the syllabus. A hard copy of your work is to be turned in to my office
(Kimball 810) by 5:00 PM on the due date. You are not permitted to collaborate on these
questions with another student.

1. One method used to identify whether a patient requires a hearing aid is to play a recording of
a set of quietly pronounced words and ask the patient to repeat them. The number of words
correctly identified (“Hearing”) by each patient is recorded in hearing.txt (UBLearns). Four
different recordings (“ListID”), each containing a different set of words, were used. The
purpose of this study is to determine whether the lists of words are equally difficult to hear.
(15 pts)
a) Produce side-by-side boxplots of the hearing scores by list ID.
b) State the hypotheses to be tested in this study.
c) Fit an appropriate model to address the study question, and present output that displays
the test statistic and p-value. Also state the conclusion in context.
d) Reproduce the p-value from part (c) using the pf(.) function.
e) What percentage of total variability in hearing scores is explained by list ID?
f) Identify which pairs of lists have mean hearing scores that differ significantly.
g) Test whether the model residuals are normally distributed.
h) Test the assumption of constant variance using the Levene test, providing the hypotheses,
p-value, and conclusion.
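
Parts (c), (e) and (h) all rest on the one-way ANOVA decomposition of total variability. The course works in R, so the Python below (standard library only) is only an illustrative sketch. The function names `one_way_anova` and `levene_f` are hypothetical, and the toy groups are made up rather than taken from hearing.txt. The returned `r_squared` is SS_between / SS_total, the proportion of variability explained by the grouping factor.

```python
from statistics import mean, median


def one_way_anova(groups):
    # Returns (F, df_between, df_within, r_squared) for a list of numeric groups.
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    # Between-group SS: group size times squared distance of group mean from grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group SS: squared distance of each observation from its own group mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    df_between, df_within = len(groups) - 1, len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Share of total variability explained by the grouping factor (SS_between / SS_total).
    r_squared = ss_between / (ss_between + ss_within)
    return f_stat, df_between, df_within, r_squared


def levene_f(groups, center=median):
    # Levene-type statistic: one-way ANOVA F on absolute deviations from each group's center.
    # center=median is the Brown-Forsythe variant; center=mean is Levene's original version.
    deviations = [[abs(y - center(g)) for y in g] for g in groups]
    f_stat, df1, df2, _ = one_way_anova(deviations)
    return f_stat, df1, df2


# Hypothetical toy data (not hearing.txt), only to show the calls.
print(one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # → (27.0, 2, 6, 0.9)
```

To turn F into the upper-tail p-value that part (d) asks for, R's `pf(F, df1, df2, lower.tail = FALSE)` is the usual call. For part (h), `car::leveneTest` centers on the median by default, which is the variant the sketch uses unless `center=mean` is passed.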

2. The following data come from a study, carried out decades ago, of attitudes toward instituting
sex education in public schools. (10 pts)
                  Sex Education
Disposition       Favor    Oppose
Conservative        645       142
Moderate            812       129
Liberal             766        65
a) Produce a single barplot that displays all of the data.
b) Create a labeled matrix object to store the data.
c) Carry out the chi-square test for association. Give the hypotheses, test statistic, p-value,
and conclusion in context.
d) Reproduce the p-value from part (c) using the pchisq(.) function.
e) Examine the standardized Pearson residuals from the chi-square test, and describe how
the observed data depart from independence.
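
The chi-square test in parts (c) through (e) compares each observed count with its expected count under independence, $E_{ij} = R_i C_j / n$. Below is a minimal Python sketch that uses only the standard library; the course itself uses R. It runs on the counts from the Disposition by Sex Education table, and the helper names are hypothetical. Because the table is 3 × 2, df = (3 - 1)(2 - 1) = 2. For even df the chi-square upper tail has a closed form, which the sketch uses in place of a general incomplete-gamma routine.

```python
import math


def chi_square_independence(table):
    # Pearson chi-square test of independence for an r x c table of counts.
    # Returns (statistic, df, standardized_residuals).
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    std_resid = []
    for i, row in enumerate(table):
        resid_row = []
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n.
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
            # Standardized (adjusted) Pearson residual.
            se = math.sqrt(expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            resid_row.append((observed - expected) / se)
        std_resid.append(resid_row)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df, std_resid


def chi_square_upper_tail_even_df(x, df):
    # P(X > x) for chi-square with even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# Rows: Conservative, Moderate, Liberal; columns: Favor, Oppose.
sex_ed = [[645, 142], [812, 129], [766, 65]]
stat, df, resid = chi_square_independence(sex_ed)
p_value = chi_square_upper_tail_even_df(stat, df)
print(stat, df, p_value)
for label, r in zip(["Conservative", "Moderate", "Liberal"], resid):
    print(label, [round(v, 2) for v in r])
```

In R, `chisq.test(m)` should report the same statistic for this table, since Yates' continuity correction is applied only to 2 × 2 tables. Its `stdres` component holds the standardized residuals, and `pchisq(X2, df, lower.tail = FALSE)` gives the upper-tail p-value.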

3. This question involves a comparison of the two-sample procedures we have considered in
this course. (15 pts)
a) Set the randomization seed to 4444. Generate one sample of size $n_1 = 35$ from …
and another sample of size $n_2 = 35$ from … . We wish to test … . Obtain the p-value
for this test using each of the following procedures:
i. The two-sample t-test (equal variances).
ii. The two-sample t-test (unequal variances).
iii. The paired t-test.
b) Obtain a fourth p-value, this time using the Wilcoxon rank sum test of whether the two
population medians are the same.
c) Generate data and carry out the four tests a large number of times, say 10,000 times (take
care that you are no longer using a randomization seed). At the end of the simulation,
you should have 10,000 p-values for each of the four procedures. Report the simulation-based
type I error rate for each procedure.
d) Based on your simulation, how do these four procedures perform when the two
populations have the same location parameters?
e) Now change the location parameter of the second population to 22 and re-run the simulation,
again using 10,000 replications. Report the simulation-based estimates of power for all four
procedures in this scenario.
f) Describe your power results. Are they different from what you expected?
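
The simulation in parts (c) through (f) boils down to three steps per replicate: generate two samples, compute four p-values, and record how often each falls below alpha. The two generating distributions did not survive in this copy of the exam. For illustration only, the sketch assumes two normal populations with standard deviation 5 (the hypothetical `sd` default) and lets the caller set the two means.

It is Python with only the standard library. The t p-values come from the regularized incomplete beta function, evaluated with a Numerical Recipes style continued fraction. The rank-sum p-value uses the large-sample normal approximation without tie or continuity correction, which simplifies what R does. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def _nonzero(v, tiny=1e-300):
    # Guard against division by zero inside the Lentz recursion.
    return v if abs(v) > tiny else tiny


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    # Continued fraction for the incomplete beta function (modified Lentz method).
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / _nonzero(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 / _nonzero(1.0 + aa * d)
            c = _nonzero(1.0 + aa / c)
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    # Regularized incomplete beta function I_x(a, b).
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    # P(|T| > |t|) for Student's t with df degrees of freedom.
    return _betainc(df / 2.0, 0.5, df / (df + t * t))


def pooled_t_p(x, y):
    n1, n2 = len(x), len(y)
    sp2 = ((n1 - 1) * stdev(x) ** 2 + (n2 - 1) * stdev(y) ** 2) / (n1 + n2 - 2)
    t = (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t_two_sided_p(t, n1 + n2 - 2)


def welch_t_p(x, y):
    v1, v2 = stdev(x) ** 2 / len(x), stdev(y) ** 2 / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximate degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x) - 1) + v2 ** 2 / (len(y) - 1))
    return t_two_sided_p(t, df)


def paired_t_p(x, y):
    d = [a - b for a, b in zip(x, y)]
    return t_two_sided_p(mean(d) / (stdev(d) / math.sqrt(len(d))), len(d) - 1)


def rank_sum_p(x, y):
    # Normal approximation, assuming no ties (continuous data).
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    order = sorted(range(n1 + n2), key=combined.__getitem__)
    r1 = sum(rank for rank, i in enumerate(order, start=1) if i < n1)
    w = r1 - n1 * (n1 + 1) / 2  # Mann-Whitney form of the statistic
    z = (w - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return 2 * (1 - NormalDist().cdf(abs(z)))


TESTS = {"pooled t": pooled_t_p, "Welch t": welch_t_p,
         "paired t": paired_t_p, "rank sum": rank_sum_p}


def rejection_rates(mu1, mu2, sd=5.0, n=35, reps=10_000, alpha=0.05):
    # Hypothetical data model: both samples Normal(mu, sd); no fixed seed inside the loop.
    rejections = dict.fromkeys(TESTS, 0)
    for _ in range(reps):
        x = [random.gauss(mu1, sd) for _ in range(n)]
        y = [random.gauss(mu2, sd) for _ in range(n)]
        for name, test in TESTS.items():
            rejections[name] += int(test(x, y) < alpha)
    return {name: count / reps for name, count in rejections.items()}
```

With equal means, for example `rejection_rates(20, 20)`, the rates estimate type I error. With `rejection_rates(20, 22)` they estimate power; the value 20 is a placeholder, not the exam's setting. The corresponding R calls are:

- `t.test(x, y, var.equal = TRUE)`
- `t.test(x, y)` (Welch is the default)
- `t.test(x, y, paired = TRUE)`
- `wilcox.test(x, y)`

Note that `wilcox.test` computes an exact p-value by default when there are fewer than 50 observations and no ties. Also, Python's `random.seed(4444)` does not reproduce R's `set.seed(4444)` stream.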

4. When MRI brain scans first became available, an interesting research question involved the
relationship between measurable brain size and IQ. Forty psychology students volunteered
for MRI scans of their brains, and brain size was recorded as the number of pixels
mapped by the scan (brain_size.txt). (10 pts)
a) Fit the simple linear regression model $\text{IQ}_i = \beta_0 + \beta_1 \, \text{PixelCount}_i + \varepsilon_i$
and provide the estimated regression coefficients.
b) Test for a linear relationship between IQ and pixel count. Give the hypotheses, test
statistic, p-value, and conclusion in context.
c) There are additional variables present in the data set that may be related to IQ. Use the
simple linear regression model fit previously as your base model. Use the forward
selection technique to determine whether any of Height, Weight, or Gender can be
included as significant predictors of IQ. Do not include excessive output.
d) Obtain the … statistic for your final model.
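
Parts (a) and (b) reduce to a few least-squares formulas. The estimates are $\hat\beta_1 = S_{xy}/S_{xx}$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. The slope is tested with $t = \hat\beta_1 / \mathrm{SE}(\hat\beta_1)$ on $n - 2$ degrees of freedom, where $\mathrm{SE}(\hat\beta_1) = \sqrt{\mathrm{MSE}/S_{xx}}$. Below is a standard-library Python sketch of those formulas. The function name is hypothetical, and the toy numbers are not from brain_size.txt.

```python
import math
from statistics import mean


def simple_linear_regression(x, y):
    # Least-squares fit of y = b0 + b1 * x.
    # Returns (b0, b1, se_b1, t_b1); t_b1 tests H0: beta1 = 0 on n - 2 df.
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Residual mean square (MSE) estimates the error variance sigma^2.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    se_b1 = math.sqrt(mse / sxx)
    return b0, b1, se_b1, b1 / se_b1


# Hypothetical toy data, only to show the call.
print(simple_linear_regression([1, 2, 3, 4], [1, 3, 2, 4]))
```

In R, `summary(lm(IQ ~ PixelCount, data = brain))` reports the same estimates and t statistic. The column and data-frame names there are guesses, so check them against brain_size.txt. The two-sided p-value is `2 * pt(abs(t), df = n - 2, lower.tail = FALSE)`. For the forward-selection step in part (c), `add1(fit, scope = ~ . + Height + Weight + Gender, test = "F")` is one way to compare candidate additions.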

5. Nerds frequently impersonate fantasy characters and roll dice to determine what happens in
their silly adventure games. Usually this involves rolling a single 20-sided die; other times,
the player may need to roll, for example, eight 6-sided dice. It would be nice to have a way
to quickly simulate rolling multiple dice. (10 pts)
a) Write a program to simulate rolling a single 20-sided die. The possible outcomes are all
integers between 1 and 20, and each outcome should be equally likely. While not
required, you may wish to use existing functions like runif(.) and floor(.).
b) To give evidence that your program works properly, execute it 10,000 times, store the
value of each roll, and use them to build a barplot. Use your barplot to make an argument
that the code works as intended.
c) Write a function called “roll1” that simulates rolling a single die with a user-supplied
number of sides (i.e., the single argument passed to the function is the number of sides on
the die). For example, the code “roll1(sides=20)” should simulate rolling a single 20-sided die.
d) Write a more general function called “roll” which takes two arguments: the number of
identical dice to be rolled, and the number of sides on one of the dice. For example, the
code “roll(number=8, sides=6)” should roll 8 standard six-sided dice.
e) In nerdy fantasy games, it’s a really big deal when your 20-sided die lands on 20. It is
extremely rare for a player to roll 20’s on consecutive throws. It is unheard of to roll
three straight 20’s. Use your “roll” function and a while loop to count how many
attempts (an attempt is the rolling of three 20-sided dice) it takes to roll three straight 20’s
in simulation. Call this value “number_of_attempts”.
f) Store 1,000 values of “number_of_attempts” and create a histogram. (This will take
quite a while to run. Go get some lunch; no joke.)
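
The dice functions can be sketched in Python as follows. The course itself expects R, where the same mapping is `floor(runif(1) * sides) + 1`. The argument names `sides` and `number` follow the calls in parts (c) and (d); `attempts_until_triple_20` is a hypothetical helper name. Three 20s occur with probability (1/20)^3 = 1/8000 per attempt, so `number_of_attempts` is geometric with mean 8000. That means 1,000 replicates need about 8 million attempts on average, which is why part (f) is slow.

```python
import math
import random
from collections import Counter


def roll1(sides=20):
    # floor(U * sides) + 1 maps U ~ Uniform[0, 1) onto 1..sides with equal probability,
    # mirroring the runif()/floor() hint in the exam.
    return math.floor(random.random() * sides) + 1


def roll(number, sides):
    # Roll `number` identical dice, each with `sides` faces.
    return [roll1(sides) for _ in range(number)]


def attempts_until_triple_20():
    # One attempt = one roll of three 20-sided dice; count attempts until all three show 20.
    number_of_attempts = 1
    while roll(number=3, sides=20) != [20, 20, 20]:
        number_of_attempts += 1
    return number_of_attempts


# Tally 10,000 rolls of a d20 as raw material for the barplot in part (b).
counts = Counter(roll1(20) for _ in range(10_000))
for face in range(1, 21):
    print(face, counts[face])
```

If the generator works as intended, each of the 20 faces should appear roughly 10,000 / 20 = 500 times, so the bars should be of about equal height.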