About algorithms: SRW-MAST90007 statistical algorithms


SRW MAST90007 2021 Major

MAST90007: Statistics for Research Workers 2022

1,500 word assignment

Due: 5 pm, Friday 29 July 2022

Submission

Submit an electronic copy of the assignment via the LMS.

A reminder: When submitting your assignment, you will be asked to complete the online plagiarism declaration. This assignment must be your own work.

This assignment contains three (3) questions worth a total of 30 marks. There is some general advice on the assignment at the end of this document, on page 7.
The overall requirement for this assignment is to carry out and report on data analytics that
address three questions about the data from the Framingham heart study.
You may know about this study from your general knowledge; it is one of the most famous studies in epidemiology. You can learn about the study from information on Wikipedia (https://en.wikipedia.org/wiki/Framingham_Heart_Study), but also through these references:
Levy, D., National Heart Lung and Blood Institute., et al. (1999). 50 years of discovery: medical milestones from the National Heart, Lung, and Blood Institute’s Framingham Heart Study. Hackensack, N.J., Center for Bio-Medical Communication Inc.
Mahmood, S. S., Levy, D., Vasan, R. S., & Wang, T. J. (2014). The Framingham Heart Study and the epidemiology of cardiovascular disease: a historical perspective. The Lancet, 383(9921), 999-1008.
Oppenheimer, G. M. (2005). Becoming the Framingham study 1947–1950. American Journal of Public Health, 95(4), 602-610.
You may also find your own useful references. You are not required to read these references for the purposes of the assignment.
The data file contains some information from long term follow up as well as baseline measures. The file contains records for 5,209 people – all the participants in the original cohort of the study. The participants were followed up every 2 years. The data file includes information from baseline, the 2nd examination (one variable), and the 16th examination (30 years after baseline).

The data file includes:

Age at baseline (years)
Height at baseline (inches)
Weight at baseline (pounds)
Body Mass Index at baseline (kg/m²)
Sex: Female / Male
Diastolic blood pressure at baseline (mmHg)
Systolic blood pressure at baseline (mmHg)
Serum cholesterol (mg/100ml) examination 1: Serum cholesterol (mg/100ml) at baseline; this variable has 2,037 missing values.
Serum cholesterol (mg/100ml) examination 2: Serum cholesterol (mg/100ml) at the 2nd examination; this variable has 626 missing values.
Serum cholesterol (mg/100ml) baseline: Baseline serum cholesterol at examination 1, or, when missing at examination 1, the serum cholesterol at the second examination.
Metropolitan Relative Weight at baseline: A measure of the percentage of actual weight to desirable weight; a measure very similar to BMI.
Smoker at baseline: Smoker / Non-smoker
Number cigarettes smoked per day at baseline
Last examination number: Number of the last examination that the person participated in.
Survived at last examination: 0 = alive at 16th examination; 1 = died prior to 16th examination
Cause of death:
    0 = still alive
    1 = sudden death from coronary heart disease (CHD)
    2 = other coronary heart disease
    3 = stroke (cerebrovascular accident, CVA)
    4 = other cerebral vascular disease
    5 = cancer
    6 = other causes of death
    9 = cause unknown
Examination at which CHD diagnosed, if applicable
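Two of the variables above are derived from others, and the derivations can be sketched in a few lines. The following is a minimal illustration, not the method used to build the data file: the function names are hypothetical, a missing value is represented by None, and the BMI column in the file may have been computed or rounded differently.

```python
from typing import Optional

LB_TO_KG = 0.45359237  # one pound in kilograms (exact by definition)
IN_TO_M = 0.0254       # one inch in metres (exact by definition)


def bmi_from_imperial(weight_lb: float, height_in: float) -> float:
    """Body Mass Index in kg/m^2 from weight in pounds and height in inches."""
    weight_kg = weight_lb * LB_TO_KG
    height_m = height_in * IN_TO_M
    return weight_kg / height_m ** 2


def baseline_cholesterol(exam1: Optional[float],
                         exam2: Optional[float]) -> Optional[float]:
    """Examination 1 value if present, otherwise the examination 2 value.

    Returns None when both readings are missing.
    """
    return exam1 if exam1 is not None else exam2
```

Under this rule a participant with no examination 1 reading but an examination 2 reading of 240 would be given a baseline value of 240, which is the substitution that Question 2 asks you to evaluate.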

The data were accessed from:

http://courses.washington.edu/b513/datasets/datasets.php?clas…

The data file is Framingham.xlsx. You can drag and drop this file into Minitab.
When you do this, some of the variable names will be truncated; you will need to edit
them, shortening them so that they are clear.
There are some references to column numbers in the assignment. These numbers will be
correct if you simply drag and drop the Excel file into Minitab; obviously, if you insert
columns yourself in the Minitab file, your column numbers may differ from those given
here.

Question 1 – Baseline data [9 marks]
This question focuses on baseline characteristics and data.
(a) Briefly describe the design of the study to provide context for the analyses you report.

(b) Produce a summary table to describe the following characteristics of the study
participants: age at baseline, height at baseline, weight at baseline and sex.

(c) Consider systolic and diastolic blood pressure at baseline. Produce suitable visual
display(s) to allow a comparison of the distributions of these according to whether or
not an individual was a smoker at baseline. You can exclude those with missing
information about smoking from visual displays using Data Options > Group options.

(d) Carry out appropriate analyses to compare those who were smokers at baseline with
those who were not, for systolic and diastolic blood pressure. Provide one or more
suitable tables that include the summary statistics and inferential statistics.

(e) Discuss and justify any assumptions underlying your choice of analysis.

(f) Write a summary of the analyses you have carried out explaining the results of all the
comparisons you have made. Write the summary for a doctor interested in the
practical application of the study results.

(g) Consider predicting systolic blood pressure at baseline from age and Metropolitan
relative weight at baseline. Provide graphical display(s) to illustrate the distributions
of the explanatory variables. Explain if you would recommend rescaling these
variables for this analysis. If appropriate, rescale the variables. Fit the model and
obtain the parameter estimates for each of the explanatory variables. Explain the
meaning of the parameter estimates for each of these explanatory variables, according
to whether you have recommended rescaling or not. (You do not need to report other
details of the analysis.)

(h) A colleague is also working with the same data file, and says: “This is great! The
sample size is so big, everything is really, really significant; this whole study gives so
many meaningful findings.” Respond to this comment.
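The link between sample size and statistical significance raised in parts (d) and (h) can be illustrated with a short sketch. It assumes a two-group comparison of means using the Welch (unequal variance) standard error with a normal approximation for the confidence interval. The function name and all numbers are hypothetical and are not taken from the Framingham file.

```python
import math
from statistics import NormalDist


def welch_summary(mean1, sd1, n1, mean2, sd2, n2, level=0.95):
    """Difference in means, Welch standard error, test statistic and a
    large-sample (normal approximation) confidence interval."""
    diff = mean1 - mean2
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return {"diff": diff, "se": se, "stat": diff / se,
            "ci": (diff - z * se, diff + z * se)}


# Hypothetical summary statistics: the same 1 mmHg difference in both cases,
# with 50 per group and then 5,000 per group.
small = welch_summary(131.0, 20.0, 50, 130.0, 20.0, 50)
large = welch_summary(131.0, 20.0, 5000, 130.0, 20.0, 5000)
print(small["stat"], small["ci"])
print(large["stat"], large["ci"])
```

In this toy example the estimated difference is identical in both runs; only the standard error shrinks as the groups grow. The larger sample therefore yields a confidence interval that excludes zero for a difference that may still be too small to matter clinically.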

Question 2 – Serum cholesterol at baseline [12 marks]
Serum cholesterol (mg/100ml) at baseline (column 10 in the datafile) is defined as serum
cholesterol at examination 1 (the true baseline), or, when missing at examination 1, the
serum cholesterol at the second examination. For many people in the study, serum
cholesterol at both examinations 1 and 2 was available.
(a) Produce an appropriate graph showing the relationship between Serum cholesterol
(mg/100ml) examination 1 and Serum cholesterol (mg/100ml) examination 2.

(b) Describe the relationship between the two variables, and give a suitable summary
statistic.

(c) Fit a linear regression predicting Serum cholesterol (mg/100ml) examination 1 from
Serum cholesterol (mg/100ml) examination 2. Provide an appropriate summary table
and give a plain language explanation of the estimates of the parameters of the model.

(d) Find a 95% prediction interval for Serum cholesterol (mg/100ml) examination 1 when
Serum cholesterol (mg/100ml) examination 2 is 300 (mg/100ml). Explain its meaning.

(e) A colleague asks if using the Serum cholesterol (mg/100ml) examination 2 value itself
as the estimate of Serum cholesterol (mg/100ml) examination 1 is a good idea; for
example, if Serum cholesterol (mg/100ml) examination 2 = 275, predict that Serum
cholesterol (mg/100ml) examination 1 = 275. (This is, in fact, what was done.) Does
this under-estimate, or over-estimate Serum cholesterol (mg/100ml) examination 1,
using the data available? Provide a graph that will help answer this question. (Hint:
Consider adding a Calculated line to show y = x.) Provide an explanation in writing.

(f) Consider improving the prediction of Serum cholesterol (mg/100ml) examination 1.
Explain, in principle, a possible approach. You do not need to implement the
approach.

(g) A key research question is about the relationship of smoking status at baseline and sex
to Serum cholesterol (mg/100ml) baseline (column 10). Describe a suitable statistical
model for answering this question, and explain the effects that will be considered in the
model.

(h) Use Minitab to fit the model that you have specified in part (g). Provide a summary
table of the Analysis of variance, and give a plain language explanation of the meaning
of the P-values associated with each of the explanatory variables. Use concrete terms in
relation to the Framingham study, rather than in abstract form.

(i) State one assumption required for analysing the data using the model you have
suggested. State if the assumption is reasonable and provide relevant evidence.

(j) Provide an appropriate graphical display to summarise the findings in relation to the
model you have fitted in (h).

(k) Find 95% confidence intervals for the effects of sex and smoking status on serum
cholesterol at baseline; use Fisher intervals and provide those that best describe the
results. Provide a suitable report of these confidence intervals, including a plain
language explanation in concrete terms.
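The mechanics behind parts (c) to (e) can be sketched as follows: a least-squares line, an approximate prediction interval at a chosen value, and a comparison with the y = x rule. This is a minimal sketch under stated assumptions. The function names and the six paired readings are hypothetical, and the interval uses a normal quantile for simplicity; the standard interval uses a t quantile with n - 2 degrees of freedom and is wider for small samples.

```python
import math
from statistics import NormalDist, mean


def fit_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def prediction_interval(x, y, x0, level=0.95):
    """Approximate prediction interval for one new y value at x0.

    Returns (lower, fitted value, upper). A normal quantile is used here;
    a t quantile with n - 2 degrees of freedom is the exact choice.
    """
    n = len(x)
    b0, b1 = fit_line(x, y)
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    fitted = b0 + b1 * x0
    return fitted - half, fitted, fitted + half


# Hypothetical paired readings (NOT from the Framingham file)
exam2 = [180.0, 200.0, 220.0, 240.0, 260.0, 300.0]
exam1 = [190.0, 205.0, 215.0, 238.0, 250.0, 280.0]

b0, b1 = fit_line(exam2, exam1)
low, fitted, high = prediction_interval(exam2, exam1, 300.0)
# Positive gap: the y = x rule predicts a higher value than the fitted line
gap_at_300 = 300.0 - fitted
print(b0, b1, (low, high), gap_at_300)
```

With these made-up values the fitted slope is below 1, so the y = x rule sits above the fitted line at high examination 2 values. Whether the same pattern holds in the real data is what part (e) asks you to check.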

Question 3 – Survival at last examination [9 marks]
Consider Survived at last examination; this is in column 15.
(a) Produce a graph of the data that allows a comparison of Survived at last examination
in terms of sex.

(b) Comment on any differences for sex, based on the graph.

(c) Estimate the difference in proportions (for sex) surviving at the last examination, and
the 95% confidence interval for this difference. Write a plain language explanation of the
results, using concrete terms in relation to the Framingham study.

(d) Carry out a logistic regression analysis of “Survived at last examination” using sex as a
predictor. Write a summary of the results, again suitable for a doctor interested in the
findings.

(e) Subset the Minitab worksheet to exclude those who have survived at examination 16,
so that you have the subset of subjects who died prior to examination 16.

Explore the relationship between cause of death and sex, using a suitable graphical
display. You may consider combining causes of death, if you think this is appropriate.
(Hint: Data > Recode). Provide a suitable graph with a brief written description of the
patterns in the graph.

(f) A colleague wants to consider predicting Survived at last examination from Serum
cholesterol (mg/100ml) examination 1 (column 8). She notes that some of the values are
missing. Your colleague says, “I don’t think we need to worry about that as
there will still be plenty of data to carry out an analysis”. Provide a response to this,
explaining any assumptions involved, and include a summary table to describe the
amount of missing data for Serum cholesterol (mg/100ml) examination 1.

(g) At the time of the Framingham study, diastolic blood pressure was believed to be a
superior measure of blood pressure compared with systolic blood pressure. High levels
of systolic blood pressure were not believed to be important in terms of health
outcomes. Examine the relationship between these two measures of blood pressure at
baseline visually. Provide a plot that represents this relationship.

Consider the summary table providing the results of three logistic regression models
predicting Survived at last examination, shown below.

Model  Explanatory variable(s)      Odds ratio  95% confidence interval for odds ratio  P-value
1      Systolic blood pressure/10   1.34        1.31, 1.38                              < 0.001
2      Diastolic blood pressure/10  1.53        1.46, 1.60                              < 0.001
3      Systolic blood pressure/10   1.32        1.27, 1.38                              < 0.001
       Diastolic blood pressure/10  1.04        0.96, 1.12                              0.341

Based on these analyses and your examination of the explanatory variables, comment on the
belief about the “superior” blood pressure measure in predicting survival at the last
examination. Formal analyses are not required to answer this question.
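The explanatory variables in the table are labelled as blood pressure divided by 10, which is read here as meaning that each odds ratio refers to a 10 mmHg increase. That reading is an assumption, as is taking the modelled event to be the value coded 1 (died prior to the 16th examination). Because a logistic model is linear on the log-odds scale, an odds ratio for one step size converts to another by raising it to a power. The helper below is a hypothetical illustration of that conversion.

```python
def rescale_odds_ratio(odds_ratio: float, from_step: float, to_step: float) -> float:
    """Convert an odds ratio for a `from_step` increase in a predictor into
    the odds ratio for a `to_step` increase under the same logistic model."""
    return odds_ratio ** (to_step / from_step)


# Model 1 in the table: odds ratio 1.34 per 10 mmHg of systolic blood pressure
per_1_mmhg = rescale_odds_ratio(1.34, from_step=10, to_step=1)
per_20_mmhg = rescale_odds_ratio(1.34, from_step=10, to_step=20)
print(round(per_1_mmhg, 3), round(per_20_mmhg, 3))
```

Reporting per 10 mmHg keeps the odds ratio away from 1 and easier to read than the per 1 mmHg value. The same consideration applies to the rescaling decision in Question 1(g).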

Advice
Here is some advice to follow when preparing your assignment.
• The purpose of the assignment is to relate the statistical theory and practice learned in Statistics for Research Workers to real world data. The essential feature is that you must demonstrate understanding and application of statistical ideas covered in SRW to real world practice.
• The presentation of results should be consistent with the principles for presenting graphics and tables discussed in the course.
• In general, you are not required to provide Minitab output in the assignment, with the exception of graphs.
• The word limit for the assignment is 1,500 words. From our point of view, this is an upper limit for the assignment and you should aim to submit between 1,400 and 1,500 words. The word count does not include graphs and tables. University policy allows for a 10% deduction of marks once a written assignment exceeds the specified word limit by more than 10%. As the 1,500-word assignment is worth 20% of your final mark, you could lose 2% from your final mark if your assignment was, for example, 1,670 words.
• Your answers should be on no more than twelve (12) A4 pages of standard sized writing. This includes any graphs. Twelve pages is a generous limit for the assignment; this document is on seven pages, with white space, and it contains around 2,000 words.
• You do not need to reproduce the questions in your assignment.
