
On Algorithms: An Analysis of the Difficult Points in STAT-337

STAT 337 ASSIGNMENT 2 Due: 5:00pm EDT Thursday, June 16, 2022
Notes for Submission: Upload your assignment directly to Crowdmark via the link you
receive by email. It is your responsibility to make sure your solution to each question is
submitted in the correct section, that the pages are rotated correctly, and that everything is
legible. Typed solutions are preferred.
Notes on the use of statistical software: Unless specifically told otherwise, you are free
to do your calculations using any software you like (SAS, R, Excel, etc) but your solutions
should clearly explain the steps you used in the computation, showing intermediate calculations
when necessary, and give the formulas that you used. Any code and output created
should also be submitted.

  1. [6 marks] In 2020, a group of eight articles were published in the Journal of Studies on
    Alcohol and Drugs summarizing the current scientific literature and evidence related to
    the research question: Does exposure to alcohol marketing have a causal influence on
    youth drinking?[1] For each statement below (lightly edited from the original source),
    indicate which of the seven Bradford Hill criteria discussed in class are related to the
    statement. Multiple criteria may be addressed in each case.
    (a) Jernigan et al. (2017) conducted a systematic review of longitudinal studies that examined
    exposure to advertising and drinking among underage persons. All 12 studies found a positive
    association between marketing exposure and one or more alcohol consumption outcomes. For
    initiation of alcohol use the odds ratios for different marketing exposures ranged from 1.00 to
    1.69, and for subsequent hazardous or binge drinking, the range was somewhat higher: 1.38 to
    2.15.
    (b) In recent years, psychologists have developed and tested theoretical models in which marketing
    exposures are hypothesized to affect psychological mediators relating to thoughts, cognitions and
    attitudes. These marketing-induced changes are hypothesized to predict whether an individual
    will engage in drinking behaviour. Jackson and Bartholow (2020) provide a narrative summary
    of psychological plausibility using an integrated conceptual model that depicts relevant psychological
    processes as they work together in a complex chain of influence.
    (c) Hanewinkel et al. (2008) conducted a prospective observational study of 2110 German adolescents
    younger than 15 years who had never smoked or drunk alcohol at baseline. The percentage
    of students who tried smoking was 16.3%, 10.9% initiated binge drinking and 5.0% used both
    substances during the follow-up period. There was a significant effect of parental movie restriction
    on each substance use outcome measure after controlling for covariates. Compared with
    adolescents whose parents never allowed them to view FSK-16 movies (movies that only those
    aged 16 years and over would be allowed to see in theatres), the adjusted relative risks (RR) for
    use of both substances were 1.64 for adolescents allowed to view them once in a while, 2.30 for
    sometimes and 2.92 for all the time. FSK-16 restrictions were associated with substantially lower
    exposure to movie depiction of tobacco and alcohol use.
    [1] Sargent, J. D., Cukier, S., & Babor, T. F. (2020). Alcohol marketing and youth drinking: is there a causal
    relationship, and why does it matter? Journal of Studies on Alcohol and Drugs, Supplement, (s19), 5-12.
  2. [10 marks]
    (a) [4 marks] HIV disease may increase susceptibility to other viral infections. A cohort
    study investigated the association of HIV with the occurrence of cytomegalovirus
    (CMV) infection, a common herpes virus. Researchers screened infectious disease
    clinics to identify a cohort of 400 HIV-positive patients who were seronegative for
    CMV. The researchers then identified a comparison cohort of 400 people without
    HIV disease from primary care clinics who were also CMV seronegative. Study
    personnel conducted annual testing to assess new CMV infections, defined by the
    development of antibodies to the virus. The study data are presented in Tables 1
    and 2.
    For each of the six characteristics listed in Table 1 determine whether or not it is a
    potential confounder for the association between HIV and incident CMV infection.
    Explain your reasoning.
    Table 1: Baseline characteristics of the study participants
                                              HIV positive   HIV negative
    Mean age (years)                              47.3           47.1
    African American (%)                          37.3           18.9
    Male (%)                                      54.0           52.9
    Mean body mass index (kg/m2)                  23.2           27.9
    Intravenous drug use (%)                      35.4            4.1
    Mean CD4 lymphocyte count (cells/mm3)          187           1440
    Table 2: Associations of study characteristics with incident CMV infection
                                                          Unadjusted relative risk
                                                          of CMV infection
    HIV disease                                                   4.05
    Age (per 10-year higher)                                      2.92
    African American (compared to Caucasian)                      1.01
    Male (compared to female)                                     2.05
    Body mass index (per 5 kg/m2 higher)                          1.03
    Intravenous drug use (yes versus no)                          1.86
    CD4 lymphocyte count (per 100 cells/mm3 increase)             2.70
    (b) The Heart and Estrogen/Progestin study (HERS) was a randomized clinical trial of
    hormone replacement therapy in post-menopausal women with existing coronary
    heart disease (CHD)[2]. We will consider multiple linear regression models fit to
    baseline data collected on the cohort of 2,763 women[3]. For the purposes of this
    question, you can think of the data as coming from a cross-sectional study.
    i. [3 marks] Consider the fitted multiple linear regression model presented in
    Table 3. The response is LDL cholesterol and the primary exposure or variable
    of interest is body mass index (BMI) (a continuous variable measured in
    kg/m2). A set of potential confounders are also included in the model: age,
    ethnicity (nonwhite), smoking, and alcohol use (drinkany). Age is a continuous
    explanatory variable and the rest are binary explanatory variables. Give
    a precise written interpretation of the regression parameter for the BMI term.
    Is this result statistically significant?
    ii. [1 mark] Using the model in Table 3, find the predicted LDL cholesterol value
    for a 65-year-old woman who is white, doesn’t smoke but does occasionally
    drink, and has a BMI of 24 kg/m2.
    iii. [2 marks] Now consider the fitted multiple linear regression model presented
    in Table 4. This model includes a binary indicator of statin use (a class of
    drugs used to lower cholesterol levels) and the interaction between this variable
    and BMIc. Note that the BMI variable has been centred at its mean value
    of 28.6 kg/m2 (i.e. BMIc = BMI − 28.6). This makes the parameter estimate for
    statin use more interpretable.
    Using estimates from the fitted model, describe the association between BMI
    (using BMIc) and LDL among statin users and non-users (2-3 sentences). Is
    there evidence that statin use is an effect modifier for the association between
    BMI and LDL cholesterol? Explain your reasoning.
    [2] Hulley, S., Grady, D., Bush, T., Furberg, C., Herrington, D., Riggs, B. and Vittinghoff, E. (1998). Randomized
    trial of estrogen plus progestin for secondary prevention of heart disease in postmenopausal women. The Heart and
    Estrogen/progestin Replacement Study. Journal of the American Medical Association, 280(7), 605-613.
    [3] Vittinghoff, E., Glidden, D. V., Shiboski, S. C., & McCulloch, C. E. (2011). Regression methods in biostatistics:
    linear, logistic, survival, and repeated measures models. Springer Science & Business Media.
    Table 3: Fitted multiple linear regression model from HERS study
    MODEL LDL = BMI age nonwhite smoking drinkany
    Parameter Estimates
                       Parameter    Standard
    Variable     DF    Estimate     Error      t Value    Pr > |t|
    Intercept     1    147.3153     9.2564      15.91      0.000
    BMI           1      0.3591     0.1341
    age           1     -0.1897     0.1131      -1.68      0.094
    nonwhite      1      5.2194     2.3237       2.25      0.025
    smoking       1      4.7507     2.2104       2.15      0.032
    drinkany      1     -2.7223     1.4989      -1.82      0.069
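
The blank t value in the BMI row of Table 3 and the prediction asked for in (b)(ii) both follow from the printed estimates. The following is a minimal Python sketch, not a model solution. It assumes a standard normal approximation to the t distribution for the p-value (plausible for a cohort of 2,763 women, but not stated in the assignment), and the names `coef`, `t_stat`, `two_sided_p`, and `predict_ldl` are illustrative.

```python
import math

# Estimates copied from Table 3; the BMI standard error is from the same table.
coef = {
    "Intercept": 147.3153,
    "BMI": 0.3591,
    "age": -0.1897,
    "nonwhite": 5.2194,
    "smoking": 4.7507,
    "drinkany": -2.7223,
}
se_bmi = 0.1341


def t_stat(estimate, std_error):
    """Wald t statistic: the estimate divided by its standard error."""
    return estimate / std_error


def two_sided_p(t):
    """Two-sided p-value under a standard normal approximation (assumption)."""
    return math.erfc(abs(t) / math.sqrt(2))


def predict_ldl(bmi, age, nonwhite, smoking, drinkany):
    """Fitted LDL value: intercept plus each coefficient times its covariate."""
    return (
        coef["Intercept"]
        + coef["BMI"] * bmi
        + coef["age"] * age
        + coef["nonwhite"] * nonwhite
        + coef["smoking"] * smoking
        + coef["drinkany"] * drinkany
    )


t_bmi = t_stat(coef["BMI"], se_bmi)
p_bmi = two_sided_p(t_bmi)
# Woman from (b)(ii): age 65, white, non-smoker, drinks occasionally, BMI 24.
ldl_hat = predict_ldl(bmi=24, age=65, nonwhite=0, smoking=0, drinkany=1)

print(f"BMI t value = {t_bmi:.2f}, approximate p = {p_bmi:.4f}")
print(f"Predicted LDL = {ldl_hat:.2f}")
```

The binary covariates are coded 0/1, so "white" enters as `nonwhite=0` and "occasionally drinks" as `drinkany=1`.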
    Table 4: Fitted multiple linear regression model with interaction from HERS study
    MODEL LDL = statins BMIc statins*BMIc age nonwhite smoking drinkany
    Parameter Estimates
                          Parameter    Standard
    Variable        DF    Estimate     Error      t Value    Pr > |t|
    Intercept        1    162.4052     7.5833      21.42      0.000
    statins          1    -16.2530     1.4688     -11.07      0.000
    BMIc             1      0.5821     0.1601       3.64      0.000
    statins*BMIc     1     -0.7019     0.2694      -2.61      0.009
    age              1     -0.1729     0.1106      -1.56      0.118
    nonwhite         1      4.0728     2.2751       1.79      0.074
    smoking          1      3.1098     2.1670       1.44      0.151
    drinkany         1     -2.0753     1.4666      -1.42      0.157
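
In a model with a statin-by-BMIc interaction, the BMI slope differs by statin group: the BMIc coefficient is the slope for non-users, and the slope for users adds the interaction coefficient. The sketch below works this out from the Table 4 estimates. It is illustrative only; the variable names are hypothetical, and the p-value again assumes a normal approximation to the t distribution.

```python
import math

# Estimates and standard errors copied from Table 4.
b_bmic = 0.5821                      # BMIc main effect
b_inter, se_inter = -0.7019, 0.2694  # statins-by-BMIc interaction

# Slope of LDL on BMI (per 1 kg/m2) within each statin group.
slope_non_users = b_bmic
slope_users = b_bmic + b_inter

# Wald test of the interaction term, i.e. of equal slopes in the two groups.
t_inter = b_inter / se_inter
p_inter = math.erfc(abs(t_inter) / math.sqrt(2))  # normal approximation (assumption)

print(f"Slope, non-users: {slope_non_users:.4f}")
print(f"Slope, users:     {slope_users:.4f}")
print(f"Interaction t = {t_inter:.2f}, approximate p = {p_inter:.3f}")
```

A standard error for the slope among statin users would also need the covariance between the two estimates, which Table 4 does not report, so the sketch stops at the point estimate.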
  3. [10 marks] This question is based on the following paper:
    Bulfone, T. C., Blat, C., Chen, Y. H., Rutherford, G. W., Gutierrez-Mock, L.,
    Nickerson, A., … & Reid, M. J. (2022). Outdoor Activities Associated with
    Lower Odds of SARS-CoV-2 Acquisition: A Case-Control Study. International
    Journal of Environmental Research and Public Health, 19(10), 6126.
    You can download the paper from https://doi.org/10.3390/ijerph19106126. The
    following questions will lead you through a discussion of the design and a simple unadjusted
    analysis of some of the data from this study.
    (a) [1 mark] In your own words state the goal/purpose of this case-control study.
    (b) [2 marks] Who are the cases in this study and how were they identified/selected?
    Who are the controls and how were they identified/selected?
    (c) [2 marks] Give two inclusion or exclusion criteria used in the selection of the cases
    and controls above.
    (d) [1 mark] What is the primary exposure of interest and how was it assessed?
    (e) [2 marks] Using the data given in Table 2 calculate and interpret the (unmatched,
    unadjusted) Odds Ratio for the primary association of interest in this study.
    (f) [2 marks] Describe at least two potential limitations of this study and/or sources
    of bias or error.
  4. [12 marks] In this question you will explore matching in case-control studies. Consider
    the data in Table 5 giving case counts for a rare disease D and a common exposure E
    in a closed population, stratified by a common binary confounder X. This represents
    the full data in your study population and is normally unobservable.
    Table 5: Hypothetical study population
                            X+                   X−                 Overall
                        E+       E−          E+       E−          E+        E−
    Cases D+            80       10         100      200         180       210
    Non-cases D−    80,000   20,000      20,000   80,000     100,000   100,000
    Odds Ratio             2.0                  2.0                  0.86
    Source: Pearce, N. (2016). Analysis of matched case-control studies. BMJ, 352.
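
The odds ratios in the last row of Table 5 can be checked with the cross-product formula OR = (a × d) / (b × c), where a and b are the exposed and unexposed cases and c and d are the exposed and unexposed non-cases. A short Python sketch of that check follows; the function and variable names are illustrative.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table.

    a, b: exposed and unexposed cases; c, d: exposed and unexposed non-cases.
    """
    return (a * d) / (b * c)


# Table 5 counts per stratum: (E+ cases, E- cases, E+ non-cases, E- non-cases).
population = {
    "X+": (80, 10, 80_000, 20_000),
    "X-": (100, 200, 20_000, 80_000),
}

# Collapsing over X gives the "Overall" columns of Table 5.
overall = tuple(sum(cells) for cells in zip(*population.values()))

or_by_stratum = {x: odds_ratio(*cells) for x, cells in population.items()}
or_crude = odds_ratio(*overall)

print(or_by_stratum)
print(f"Crude OR = {or_crude:.2f}")
```

Both stratum-specific odds ratios equal 2.0 while the collapsed table gives about 0.86, which is the confounding by X that the question goes on to explore.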
    (a) [2 marks] You and your colleagues decide to run an unmatched case-control study
    to investigate the association between E and D. You include all 390 cases from
    your population and a random sample of 390 controls. Recreate Table 5 for this
    study. Use the true sample population prevalences to generate your controls. For
    example, the number of controls with (E+, X+) will be 390 × P[E+, X+ | D−].
    (b) [4 marks] Calculate the stratum-specific and unstratified/overall Odds Ratios for
    the data from your unmatched case-control study in (a) and compare them to the
    true population values in Table 5. Suppose you ignored (or were unaware of) X
    and based your analysis on the unstratified case-control data. Test the significance
    of the unstratified Odds Ratio using a χ² test. Be sure to clearly state the null and
    alternative hypotheses, give the formula for the test statistic, calculate its value
    and find the p-value. What is the conclusion of the test? Would your conclusions
    from this study accurately reflect the true association between E and D?
    (c) [2 marks] Now suppose you and your colleagues decide to run a matched case-
    control study. Once again you include all 390 cases and you match based on X.
    Generate stratified and overall matched 2×2 tables from this study. Assume,
    given X, the exposure statuses of a matched pair are independent and based on
    the true sample population prevalences. For example, for X+ there will be 90
    matched pairs and the number of pairs with both the case and control exposed
    will be 90 × P[E+ | D+, X+] × P[E+ | D−, X+].
    (d) [4 marks] Using the matched 2×2 table from (c) calculate the matched pair Odds
    Ratio and compare it to the true population values in Table 5. Use McNemar’s
    Test to test the significance of the association between E and D. Be sure to clearly
    state the null and alternative hypotheses, give the formula for the test statistic,
    calculate its value and find the p-value. What is the conclusion of the test? Would
    your conclusions from this study accurately reflect the true association between E
    and D?
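
The expected-count constructions in parts (a) and (c), and the two test statistics in parts (b) and (d), can be sketched in a few lines of Python. This is a sketch under stated assumptions rather than a model solution: it uses the Pearson χ² statistic and McNemar's statistic without continuity corrections (the course may use corrected versions), and all names are illustrative. With 1 degree of freedom the upper-tail χ² probability equals erfc(√(x/2)), which avoids any dependency beyond the standard library.

```python
import math

# Table 5 counts per stratum: (E+ cases, E- cases, E+ non-cases, E- non-cases).
population = {
    "X+": (80, 10, 80_000, 20_000),
    "X-": (100, 200, 20_000, 80_000),
}
n_cases = sum(a + b for a, b, _, _ in population.values())
n_noncases = sum(c + d for _, _, c, d in population.values())


def chi2_p_value(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# (a) Unmatched design: expected controls in each cell = 390 * P[E, X | D-].
controls = {
    x: (n_cases * c / n_noncases, n_cases * d / n_noncases)
    for x, (_, _, c, d) in population.items()
}

# (b) Unstratified 2x2 table, odds ratio, and Pearson chi-square (uncorrected).
a = sum(cells[0] for cells in population.values())    # exposed cases
b = sum(cells[1] for cells in population.values())    # unexposed cases
c = sum(exposed for exposed, _ in controls.values())  # exposed controls
d = sum(unexp for _, unexp in controls.values())      # unexposed controls
n = a + b + c + d
or_unmatched = (a * d) / (b * c)
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# (c) Matched design: one control per case within each X stratum, with
# case and control exposure independent given X.
both = case_only = control_only = neither = 0.0
for ea, ua, ec, uc in population.values():
    n_pairs = ea + ua
    p_case = ea / n_pairs    # P[E+ | D+, X]
    p_ctrl = ec / (ec + uc)  # P[E+ | D-, X]
    both += n_pairs * p_case * p_ctrl
    case_only += n_pairs * p_case * (1 - p_ctrl)
    control_only += n_pairs * (1 - p_case) * p_ctrl
    neither += n_pairs * (1 - p_case) * (1 - p_ctrl)

# (d) Matched-pair odds ratio and McNemar statistic use only discordant pairs.
or_matched = case_only / control_only
mcnemar = (case_only - control_only) ** 2 / (case_only + control_only)

print(f"Unmatched: OR = {or_unmatched:.3f}, chi2 = {chi2:.3f}, "
      f"p = {chi2_p_value(chi2):.3f}")
print(f"Matched:   OR = {or_matched:.3f}, McNemar = {mcnemar:.3f}, "
      f"p = {chi2_p_value(mcnemar):.2e}")
```

Only the discordant pairs (`case_only` and `control_only`) enter the matched analysis; the concordant pairs carry no information about the association between E and D.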