On Programmers: STAT4620 Difficulty Analysis

STAT4620/5620 WINTER 2023
Assignment 3: Due Thursday March 2 2023

  1. Suppose that you are interested in studying intravenous drug use among high
    school students in Canada. Drug use is characterized as a binary random variable,
    where 1 indicates that an individual has injected drugs within the past year and
    0 indicates that he/she has not. Covariate information related to drug use
    includes: information about drug use provided in school (y/n), age of student
    (years), employed part-time (y/n), school connectedness (Likert scale), and
    gender (m/f).
    (a) [3pts] Propose and defend a suitable model for the aforementioned data. Be
    sure to write down the model equation.
    (b) [2pts] Discuss any potential interactions that might be worthwhile including in
    your model and provide justification as to why (or why not).
    (c) [1pt] Which R package(s) would you use to fit the above model?
    (d) [2pts] What tools would you use to assess model fit and proceed with variable
    selection?
  2. [10pts] Install the R package faraway. Consider the esdcomp data that were recorded
    on 44 doctors working in an emergency service at a hospital to study the factors
    affecting the number of complaints received. Build a model for the number of
    complaints received, justify your choices, and report your conclusions. (250 words).
  3. [10pts] The bootstrap is a general tool for assessing uncertainty. Describe the
    bootstrap in general and then use it to investigate a statistic of relevance to the
    dataset you have selected for your project. Take advantage of the functions
    available in the R package bootstrap and be sure to include your references.
    (500 words).
  4. [5pts] Cross-validation is probably the simplest and most widely used method for
    estimating prediction error. Ideally, if we had enough data, we would set aside a
    validation set and use it to assess the performance of our model. Since data are
    sometimes scarce, this may not always be possible. We finesse this problem by
    using K-fold cross-validation. Explain. (150 words).
  5. For the analysis of count (or semicontinuous) data there are models available to
    deal with the common situation where there is an excessive number of zeros.
    (a) [5pts] Discuss the various potential sources of zeros. (150 words).
    (b) [8pts] Describe mixture and two-part models and show how their formulations
    handle different types of zeros. (250 words).

GUIDELINES FOR SUBMISSION:
Submit the R markdown file (.RMD), the .csv file containing your datasets, AND the
resulting knitted .PDF file to BrightSpace Assignments under Assignment 3.
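
As a rough illustration of the bootstrap question above, the resampling idea can be sketched in base R. This is only a sketch under stated assumptions: the sample `x` is simulated, the statistic (the median) and the number of resamples `B` are arbitrary choices, and the assignment itself asks for the functions in the R package bootstrap rather than this hand-rolled loop.

```r
# Nonparametric bootstrap sketch in base R.
# Assumptions: `x` is simulated data, the statistic is the median, B = 2000.
set.seed(42)
x <- rexp(50, rate = 1)

B <- 2000
# Resample the data with replacement B times and recompute the statistic each time.
median_star <- replicate(B, median(sample(x, replace = TRUE)))

# Bootstrap standard error: standard deviation of the resampled statistics.
se_boot <- sd(median_star)

# Percentile interval: empirical 2.5% and 97.5% quantiles of the resampled statistics.
ci_pct <- quantile(median_star, c(0.025, 0.975))

se_boot
ci_pct
```

The spread of `median_star` stands in for the unknown sampling distribution of the median, which is why its standard deviation and quantiles can be read as uncertainty estimates.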
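
The K-fold cross-validation idea described above can be sketched in a few lines of base R. The data, the value `K = 5`, and the simple linear model are all illustrative assumptions, not part of the assignment; any model with a `predict` method could take the place of `lm`.

```r
# K-fold cross-validation sketch in base R.
# Assumptions: simulated data, K = 5, a simple linear model, squared-error loss.
set.seed(1)
n <- 100
dat <- data.frame(x = rnorm(n))
dat$y <- 1 + 2 * dat$x + rnorm(n)

K <- 5
# Randomly assign each observation to one of K equal-sized folds.
fold <- sample(rep(seq_len(K), length.out = n))

fold_mse <- numeric(K)
for (k in seq_len(K)) {
  train <- dat[fold != k, ]            # fit on the other K - 1 folds
  test  <- dat[fold == k, ]            # hold out fold k
  fit   <- lm(y ~ x, data = train)
  pred  <- predict(fit, newdata = test)
  fold_mse[k] <- mean((test$y - pred)^2)
}

# Folds are equal-sized here, so a plain mean of the fold errors is the CV estimate.
cv_mse <- mean(fold_mse)
cv_mse
```

Every observation is used for validation exactly once and for training K - 1 times, which is how the method gets by without a separate validation set.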
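
For the excess-zeros question above, one common way to write a mixture model and a two-part model for counts is given below. The Poisson count distribution and the symbols $\pi$, $p_0$, and $\lambda$ are assumptions chosen for illustration; other count distributions (for example the negative binomial) fit the same template.

```latex
% Zero-inflated Poisson (mixture): a zero is either a structural zero
% (probability \pi) or a sampling zero from the Poisson component.
\[
P(Y = 0) = \pi + (1 - \pi)\, e^{-\lambda}, \qquad
P(Y = y) = (1 - \pi)\, \frac{e^{-\lambda} \lambda^{y}}{y!}, \quad y = 1, 2, \dots
\]

% Hurdle (two-part) model: all zeros come from a single binary part
% (probability p_0); positive counts follow a zero-truncated Poisson.
\[
P(Y = 0) = p_0, \qquad
P(Y = y) = (1 - p_0)\, \frac{e^{-\lambda} \lambda^{y} / y!}{1 - e^{-\lambda}}, \quad y = 1, 2, \dots
\]
```

In the mixture form, zeros have two sources; in the two-part form, the count component is truncated at zero and so contributes none.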