On machine learning: MAS61006 project research and practice

MAS61006 Assessed Project
This project counts for 40% of the assessment for MAS61006.
1 Aim
The aim of this project is to assess the skills in Bayesian modelling via computational methods
that you have learned on this module. Exploring and choosing an appropriate modelling approach, and
disseminating your Bayesian inference to a general audience, are key elements of
this assessment.
2 Background
You are a statistician working with a pharmaceutical company who market a birth control drug.
Your involvement with the client is to assist them in better understanding their potential customer
base by investigating a range of demographic variables that may be linked to the uptake of birth
control by a woman. Primary interest is in:
- Identifying the key demographic variables that have an effect on birth control use,
- Quantifying any such demographic variable effect, and
- Predicting the chance of certain demographic groups purchasing birth control in the future (to
  know their key marketing groups).
There is a single deliverable for this project, in the form of a written report.
3 Data
The data made available to you by the client are the results of a market research investigation.
The file is called birth-control-data.csv, and is available on the course Blackboard page. The data
contain information on 1,934 women regarding the following variables:
- birthControl: a binary (0/1) response for whether the subject uses birth control (1 encoded
  as use of birth control and 0 as not),
- region: a factor variable describing the primary care region that the subject belongs to.
  Note that this market research involved 60 care regions, which does not cover the full range of
  the company’s target market,
- homeStyle: a factor variable indicating whether the subject lives in a rural or urban area (0 is
  encoded as rural, and 1 as urban),
- children: the number of children the subject has. Note that the average number of children
  an individual has in this market research study is 2.65,
- age: the age of the subject. Note that this variable has been standardised, so that the average
  age in this study is 0,
- wealth: the financial wealth of the subject. Note that this variable has been standardised, so
  that the average wealth measure in this study is 0.
You can import this data to R using the read_csv() function in the usual way.
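As a minimal sketch of that import step, the snippet below reads the file with read_csv() from the readr package. The helper name load_birth_control and the column types are assumptions made here from the variable descriptions above; the brief does not prescribe them.

```r
library(readr)

# Hypothetical helper: the name and the column types are assumptions
# based on the variable descriptions, not part of the brief.
load_birth_control <- function(path = "birth-control-data.csv") {
  read_csv(
    path,
    col_types = cols(
      birthControl = col_integer(),  # 0/1 response
      region       = col_factor(),   # care region label, levels inferred
      homeStyle    = col_factor(),   # 0 = rural, 1 = urban
      children     = col_double(),   # number of children
      age          = col_double(),   # standardised
      wealth       = col_double()    # standardised
    )
  )
}

# Usage, assuming the file sits in the working directory:
# birth_control <- load_birth_control()
# nrow(birth_control)  # the brief states the data cover 1,934 women
```

Declaring the column types explicitly keeps region and homeStyle as factors rather than numbers, which matters if they are later converted to indices for a Stan model.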
4 Scope of analysis
The requirements of this analysis are:

  1. Explore the results of the market research data, focusing on the outcome variable birthControl
    with respect to the remaining explanatory variables. Produce appropriate figures/tables
    to summarise this.
  2. Carry out a Bayesian regression-based analysis for the outcome variable birthControl. You
    should include justification as to your chosen regression model and the form of the linear
    predictor involved in this model. You may implement ‘improper’/uninformative priors for your
    parameters of interest. Your Bayesian inference should be implemented in Stan, and you are
    expected to write your Stan model yourself (i.e. you should not use the brms package).
  3. Check the convergence diagnostics of your inference approach.
  4. Present your inference findings (potentially in graphical or tabular form), and disseminate this
    information to a general reader. The primary focus of the client is quantifying the effects of
    the demographic variables on the use of birth control.
To obtain a distinction, your analysis should include (in addition to the above):
  5. A ‘proper’ prior distribution, along with a brief check of the effect of this prior information on
    your posterior inferences. Note that you should not do two separate regression analyses: do a
    single regression analysis with your chosen prior.
  6. Posterior predictions of the birth control use by an average woman who lives in an urban
    setting but an unknown care region.
  7. Out of the 60 primary care regions included in the marketing study, the two with the largest
    population sizes are regions 1 and 14. The marketing department are interested in which
    of these two regions it would be best to focus their marketing efforts on in the future.
    Compare the posterior predictions of the birth control use by an average woman who lives in
    an urban setting in each of the regions 1 and 14.
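As a sketch only, requirements 2, 6 and 7 could be framed with a model of the following form. The region-level random intercept, the symbols ($\beta_k$, $u_j$, $\sigma_u$, $r[i]$ for the region of subject $i$), and the reading of an "average woman" as one with age = 0, wealth = 0 and children = 2.65 are all assumptions made here for illustration. The brief leaves the choice of model and linear predictor to you, and other forms may be better justified by your exploratory analysis.

```latex
\begin{align}
y_i \mid p_i &\sim \operatorname{Bernoulli}(p_i), \\
\operatorname{logit}(p_i) &= \beta_0 + \beta_1\,\text{homeStyle}_i + \beta_2\,\text{children}_i
  + \beta_3\,\text{age}_i + \beta_4\,\text{wealth}_i + u_{r[i]}, \\
u_j &\sim \mathcal{N}(0, \sigma_u^2), \qquad j = 1, \dots, 60, \\
p_j^{\text{urban}} &= \operatorname{logit}^{-1}\!\left(\beta_0 + \beta_1 + 2.65\,\beta_2 + u_j\right),
  \qquad j \in \{1, 14\}, \\
p_{\text{new}}^{\text{urban}} &= \operatorname{logit}^{-1}\!\left(\beta_0 + \beta_1 + 2.65\,\beta_2 + u_{\text{new}}\right),
  \qquad u_{\text{new}} \sim \mathcal{N}(0, \sigma_u^2).
\end{align}
```

Under these assumptions, the last two lines would be evaluated once per posterior draw. For regions 1 and 14 the fitted intercepts $u_1$ and $u_{14}$ are used directly, while for an unknown care region a fresh $u_{\text{new}}$ is drawn for each posterior value of $\sigma_u$, so that the prediction carries the between-region uncertainty.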
5 The written report
Your report must be prepared using R Markdown, using the template (template.Rmd) provided.
Do not modify the YAML, apart from inserting your own registration number in the author field.
There is a page limit of 6 pages for requirements 1-4 above, and 8 pages if you are also completing
requirements 5-7. There is no need for a title page or table of contents. These page limits include
everything.
Your report should not contain any R code, but you should submit your .Rmd file alongside your
PDF report. You should write your report with an intended audience of a client with the same
knowledge as another student on this course. You therefore can assume a working knowledge of
regression models, but should explain your results clearly. The grading of your project will give
equal weighting to:
- The presentation/communication within your report, and
- The technical content of your analysis.
The body of the report should be structured with the following sections:
  1. Introduction
    Begin with a short summary, including background and the objectives of the investigation.
    Outline briefly the structure of the remaining report.
  2. Methods (split into subsections if appropriate)
    In this section you should include a short exploratory data analysis of the market research
    data.
    Plots and tables should be presented to a high standard, with properly labelled axes,
    suitable sizing, and captions that include a conclusion.
    This exploration should conclude with your chosen modelling approach (and justification
    for such, given your findings).
  3. Results (split into subsections if appropriate)
    In this section you should present your findings.
    Plots and tables should be presented to a high standard, with properly labelled axes,
    suitable sizing, and captions that include a conclusion.
  4. Conclusions and discussion
    State your conclusions, and explain how they are justified based on your methods and
    results.
6 Unfair Means
Your project submission must be entirely your own work: do not discuss your project with anyone
apart from staff teaching this module. If you haven’t already done so, you should work through the
tutorial on unfair means available on the MSc Statistics organisation page on Blackboard.
7 Submitting your work
Upload both your PDF and Rmd files using the Assignment Dropbox on Blackboard. Use the file
names
    MAS61006ProjectReportxxxxxxxxx.pdf
    MAS61006ProjectCodexxxxxxxxx.Rmd
replacing xxxxxxxxx with your student registration number.