
About programmers: ACS61013-Assignmen

Module title: ACS61013
Assignment name: Coursework 2

Assignment released: 23rd of Nov 2021
Assignment hand in: 17th of Dec 2021

Assignment due date: Hand in by 11pm on the 17th of December; this coursework makes up 60% of
your total module mark. Submit your report as a PDF file on Blackboard. Also include your
Orange file (.ows) and your MATLAB (or Python) code as part of your submission.
Unfair Means: The assignment should be completed individually. You should not discuss the
assignment with other students and should not work together in completing the assignment. The
assignment must be wholly your own work. Any suspicions of the use of unfair means will be
investigated and may lead to penalties. See http://www.shef.ac.uk/ssid/ex… for more
information.
Penalties for Late Submission: Late submissions will incur the usual penalties of a 5% reduction in
the mark for every working day (or part thereof) that the assignment is late, and a mark of zero for
submissions more than 5 working days late.
Extenuating Circumstances: If you have any extenuating circumstances (medical or special
circumstances) that might have affected your performance on the assignment, please follow the
guidance at https://www.sheffield.ac.uk/s…
Help: This assignment briefing and the lecture notes provide all the information that is required to
complete this assignment. It is not expected that you should need to ask further questions. However,
if you need clarifications on the assignment then please discuss the issue with me after a lab.

Specific assignment information and instructions
The challenge: You have been approached by the ministry of buildings about a proposal to build a
new airport. They want the airport to provide the highest level of customer satisfaction. You have
been provided with a data set made up of features that customers look for in an airport and their
level of satisfaction with the airport. The descriptions of the features/columns making up the
dataset are self-explanatory. You can look up further information about them in the Appendix below.
The data set contains 3502 data points and 37 features. Your task is to develop a machine learning
model to predict/estimate customer satisfaction based on airport features.
Tools to use: Most of the MATLAB code you need to complete the assignment is available from the
various lab sessions. If you are comfortable using Python, you are free to use it. You are also free
to use Orange for various aspects of the coursework as required.
Tasks and Mark Scheme: The aim of this coursework is to design, implement and evaluate an
effective machine learning pipeline for predicting customer satisfaction. The specific tasks and
corresponding mark scheme are given below. It is up to you how you approach this problem, design a
solution and write up your results. For each task, the mark within the grade boundary will be based
on the description in your report, your results and your code.

Task/Assessment Description, listed by Level of achievement with its Mark Range

Level 1 (mark range 0-15%): Conduct a domain analysis and present your findings as related to the
domain of the coursework. Discuss how what you have found from your domain analysis will support
and be carried over to other parts of your coursework.

Level 2 (mark range 15-25%): Achieve level 1 as well as conduct data cleaning, pre-processing and
feature engineering. Discuss how you used your understanding of the domain from level 1 to support
this task.

Level 3 (mark range 25-40%): Achieve the previous levels plus discuss the steps taken in dimension
reduction and preventing bias in the dataset to be used for training the machine learning
algorithms. Answer the following questions:
- Which data features capture the most variability in the dataset, and why do you think they do so?
  (Hint: Perform PCA first and extract the Principal Components (PCs) that capture the highest
  variability in the dataset. Then see which features contribute to the PCs.) Highlight the PCs
  together with the features that contribute most to them.
- Which 5 variables correlate most closely with the customer satisfaction column? Using your
  knowledge of the domain (Hint: Use your travel experience), explain why you think they correlate
  with the customer satisfaction column.

Level 4 (mark range 40-60%): Achieve all the previous levels as well as explain how:
- You decided on the choice of the best two machine learning algorithms to apply to the problem.
- You used Orange (or Python/MATLAB) to develop an effective machine learning pipeline from data
  cleaning up to the point of evaluation.

Level 5 (mark range 60-70%): Achieve all the previous levels plus discuss how you applied cross
validation techniques in the machine learning pipeline.

Level 6 (mark range 70-80%): Achieve all the previous levels as well as discuss how effective your
pipeline is at preventing overfitting and underfitting through the application of learning curves.

Level 7 (mark range 80-100%): Achieve all the previous levels and the below:
- Compare your choice of machine learning algorithm with at least two other algorithms that we
  have not covered in class.
- Discuss the mathematical peculiarities of the algorithms you have chosen (strengths and
  weaknesses) and how they impact the results you obtained.
- Apply the appropriate metrics to compare the algorithms you have chosen with the ones we have
  used in class.
- Discuss the effects of model complexity of the chosen algorithms on the learning curves
  generated.
Technical Report and code
Write up your results in a technical report of no more than 15 pages. Make sure your report has a
table of contents, sections, discussion and conclusion.
You must create MATLAB code and an Orange pipeline design for your solution(s).
Support your report with the Orange pipeline design and MATLAB code. Make sure you provide
comments in your MATLAB code as well as instructions on how to run it. Hand in your report
(.pdf) and software (Orange and MATLAB) via Blackboard by 11pm on the 17th of December 2021.
This coursework makes up 60% of your total module mark.
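
As an illustration of the kind of commented, runnable code expected, the cross validation and
learning curve tasks (levels 5 and 6) could be sketched in Python as follows. This is a sketch
under stated assumptions, not a reference solution: the data is synthetic, satisfaction is assumed
to be encoded as a numeric score, ridge regression is used only as a simple stand-in model, and all
function names are introduced here for illustration.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w

def rmse(X, y, model):
    """Root-mean-square error of a fitted (weights, intercept) model."""
    w, b = model
    return float(np.sqrt(np.mean((X @ w + b - y) ** 2)))

def kfold_indices(n, k, rng):
    """Yield (train_idx, val_idx) pairs for shuffled k-fold cross validation."""
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def learning_curve(X, y, sizes, k=5, alpha=1.0, seed=0):
    """Mean train/validation RMSE across k folds for each training-set size."""
    rng = np.random.default_rng(seed)
    splits = list(kfold_indices(len(y), k, rng))
    curve = []
    for m in sizes:
        tr_err, va_err = [], []
        for train, val in splits:
            sub = train[:m]  # indices are already shuffled, so a prefix is a random subset
            model = fit_ridge(X[sub], y[sub], alpha)
            tr_err.append(rmse(X[sub], y[sub], model))
            va_err.append(rmse(X[val], y[val], model))
        curve.append((m, float(np.mean(tr_err)), float(np.mean(va_err))))
    return curve

# Synthetic stand-in data: 400 rows, 10 numeric features, noisy linear target.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=1.0, size=400)

curve = learning_curve(X, y, sizes=[20, 50, 100, 200, 320])
for m, train_rmse, val_rmse in curve:
    print(f"n_train={m:4d}  train RMSE={train_rmse:.3f}  val RMSE={val_rmse:.3f}")
```

A large gap between training and validation error that narrows as the training set grows suggests
overfitting at small sizes, while two curves that converge at a high error suggest underfitting.
In practice the same experiment can be run with library tools such as scikit-learn's
cross_val_score and learning_curve, or with the Test and Score widget in Orange.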
