
On algorithms: IN3063INM702-Mathematics

IN3063/INM702: Mathematics and Programming for AI
Coursework
Submission deadlines:
Report and Code: Sunday 2nd January 2022, 5pm
Presentation: Wednesday 19th January 2022, 5pm
Introduction

This coursework builds on the material covered in the lecture slides, the classroom
presentations, and the tutorial Jupyter notebooks used in the labs. On completing this
coursework, you should be able to code your analysis in Python, to implement and understand
regression methods and classification techniques, as well as to implement advanced neural
network techniques from scratch. You will make use of the different concepts learned in the
module:

How to convert mathematical principles into algorithms
How to implement those algorithms in Python
How to organize your ideas in an appropriate code structure
How to evaluate different algorithms

Python should be used for all implementations. Deliverables are:

Written reports of your work.
Your practical implementation (code), with the appropriate comments.
For INM702 only: an individual oral presentation (15 minutes)

Module marking:

INM702: 70% Coursework (Code and Report) and 30% Presentation.
IN3063: 100% Coursework

See the Appendix for details on grade-related criteria.

Teamwork

This coursework should be completed either in groups of two or individually (at least Task
1 individually). We encourage you to work in pairs, and no additional marks will be granted if
you do the Coursework alone. If you decide to work in a pair, you should declare it in the report.

All team members are expected to contribute to all parts of the work: both the coding and the
report. Teamwork does NOT mean division of labour. You can distribute the leading role for
each assignment, but each of you must contribute to all the tasks. If you don’t, you will not be
evaluated for the tasks that you did not contribute to. Distributing the assignments is
considered a form of academic misconduct.
You are required to explain your personal contribution to each task in the coursework report,
in the reflection part.
For MSc AI students, there is a limit on the number of modules in which you can work with the
same team: you cannot work with the same team in more than 2 modules per term, or in more
than 4 modules in total.
The Coursework is divided into 4 different Tasks. Even if you are in a team, Task 1 should be
solved individually, and no teamwork is allowed for Task 1.

Submission

Submission is through Moodle (https://moodle.city.ac.uk), and no other method of submission
will be accepted. You should submit the following files:

Report number 1: Task 1 (Individual), (maximum 3 pages. Any extra space is allowed
for citations).

Report number 2: Task 2, including a description of the work, your analysis, and a
reflection on the work indicating sources and personal contributions, if working in pairs.
(3 pages + 1 extra page containing additional space for supplemental figures. Any
extra space is allowed for citations).

Report number 3: Tasks 3-4, including a description of the work, your analysis, and a
reflection on the work indicating sources and personal contributions, if working in pairs.
(maximum 6 pages + 2 extra pages containing additional space for supplemental
figures. Any extra space is allowed for citations).

Zip file of the Code for Task 1 (properly commented and with references to code
sources, if any).

Zip file of the Code for Task 2 (properly commented and with references to code
sources, if any).

Zip file of the Code for Tasks 3-4 (properly commented and with references to code
sources, if any).

Code files should comprise: a Jupyter notebook, with executable cells showing results
and with markdown cells explaining the steps of the analysis or of the modelling
process. Random number generators with fixed seeds should be included to ensure
the reproducibility of the results as described in your report. However, note that your
results should be robust to changes of the seeds. Additionally, corresponding Python
scripts should be included, collecting all the parts of your code.
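
The seeding requirement above can be illustrated with a minimal sketch. It assumes numpy is used for random number generation; the helper name make_rngs and the seed values are illustrative only and are not prescribed by this brief.

```python
import random

import numpy as np


def make_rngs(seed):
    """Return a seeded (stdlib Random, numpy Generator) pair. Hypothetical helper."""
    return random.Random(seed), np.random.default_rng(seed)


# Same seed -> identical draws, so results quoted in the report can be regenerated.
_, rng_a = make_rngs(42)
_, rng_b = make_rngs(42)
sample_a = rng_a.integers(0, 10, size=5)
sample_b = rng_b.integers(0, 10, size=5)

# Different seeds give different streams. Repeating an analysis over several
# seeds is one possible way to check that the conclusions do not depend on one seed.
seeds = [0, 1, 2]
means = [np.random.default_rng(s).normal(size=1000).mean() for s in seeds]
```

Passing explicit generator objects around, rather than relying on global state, tends to make it easier to rerun a single experiment with a new seed.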

In addition:
Your code must be developed and be available on a git server (GitHub), with a full
revision history indicating who has created what code. This repository should be
available to the Lecturers of the module, Atif Riaz and Daniel Chicharro. You should
provide the link to the git repository in your report.

Format for reports: pdf format, single column, standard A4 margins, standard default line
spacing of 1.15, font Arial 11, including all figures.

Late submissions will score 0. You can upload work to Moodle more than once, so there is no
need for last-minute submission. The submission period will open on December 20. Don’t
leave submission to the last minute: make sure to submit something early and then revise it.

Presentation (INM702 only)

You will be evaluated in an oral presentation (15 minutes). During this presentation, you will
present the results for the 4 tasks and answer questions from the Lecturers.

This is an individual exercise.

Oral presentations will take place January 19-21. A timetable with your personal slot will be
released in advance.

Feedback

In the labs we can check your progress and give formative feedback. Evaluative feedback and
marks on your coursework will be given out after the submissions. Drop-in hours are available
at the Moodle site for additional feedback and questions.

Datasets

Fashion-MNIST (https://www.kaggle.com/zaland…) – a dataset
of online retailer Zalando’s article images. It is very similar in flavour to MNIST, but
instead of handwritten digits it contains fashion/clothing items (for an overview see
https://github.com/zalandores…). Fashion-MNIST consists of a
training set of 60,000 examples and a test set of 10,000 examples. Each example is a
28×28 greyscale image. Each image is associated with a label from 10 different
classes representing the type of clothing item (e.g., 0: T-shirt/Top, 1: Trouser, 2:
Pullover, etc.). Both training and test sets have 785 columns: the first column consists
of the class labels (0-9); the remaining 784 columns contain the pixel values of the
associated image. You can use any library or API to load the data (such as PyTorch). You
will need this dataset for Task 3.
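
The 785-column layout described above can be unpacked as in the following sketch. It uses a small synthetic array as a stand-in for rows loaded from the CSV file, and it assumes that pixel values are integers in the range 0-255; the variable names are illustrative.

```python
import numpy as np

# Stand-in for rows loaded from the CSV: 3 examples x 785 columns.
# (Synthetic values; the real file is not read here.)
rng = np.random.default_rng(0)
labels_col = rng.integers(0, 10, size=(3, 1))
pixels_cols = rng.integers(0, 256, size=(3, 784))
table = np.hstack([labels_col, pixels_cols])

labels = table[:, 0]                        # first column: class label 0-9
images = table[:, 1:].reshape(-1, 28, 28)   # remaining 784 columns: 28x28 pixels

# Optional scaling to [0, 1], assuming 8-bit pixel values.
images_scaled = images.astype(np.float32) / 255.0
```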

CIFAR-10 (https://www.cs.toronto.edu/~k…) The CIFAR-10 dataset
consists of 60000 32×32 colour images in 10 classes, with 6000 images per class.
There are 50000 training images and 10000 test images. The dataset is divided into
five training batches and one test batch, each with 10000 images. The test batch
contains exactly 1000 randomly-selected images from each class. The training
batches contain the remaining images in random order, but some training batches may
contain more images from one class than another. Between them, the training batches
contain exactly 5000 images from each class. You will need this dataset for Task 4.

Coding
As indicated above, each task should be presented as an individual Jupyter Notebook
with a corresponding Python script and possibly additional modules and packages that
you developed and are used by the notebooks.
Code quality, clarity, organization, and comments will be taken into account in the marking.

The Tasks

In this coursework, you are expected to demonstrate what you have learned in the module
in terms of Programming, Regression methods, Neural Networks, and Deep Learning.
The maximum number of marks which can be scored is 100. Each Task is worth 25 marks.
In all tasks, you can use the built-in libraries of Python (math, random, …), numpy, and
matplotlib. If you think that you might benefit from using another library, you can ask the
Lecturers about it.
You will use PyTorch in Task 4, where you are allowed to use any library.
Note that you can use any library for the purpose of loading the training and testing sets of
the Fashion-MNIST dataset for Task 3.

Task 1: 25 Marks
The first task tests your Python skills and capacity to plan a statistical analysis. You need to
develop a simple game consisting of a rectangular grid (of size height x width) where each
cell has a random value between 0 and n. An agent starts at the upper-left corner of the grid
and must reach the lower-right corner of the grid as fast as possible. Accordingly, the task
consists of finding the shortest path.

There are two game modes:
- The time spent on a cell is the number on this cell.
- The time spent on a cell is the absolute value of the difference between the number on the
  previous cell the agent was on and the number on the current cell it is on.
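
As an illustration of the two game modes, the sketch below computes the total time along a given path. The function name path_time and the example grid are hypothetical. It also assumes that the starting cell counts towards the total in the first mode and costs nothing in the second; the brief does not specify this.

```python
def path_time(grid, path, mode):
    """Total time along `path` (a list of (row, col) cells) under a game mode.

    mode 1: time on a cell is the cell's value.
    mode 2: time on a cell is |previous cell value - current cell value|.
    Assumption: the start cell counts in mode 1 and costs 0 in mode 2.
    """
    values = [grid[r][c] for r, c in path]
    if mode == 1:
        return sum(values)
    if mode == 2:
        return sum(abs(b - a) for a, b in zip(values, values[1:]))
    raise ValueError("mode must be 1 or 2")


grid = [[1, 5, 2],
        [4, 9, 3],
        [7, 1, 6]]
path = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]

time_mode_1 = path_time(grid, path, mode=1)  # 1 + 5 + 2 + 3 + 6 = 17
time_mode_2 = path_time(grid, path, mode=2)  # 4 + 3 + 1 + 3 = 11
```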
The task is divided into the following parts:
- Implementation of the game. Implement the game in a structured and flexible way to allow
  the selection of game modes and parameters. Build a method to build and visualise the grid
  filled with random numbers. Build a method to visualise a path. (30%)
- Develop your own heuristic algorithm. Identify simple criteria and strategies to find short
  paths. This algorithm should be taken as a baseline. It does not have to be optimized to
  perform fast or well, but should be better than random movements. Please implement this part
  without searching for standard algorithms to find short paths. (10%)
- Implement Dijkstra’s algorithm to find the shortest path between two points. There are
  many refined versions of this algorithm that you can find in the literature. Implement a simple
  version close to the original algorithm, using a simple priority queue. Write your own code as
  much as possible and provide detailed comments on each step. The objective of the task is to
  be able to write your own code, not to rely on more sophisticated implementations available
  online. (30%)
- Plan and implement a statistical analysis to characterize how the length of the shortest path
  depends on several parameters of the grid, and to compare the two game modes. Relevant
  parameters are: size of the grid, distribution from which cell numbers are generated, etc. (30%)
A detailed exposition of the Task and parameters will be presented in week 2.
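
For orientation only, a simple priority-queue version of Dijkstra's algorithm on a grid might look like the sketch below. It is not a reference solution. It assumes moves are restricted to up, down, left, and right, that the starting cell counts in the first game mode, and it uses Python's heapq module as the priority queue; the function name dijkstra_grid is hypothetical.

```python
import heapq


def dijkstra_grid(grid, mode):
    """Minimum time from the top-left to the bottom-right cell.

    Assumptions (not stated in the brief): moves are up/down/left/right;
    mode 1 charges the value of each cell visited, including the start cell;
    mode 2 charges |value of cell left - value of cell entered| per move.
    """
    rows, cols = len(grid), len(grid[0])
    start, goal = (0, 0), (rows - 1, cols - 1)
    best = {start: grid[0][0] if mode == 1 else 0}
    queue = [(best[start], start)]           # priority queue of (time, cell)
    while queue:
        time, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            return time                      # first pop of the goal is optimal
        if time > best[(r, c)]:
            continue                         # stale entry, a better one was found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            if mode == 1:
                step = grid[nr][nc]
            else:
                step = abs(grid[nr][nc] - grid[r][c])
            new_time = time + step
            if new_time < best.get((nr, nc), float("inf")):
                best[(nr, nc)] = new_time
                heapq.heappush(queue, (new_time, (nr, nc)))
    return None


example = [[1, 5, 2],
           [4, 9, 3],
           [7, 1, 6]]
shortest_mode_1 = dijkstra_grid(example, mode=1)
shortest_mode_2 = dijkstra_grid(example, mode=2)
```

Both modes give non-negative step costs, which is the condition Dijkstra's algorithm needs to return an optimal result.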

Task 2: 25 Marks
Study several factors that affect the performance and interpretation of a simple model such as
Linear Regression analysis. The factors that will be discussed include model mismatch, the
presence of outliers in the data, the presence of hidden confounders or of selection bias, and
the presence of correlations between the covariates (multicollinearity). You will characterize
how these factors affect performance and the interpretation of the parameters in the model.
You will examine different solutions to eliminate or mitigate these effects (e.g. normalization
or transformation of the covariates). You should choose 2 of these factors to be studied in
your work. Each will contribute evenly to your mark (50%). A detailed exposition of the task
and parameters will be presented in week 4.
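
As a small illustration of one of the factors listed above (outliers), the sketch below fits an ordinary least squares line to noise-free data and then to the same data with a single corrupted observation. The data, the helper fit_line, and the size of the outlier are invented for illustration and are not part of the task specification.

```python
import numpy as np


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    design = np.column_stack([x, np.ones_like(x)])
    (slope, intercept), *_ = np.linalg.lstsq(design, y, rcond=None)
    return slope, intercept


x = np.arange(10, dtype=float)
y_clean = 2.0 * x + 1.0          # noise-free data with true slope 2
y_outlier = y_clean.copy()
y_outlier[-1] += 50.0            # one corrupted observation at the largest x

slope_clean, intercept_clean = fit_line(x, y_clean)
slope_outlier, intercept_outlier = fit_line(x, y_outlier)
```

Here a single point at the edge of the covariate range more than doubles the estimated slope, which shows how strongly a least squares fit can be pulled by high-leverage outliers.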

Task 3: 25 Marks

The third task is about classifying the Fashion-MNIST dataset. A short description of the dataset
is provided in the Datasets section above.
You can use other APIs/libraries for loading the dataset, but not for the neural network
implementation. The point of this task is to develop a multi-layer neural network for
classification using mostly numpy.

  • Implement sigmoid and ReLU layers (with forward and backward pass) (10%)
  • Implement a softmax output layer (10%)
  • Implement a fully parameterizable neural network (number and types of
    layers, number of units can be changed) (20%)
  • Implement an optimizer (e.g. SGD or Adam) and a stopping criterion of your
    choosing (10%)
  • Train your Neural Network using backpropagation (20%)
  • Evaluate different neural network architectures/parameters, present and
    discuss your results (30%)
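
To make the "forward and backward pass" wording concrete, the following is a minimal numpy sketch of the activation layers and a softmax output. The class and function names are hypothetical, and pairing softmax with a mean cross-entropy loss is an assumption (the brief does not name a loss function). It is a starting point under those assumptions, not a reference implementation.

```python
import numpy as np


class Sigmoid:
    def forward(self, x):
        self.out = 1.0 / (1.0 + np.exp(-x))
        return self.out

    def backward(self, grad_out):
        # d(sigmoid)/dx = s * (1 - s), using the output cached in forward()
        return grad_out * self.out * (1.0 - self.out)


class ReLU:
    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, grad_out):
        # Gradient passes through only where the input was positive
        return grad_out * self.mask


def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def softmax_cross_entropy_grad(logits, labels):
    """Gradient of the mean cross-entropy loss with respect to the logits."""
    probs = softmax(logits)
    probs[np.arange(len(labels)), labels] -= 1.0
    return probs / len(labels)
```

Each layer caches what it needs during forward() so that backward() can apply the chain rule to the gradient arriving from the next layer.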

Task 4: 25 Marks
The fourth task is about implementing a neural network using PyTorch. You will use the
CIFAR-10 dataset for this task.

  • Implement a neural network (20%)
  • Propose improvements (e.g. Convolutional Neural Network, dropout, etc.) (30%)
  • Evaluate different parameters (20%)
  • Present and discuss the results in the report (30%)

For INM702 only: if you want to use some other dataset instead of CIFAR-10, you can do so,
but please check with the Lecturers first.

Reports

Each report must have an additional first title page (not included in the page count), and as
many references as needed (not counted in the page total). Graphical illustration of your
results is expected, as well as numerical results and analysis.

You should present the results clearly and provide a discussion of the results, with conclusions
related to the problems being addressed. The conclusions section might also discuss some
further work based on the results of this coursework.

Of particular importance, you should indicate in the report an estimate of the percentage of
code that you borrowed from external sources for each task, and cite those sources properly.
This will matter for the evaluation of your work. Failure to do so will lead to a fail mark.

Reflection
In the case of teamwork, the reflection part should address who did what.

Note

You are not only being marked on how good the results are. It matters that you try something
sensible and clearly describe the problem, methods, what you did, and your interpretation of
the results.

Coding & Referencing

This is, in large part, a coding assignment. If you use code (or other materials) written by
someone else, you must cite that code (or other material). If you do not cite work appropriately
you will have committed academic misconduct. Making superficial changes to the code does
not make it yours. You are also expected to make a coding contribution, so if you use a large
amount of code written by someone else, and cite it appropriately, your coding contribution
will be low, and your work marked accordingly in this respect.

Extenuating Circumstances

If you are not able to submit your coursework on time for unforeseen medical reasons or
personal reasons beyond your control, you should contact the Programmes Office as soon as
possible and fill in an Extenuating Circumstances form. Strong evidence in the form of, for
instance, medical certificates or legal statements will have to be produced.

https://studenthub.city.ac.uk…
appeals

Plagiarism

If you copy the work of others (either that of another team or of a third party), with or without
their permission, you will score no marks and further disciplinary action will be taken against
you. The same applies if you allow others to copy your work.

See more at

https://studenthub.city.ac.uk…
avoiding-plagiarism_FINAL.pdf

and see

https://www.citystudents.co.u…
mic-Misconduct-Policy-and-Guidance-1920.pdf

for general guidance on academic misconduct.
