Advanced Natural Language Engineering (G5114):
Assessed coursework
February 21, 2022
Format Submit a single zip file containing at least 1 pdf and an appendix of your code (which may be a
.ipynb or a .py file)
Word Count 8 pages (approx. 3000 words) plus code appendix
Marking You will be told your mark and receive feedback via Canvas before Friday 20th May
Weighting This assignment is worth 60% of your mark for this module.
1 Practical assignment (3000 words)
The Microsoft Research Sentence Completion Challenge (Zweig and Burges, 2011) requires a system to
be able to predict which is the most likely word (from a set of 5 possibilities) to complete a sentence. In
the labs you have evaluated unigram and bigram models on this task. In this assignment you are expected to
investigate at least 2 extensions or alternative approaches to making predictions. Your solution does
not need to be novel. You might choose to investigate 2 of the following approaches or 1 of the following
approaches and 1 of your own devising.
• Tri-gram (or even quadrigram) models
• Word similarity methods e.g., using GoogleNews vectors or WordNet?
• Combining n-gram methods with word similarity methods e.g., distributional smoothing?
• Using a neural language model?
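As an illustration of the first option only, the following is a minimal sketch of a trigram model with add-k smoothing that picks the candidate giving the completed sentence the highest log probability. The function names, the `_____` blank marker, the toy corpus and the value of `k` are all assumptions made for this example; they are not taken from the lab code, and other smoothing schemes are equally valid.

```python
import math
from collections import Counter

BLANK = "_____"  # assumed marker for the missing word in a question


def train_trigram_counts(sentences):
    """Count trigrams and their two-word contexts over tokenised sentences."""
    trigrams, contexts, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
        vocab.update(padded)
        for i in range(2, len(padded)):
            context = (padded[i - 2], padded[i - 1])
            contexts[context] += 1
            trigrams[context + (padded[i],)] += 1
    return trigrams, contexts, vocab


def sentence_log_prob(tokens, trigrams, contexts, vocab_size, k=0.1):
    """Log probability of a sentence under an add-k smoothed trigram model."""
    padded = ["<s>", "<s>"] + list(tokens) + ["</s>"]
    total = 0.0
    for i in range(2, len(padded)):
        context = (padded[i - 2], padded[i - 1])
        numerator = trigrams[context + (padded[i],)] + k
        denominator = contexts[context] + k * vocab_size
        total += math.log(numerator / denominator)
    return total


def choose_candidate(question_tokens, candidates, trigrams, contexts, vocab_size, k=0.1):
    """Fill the blank with each candidate and return the highest-scoring one."""
    def score(candidate):
        filled = [candidate if t == BLANK else t for t in question_tokens]
        return sentence_log_prob(filled, trigrams, contexts, vocab_size, k)
    return max(candidates, key=score)


# Toy data, invented for this example (not the challenge data).
toy_sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "slept", "on", "the", "mat"],
]
trigrams, contexts, vocab = train_trigram_counts(toy_sentences)
question = ["the", "cat", BLANK, "on", "the", "mat"]
best = choose_candidate(question, ["sat", "flew", "mat", "dog", "rug"],
                        trigrams, contexts, len(vocab))
print(best)
```

Here `k` is a hyper-parameter of the kind the report is expected to discuss: it could be fixed in advance or tuned on held-out training data.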
It does not matter how well your method(s) perform. However, your methods should be clearly
described, any hyper-parameters (either fixed, varied or optimised) should be discussed and there should
be a clear comparison of the approaches with each other and with the unigram and bigram baselines, from
both a practical and an empirical perspective.
You have been provided with the training and test data for this task in the labs. You may (and
are expected to) use any of the code that you have developed throughout the labs. This includes code
provided to you in the exercises or solutions. You may use any other resources to which you have access.
You are encouraged to make use of one or more of WordNet, the Lin dependency thesaurus provided in
NLTK and/or Word2Vec word embeddings. You may also download other resources from the Internet
and make use of any Python libraries that you are familiar with.
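One possible way of using word embeddings for this task, sketched below under stated assumptions, is to score each candidate by its average cosine similarity to the other words in the sentence. The three-dimensional vectors are made-up toy values chosen for the example; in practice they would be replaced by pretrained embeddings such as Word2Vec. The function names and the `_____` blank marker are also hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_score(candidate, context_tokens, vectors):
    """Mean cosine similarity between the candidate and the known context words."""
    if candidate not in vectors:
        return 0.0
    sims = [cosine(vectors[candidate], vectors[t])
            for t in context_tokens if t in vectors]
    return sum(sims) / len(sims) if sims else 0.0


def choose_by_similarity(question_tokens, candidates, vectors, blank="_____"):
    """Return the candidate most similar, on average, to the rest of the sentence."""
    context = [t for t in question_tokens if t != blank]
    return max(candidates, key=lambda c: similarity_score(c, context, vectors))


# Toy vectors, invented for this example (not real embeddings).
toy_vectors = {
    "rain": [0.9, 0.1, 0.0],
    "wet": [0.85, 0.15, 0.05],
    "umbrella": [0.8, 0.2, 0.1],
    "piano": [0.0, 0.9, 0.3],
    "tax": [0.1, 0.1, 0.9],
}
question = ["without", "her", "_____", "the", "rain", "left", "her", "wet"]
choice = choose_by_similarity(question, ["umbrella", "piano", "tax"], toy_vectors)
print(choice)
```

Words with no vector are simply skipped in this sketch; how out-of-vocabulary words are handled is itself a design decision worth discussing in the report.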
Your report should be in the style of an academic paper. It should include an introduction to the
problem and the methods you have implemented. It could include a brief discussion of related work
in the area but the focus of the report must be your own practical work and you are not expected to
carry out a comprehensive literature review. You should discuss the hyper-parameter settings – both
those which you have decided to fix and any which you are investigating. You should discuss and justify
the method of evaluation. You should provide your results and compare them with the unigram and
bigram baselines. You should also provide some analysis of errors – do the approaches make the same or
different mistakes and can you comment on the types or causes of errors being made? You should end
with your conclusions and areas for further work. You should also submit your code as an appendix.
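As a rough sketch of how the comparison and error analysis might be set up, the code below computes accuracy for two systems and splits the questions into those both get wrong and those only one gets wrong. The data layout (dictionaries from question id to answer) and all names and values are assumptions made for this example.

```python
def accuracy(predictions, gold):
    """Fraction of questions for which the predicted answer matches the gold answer."""
    correct = sum(1 for qid, answer in gold.items() if predictions.get(qid) == answer)
    return correct / len(gold)


def error_overlap(preds_a, preds_b, gold):
    """Group question ids by which of two systems answered them incorrectly."""
    errors_a = {qid for qid, answer in gold.items() if preds_a.get(qid) != answer}
    errors_b = {qid for qid, answer in gold.items() if preds_b.get(qid) != answer}
    return {
        "both_wrong": sorted(errors_a & errors_b),
        "only_a_wrong": sorted(errors_a - errors_b),
        "only_b_wrong": sorted(errors_b - errors_a),
    }


# Toy answers, invented for this example (not real results).
gold = {1: "a", 2: "c", 3: "b", 4: "e"}
bigram_preds = {1: "a", 2: "b", 3: "b", 4: "d"}
trigram_preds = {1: "a", 2: "c", 3: "d", 4: "d"}

print(accuracy(bigram_preds, gold), accuracy(trigram_preds, gold))
overlap = error_overlap(bigram_preds, trigram_preds, gold)
print(overlap)
```

Questions in the `both_wrong` group are natural starting points for a qualitative look at the types or causes of error.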
Your report (including figures and bibliography but not including code appendix) should be no longer
than 8 sides (3000 words of text plus figures and bibliography). Your code in the appendix should be
clearly commented.
Marks will not be awarded simply for how well your system does or for programming wizardry. Marks
will be awarded for clearly evaluating possible solutions to the sentence completion challenge.
2 Marking Criteria and Requirements
Table 1 shows the number of marks available for each requirement (Total = 60).
Requirement               Max mark  Interpretation
problem outline           7         Does the introduction explain the task and the motivation for finding
                                    methods which do well at this task?
method                    10        Is there a clear description of the proposed methods for tackling the
                                    task? Do the proposed methods seem sensible? Novel or more interesting
                                    methods may score highly here (if well-described) but methods
                                    will not necessarily gain more marks simply by being more ambitious.
hyper-parameter settings  5         Within each proposed method, are there any hyper-parameter settings
                                    which are being fixed or explored? Are these clearly explained?
evaluation                10        Is the method of evaluation stated, explained and justified? Are
                                    results clearly presented (in a table and/or a graph!)?
analysis                  10        Is there an analysis of errors of the methods? Are there particular
                                    types of question which one or both methods do badly at?
conclusion                3         Is there a sensible conclusion?
further work              5         Are there sensible suggestions for further work to do in this area?
                                    These might include improvements to the method, other methods or
                                    other applications of the method.
academic style            5         Is the report written in the style of a research paper? Are major
                                    points backed up with references? Is the report well-written and
                                    well-structured?
code appendix             5         Is the code in the appendix clear and correct?
Table 1: Breakdown of marks
For each requirement, the following scale will be used when deciding the number of marks awarded.
85%-100% Outstanding. Demonstrates a thorough understanding and appreciation of the material without
significant error or omission; evidence of extra study or creative thought
70%-84% Excellent. Demonstrates a thorough understanding and appreciation of the material producing work
without significant error or omission
60%-69% Very good. Clear understanding demonstrated, substantially complete and correct. There may be
minor gaps in knowledge/understanding. Evidence of independent thought
50%-59% Reasonable knowledge and understanding of basic issues demonstrated.
45%-49% Basic knowledge and understanding demonstrated with some appreciation of the issues involved.
Gaps in knowledge and understanding; confusion over more complex material.
40%-44% Significant issues neglected with little or no appreciation of the complexity of the problem.
20%-39% Some correct or relevant material but significant issues neglected, or significant errors or misconceptions.
0%-19% Very little or nothing that is correct and relevant.
References
Geoffrey Zweig and Christopher Burges. 2011. The Microsoft Research Sentence Completion Challenge. Technical
report, Microsoft Research, December.