CMPT 413/713 Natural language processing
Homework 4, Fall 2021
Due: November 9th, 2021
Instructions: Answers are to be submitted by 11:59 pm on the due date as a PDF file through Canvas. Go to
Canvas, select the HW3-C activity, and submit your answer as answer.pdf. This assignment is to be done individually.
Please make sure that your name and student ID are clearly visible.
Collaboration Policy: You may form study groups and discuss the problems in the homework with others, but
each person must write up their answers from scratch by themselves. Copying is not permitted and will be treated
as a violation of the SFU honour code and academic integrity. If you do discuss the homework with others, please
indicate in your write-up who you discussed it with.
- Analyzing NMT Errors (12 points)
Here we present a series of errors we found in the outputs¹ of our NMT model (which is what you will implement
in HW4-P). For each example, consisting of a Spanish source sentence, a reference (i.e., ‘gold’) English translation,
and an NMT (i.e., ‘model’) English translation, please:
- Identify the error in the provided NMT translation.
- Provide possible reason(s) why the model may have made the error (either due to a specific linguistic
construct or a specific model limitation).
- Describe one possible way we might alter the NMT system to fix the observed error. There is more than
one possible fix for an error; for example, it could be tweaking the size of the hidden layers or changing
the attention mechanism.
Below are the translations that you should analyze as described above. Note that out-of-vocabulary words are
underlined. Rest assured that you don’t need to know Spanish to answer these questions. You just need to
know English! The Spanish words in these questions are similar enough to English that you can mostly see the
alignments. If you are uncertain about some words, please feel free to use resources like Google Translate to
look them up.
(a) (2 points) Source Sentence: Aquí otro de mis favoritos, “La noche estrellada”.
Reference Translation: So another one of my favorites, “The Starry Night”.
NMT Translation: Here’s another favorite of my favorites, “The Starry Night”.
(b) (2 points) Source Sentence: Eso es más de 100,000 hectáreas.
Reference Translation: That’s more than 250 thousand acres.
NMT Translation: That’s over 100,000 acres.
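The reference translation converts the unit as well as the language: 100,000 hectares is roughly 247,000 acres, which the speaker rounds to 250 thousand. A quick check of that arithmetic, assuming the international acre (exactly 4,046.8564224 m²):

```python
# 1 hectare = 10,000 m^2; 1 international acre = 4,046.8564224 m^2 (exact definition)
SQ_METERS_PER_HECTARE = 10_000
SQ_METERS_PER_ACRE = 4046.8564224

def hectares_to_acres(hectares: float) -> float:
    """Convert an area in hectares to international acres."""
    return hectares * SQ_METERS_PER_HECTARE / SQ_METERS_PER_ACRE

acres = hectares_to_acres(100_000)
print(f"{acres:,.0f} acres")  # prints "247,105 acres", i.e. roughly 250 thousand
```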
(c) (2 points) Source Sentence: Un amigo me hizo eso – Richard Bolingbroke.
Reference Translation: A friend of mine did that – Richard Bolingbroke.
NMT Translation: A friend of mine did that – Richard <unk>
(d) (2 points) Source Sentence: Solo tienes que dar vuelta a la manzana para verlo como una epifanía.
Reference Translation: You’ve just got to go around the block to see it as an epiphany.
NMT Translation: You just have to go back to the apple to see it as an epiphany.
¹ The data is from TED talks.
(e) (2 points) Source Sentence: Ustedes saben que lo que yo hago es escribir para los niños, y, de hecho,
probablemente soy el autor para niños, más leído en los EEUU.
Reference Translation: You know, what I do is write for children, and I’m probably America’s most
widely read children’s author, in fact.
NMT Translation: You know what I do is write for children, and in fact, I’m probably the author for
children, more reading in the U.S.
(f) (2 points) Source Sentence: Ella salvó mi vida al permitirme entrar al baño de la sala de profesores.
Reference Translation: She saved my life by letting me go to the bathroom in the teachers’ lounge.
NMT Translation: She saved my life by letting me go to the bathroom in the women’s room.
- Contextualized Word Embeddings (14 points)
(a) (2 points) Some word embeddings, such as ELMo², use character-level embeddings rather than word-level
embeddings. What might be an advantage of using character-level embeddings over word-level embeddings?
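A minimal sketch of the difference, using hypothetical toy vocabularies built from example (c): a word-level lookup maps every unseen word, such as “Bolingbroke”, to one shared <unk> id, while a character-level lookup still yields a distinct sequence of known symbols that a character encoder can turn into a vector. This shows only the lookup step; ELMo itself runs a character CNN over character embeddings, which is not reproduced here.

```python
# Hypothetical toy vocabularies; a real system builds these from its training corpus.
word_vocab = {"<unk>": 0, "a": 1, "friend": 2, "of": 3, "mine": 4,
              "did": 5, "that": 6, "richard": 7}
char_vocab = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}

def word_ids(tokens):
    """Word-level lookup: any unseen word collapses to the single <unk> id."""
    return [word_vocab.get(t.lower(), word_vocab["<unk>"]) for t in tokens]

def char_ids(token):
    """Character-level lookup: an unseen word is still a sequence of known characters."""
    return [char_vocab[c] for c in token.lower() if c in char_vocab]

tokens = ["Richard", "Bolingbroke"]
print(word_ids(tokens))         # [7, 0]: "Bolingbroke" looks like every other OOV word
print(char_ids("Bolingbroke"))  # one id per letter, so the word keeps its identity
```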
(b) (3 points) Another word embedding, called BERT³, uses masked language modeling as its pre-training
objective. Explain what this objective is. Also mention an advantage of using this objective for some
downstream natural language tasks.
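A sketch of one common way to implement the masking step, following the scheme described for BERT: each position is selected for prediction with probability `mask_prob` (15% in BERT), and a selected token is replaced by `[MASK]` 80% of the time, by a random token 10% of the time, and left unchanged 10% of the time. The model is then trained to predict the original token only at the selected positions. Sampling each position independently is a simplification, and the function name `mask_tokens` is hypothetical.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """BERT-style masking sketch.

    Returns (inputs, labels): labels[i] holds the original token at positions
    selected for prediction and None elsewhere, so the loss is computed only
    on the selected positions.
    """
    rng = random.Random(seed)
    inputs, labels = list(tokens), [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not selected: no prediction target at this position
        labels[i] = tok
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels

sentence = "vancouver is one of the most diverse cities in canada".split()
inputs, labels = mask_tokens(sentence, vocab=sentence, mask_prob=0.3, seed=0)
print(inputs)
print(labels)
```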
(c) (3 points) BERT is built on top of transformer encoders, which use self-attention over the words. Explain
the three components that go into the input embedding of the BERT model. Explain how this
input embedding is used to create the key, value, and query vectors that are used for self-attention.
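A toy NumPy sketch of both steps, with made-up sizes and random weights rather than trained BERT parameters: the input embedding is the element-wise sum of token, segment, and position embeddings, and each row is multiplied by separate learned matrices to produce queries, keys, and values for scaled dot-product attention. It uses a single head and omits the bias terms, LayerNorm, and dropout that BERT also applies; all variable names and token ids are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (BERT-base uses hidden size 768, 512 positions, 2 segments).
vocab_size, max_len, n_segments, d_model = 100, 16, 2, 8

# Three learned lookup tables, randomly initialized here for illustration.
token_emb = rng.normal(size=(vocab_size, d_model))
segment_emb = rng.normal(size=(n_segments, d_model))
position_emb = rng.normal(size=(max_len, d_model))

def input_embedding(token_ids, segment_ids):
    """Sum of token, segment, and position embeddings: one row per token."""
    positions = np.arange(len(token_ids))
    return token_emb[token_ids] + segment_emb[segment_ids] + position_emb[positions]

# Learned projection matrices for queries, keys, and values (single head).
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

def self_attention(x):
    """Scaled dot-product self-attention over the rows of x."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V, weights

token_ids = np.array([1, 5, 7, 2, 9, 3])     # hypothetical ids, e.g. [CLS] w1 w2 [SEP] w3 [SEP]
segment_ids = np.array([0, 0, 0, 0, 1, 1])   # sentence A vs. sentence B
out, attn = self_attention(input_embedding(token_ids, segment_ids))
print(out.shape, attn.shape)  # (6, 8) (6, 6)
```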
(d) (4 points) Your friend wants to build an entity linking system using BERT. In an entity linking system,
named entity references in the text are linked to their corresponding entities in a knowledge base.
For example, consider the following text:
“Vancouver is one of the most ethnically and linguistically diverse cities in Canada…”
Then the named entity Vancouver can be linked to the corresponding page for the city of Vancouver from
DBpedia⁴: http://dbpedia.org/page/Vancouver
Assume that you have a training dataset with sentences and spans of tokens linked to entities in DBpedia.
You want to help your friend build the entity linking system. To start, you will need to design a model
that, given text as input, identifies which spans need to be linked and what the target entity
should be for each span.
Explain how you would set up an entity linking model using BERT. During training, what input would you
provide to the network? What would the output of the network be? How would you train the network
(what parameters will be learned or fine-tuned, and what would be the training objective at a high level)?
(e) (2 points) What are some advantages of the transformer model over RNNs? List at least two.
- Constituency Parsing (12 points)
Consider the following treebank consisting of three sentences and their parse trees. The part-of-speech tags are
D (determiner), Adj (adjective), N (noun), V (verb), P (preposition), C (conjunction), Pro (pronoun), and the
phrases are NP (noun phrase), VP (verb phrase), PP (prepositional phrase), and S (sentence).
- I ordered fried chicken and coke
- It rains this time of the year
- The little kids are playing violin at the concert