COMP 202 – Foundations of Programming
Assignment 3
McGill University, Summer 2022
Due: Saturday, June 18th, 11:59 pm on MyCourses
Late Penalty: 10% per day and up to 2 late days
Important notice
Write all the following functions in one file named
sentiment_analysis_[your student ID].py
For example: if your student ID is 260700000, your file name should be
sentiment_analysis_260700000.py
Make sure that all file names and function names are spelled exactly as described in this
document. Otherwise, a 50% penalty per question will be applied. You may make as many
submissions as you like prior to the deadline, but we will only grade your final submission
(all prior ones are automatically deleted).
Sentiment Analysis
Sentiment analysis is one of the challenges related to natural language processing. It is the
task of identifying if the sentiment behind a text, a social media message or voice message
is either positive, negative or neutral. It is widely used in different areas such as marketing,
entertainment or healthcare to evaluate the subjective information given by users, customers
or patients.
The basic idea is to analyse a given text (for example, a social media post) and identify the
sentiment that is behind it. For example:
• I am so happy the weather is amazing! ⇒ Positive sentiment
• This movie was the worst movie ever. ⇒ Negative sentiment
• This is food. ⇒ Neutral sentiment
Many natural language processing libraries and a variety of algorithms in the
literature perform sentiment analysis, using either machine learning or a rule-based
approach.
The objective of this assignment is to build a rule-based approach to identify the sentiment
for a given text. In the following function descriptions:
• Please read the entire A3 guidelines and this PDF before starting.
• You must do this assignment individually.
• This assignment includes two files: a pickle file “sentiment_dict.pkl” and
a text file named posts.txt
• For both fruitful and void functions, you should provide 3 examples in the docstrings
(also make sure to include an example for functions that raise an exception)
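As an illustration of the docstring requirement above, here is a minimal sketch that assumes doctest-style examples (the `>>>` layout is an assumption; follow the A3 guidelines if they specify a different one). The function `halve` is hypothetical and is not part of this assignment.

```python
import doctest


def halve(number):
    """(int) -> int
    Return half of an even integer.
    Hypothetical helper, used only to illustrate the docstring layout.

    >>> halve(10)
    5
    >>> halve(0)
    0
    >>> halve(7)
    Traceback (most recent call last):
        ...
    ValueError: The input should be an even integer
    """
    if number % 2 != 0:
        # Third docstring example: the case that raises an exception
        raise ValueError("The input should be an even integer")
    return number // 2


if __name__ == "__main__":
    # Runs the three docstring examples; prints nothing if they all pass
    doctest.testmod()
```

The first two examples cover normal return values and the third documents the exception case.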
Questions
- is_list_char(char_list) [8 points]:
• Input parameters:
– char_list (a list of characters)
• Output parameter: The function returns True if the input list contains only
single-character strings, and False otherwise
• Description: The function traverses the list and checks whether any element is
either not a string or not a string of length 1. Do not use the find function.
• Examples:
– if char_list = ['A', 'beb', 'f'] the function will return False
– if char_list = ['A', '1', 'f'] the function will return True
– if char_list = ['A', 1, 'f'] the function will return False
- character_position(text, char_list) [8 points]:
• Input parameters:
– text: string
– char_list (a list of characters)
• Output parameter: position, a list of positions where the characters were
found in the text
• Description: The function starts by checking whether the input char_list is a valid
list of single characters by calling the is_list_char function. If the list is not valid,
the function raises a TypeError exception with the following message: "The
input list should contain only characters". Otherwise, the function traverses the
text to find all the characters indicated in the list and returns their positions
(positive indices). If none of the characters were found, the function returns an
empty list.
• Note: Do not use the find or index functions.
• Examples:
– if text = "Hello. Is it me you're looking for?" and char_list = ['?', '.'],
then the function will return the list of positions [5, 34]
– if text = "Hello. Is it me you're looking for?" and char_list = ['H', 'i'],
then the function will return the list of positions [0, 10, 27]
– if text = "Hello. Is it me you're looking for?" and char_list = ['Hello', 'i'],
then the function will raise a TypeError exception
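One possible way to approach is_list_char and character_position is sketched below. It is a sketch under the stated constraints (no find or index), not a reference solution; the loop structure and the local variable names are assumptions.

```python
def is_list_char(char_list):
    """Return True if every element of char_list is a string of length 1."""
    for item in char_list:
        # Reject anything that is not a string, or a string of another length
        if not isinstance(item, str) or len(item) != 1:
            return False
    return True


def character_position(text, char_list):
    """Return the positive indices in text of the characters in char_list."""
    if not is_list_char(char_list):
        raise TypeError("The input list should contain only characters")
    positions = []
    # Walk the text by index so positions come out in increasing order
    for i in range(len(text)):
        if text[i] in char_list:
            positions.append(i)
    return positions


if __name__ == "__main__":
    sample = "Hello. Is it me you're looking for?"
    print(character_position(sample, ['?', '.']))  # [5, 34]
    print(character_position(sample, ['H', 'i']))  # [0, 10, 27]
```

Because the text is traversed from left to right, the positions come back sorted regardless of the order of the characters in char_list.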
• NOTE: Since the quotation mark is a special character in a string, if you want
to test these examples with a single-quoted string, you should write the text as follows:
text = 'Hello. Is it me you\'re looking for?' (The backslash character "\"
indicates that the quotation mark is part of the text and does not delimit the end of
the string.)
- remove_characters(text, char_list) [10 points]:
• Input parameters:
– text: string
– char_list (a list of characters)
• Output parameter: filtered_text, a string where the characters in char_list
have been removed
• Description: The function starts by checking whether the input char_list is empty,
in which case it returns the original text. Otherwise, the function calls
character_position to find all the characters in char_list in the input text. It
then constructs a new string where the char_list characters have been
removed.
• Examples:
– if text = "Hello. Is it me you're looking for?" and char_list = ['.', '?'],
then the function will return the new string "Hello Is it me you're looking
for"
– if text = "Hello. Is it me you're looking for?" and char_list = ['H', 'i'],
then the function will return the new string "ello. Is t me you're lookng for?"
• NOTE: Do not convert the text into a list and do not use the replace or translate
functions.
- count_words_category(text, word_list) [10 points]:
• Input parameters:
– text: a string
– word_list: a list of words
• Output parameter: nb_occurrences, the total number of times the words in
word_list appear in the input text
• Description: The function computes the total number of times the words in the list
of words appear in the input text. You may use the count method here.
• Examples:
– if text = 'I love this movie! It is great and the adventure scenes are fun. I
highly recommend it! It is really great!' and
word_list = ['great', 'love', 'recommend', 'laugh', 'happy', 'brilliant'], then the
function will return the value 4 because the words 'love' and 'recommend' appear
once and the word 'great' appears twice.
– if text = 'My pizza was awful and cold. This is really a bad place!' and
word_list = ['terrible', 'awful', 'hideous', 'sad', 'cry', 'bad'], then the function
will return the value 2 because both the words 'awful' and 'bad' appear in
the text.
- count_number_words(text) [5 points]:
• Input parameters:
– text (string)
• Output parameter: number_words
• Description: The function counts the number of words in the input string. It
calls character_position to find the positions of the space character.
Do not convert the input text into a list.
• Examples:
– if text = 'I love this movie! It is great and the adventure scenes are fun. I
highly recommend it! But the theatre was terrible and there was an awful
smell', then the function will return the value 28 (there are 28 words in the
sentence)
– if text = 'Hello! How are you?', then the function will return 4.
- term_frequencies(text, dictionary_word) [15 points]:
• Input parameters:
– text: string
– dictionary_word: a dictionary that has a list of words for each sentiment
category
• Output parameter: dict_frequency, a dictionary that contains the frequency
of the words in each category appearing in the text.
• Description: The function starts by computing the number of words in the text using
count_number_words. Then, for each key in dictionary_word, it calls
count_words_category. The output dictionary holds, for each category, the
number of words found in that category divided by the total number of words in the text,
rounded to 2 decimals.
• Examples: If we have the following text = 'I love this movie! It is great and the
adventure scenes are fun. I highly recommend it! But the theatre was terrible
and there was an awful smell' and the dictionary is:
{'POSITIVE': ['great', 'love', 'recommend', 'laugh', 'happy', 'brilliant'],
'NEGATIVE': ['terrible', 'awful', 'hideous', 'sad', 'cry', 'bad'],
'NEUTRAL': ['meh', 'indifferent', 'ignore']}
The function will count 28 words in total, with 3 positive words, 2 negative words
and 0 neutral words. Dividing each count by the total number of words, the function will
return the following dictionary:
{'POSITIVE': 0.11, 'NEGATIVE': 0.07, 'NEUTRAL': 0.0}
- compute_polarity(dict_frequency) [15 points]:
• Input parameters:
– dict_frequency: the dictionary of frequencies as computed by the
term_frequencies function.
• Output parameter: polarity (string), the most prominent sentiment observed
with the given frequencies
• Description: The function traverses all the dictionary keys and returns
the one that has the highest frequency. You should not use the built-in max
function.
• Examples:
– If we have the following dictionary {'POSITIVE': 0.11, 'NEGATIVE': 0.07,
'NEUTRAL': 0.0}, then the function will return the string 'POSITIVE' since
it has the maximum corresponding frequency
– If we have the following dictionary {'POSITIVE': 0.11, 'NEGATIVE': 0.07,
'NEUTRAL': 0.5}, then the function will return the string 'NEUTRAL'
– If we have the following dictionary {'POSITIVE': 0.07, 'NEGATIVE': 0.07,
'NEUTRAL': 0.01}, then the function will return the string 'POSITIVE', the
first maximum value found.
- read_text(text_path) [10 points]:
• Input parameters:
– text_path: a string containing the file name.
• Output parameter: text_list, a list of strings
• Description: The function opens and reads the text file located at the given
path. It reads the text line by line. Each line in the text follows the structure
"user pseudonym,comment", with the two fields separated by ','. The function ignores the user
pseudonym and adds only the comment to text_list. Each element in the
list is one comment from one line of the text. You should not use the readline or
readlines functions.
• Examples: If we have a text file named 'text.txt' with the following content:
user1,Hello
user2,How are you today?
user3,Good I hope!
user4,Ok take care!
Then the function will return the list of strings:
['Hello \n', 'How are you today? \n', 'Good I hope! \n', 'Ok take care! \n']
- read_pickle [5 points]:
• Input parameters:
– a pickle file
• Output parameter: word_dict, the content of the pickle file
• Description: The function loads the object stored in any pickle file and returns
that object
• Examples:
– If the function reads the provided pickle file “sentiment_dict.pkl”, it
will return the following dictionary:
{'POSITIVE': ['great', 'love', 'recommend', 'laugh', 'happy', 'brilliant'],
'NEGATIVE': ['terrible', 'awful', 'hideous', 'sad', 'cry', 'bad'],
'NEUTRAL': ['meh', 'indifferent', 'ignore']}
- analyse_text(text_path, dict_path) [14 points]:
• Input parameters:
– text_path: the path to a given text file
– dict_path: the path to the given dictionary saved in a pickle file
• Output parameter: list_polarity, a list of the computed polarity for each line in
the text file
• Description: The function reads the text file and the pickle file. For each line
in the text, the function will: put the text in lower case, remove leading and
trailing whitespace, remove the stop words from the text using this list of stop
words ['!', '.', '?', ';', '\n'], compute the term frequencies and the polarity, and add
the computed polarity value to list_polarity
• Examples:
– If we test the function with the provided text and pickle file it will return the
following list: