On algorithms: COMP-202 basic programming techniques

COMP 202 – Foundations of Programming
Assignment 3
McGill University, Summer 2022
Due: Saturday, June 18th, 11:59 pm on MyCourses
Late Penalty: 10% per day and up to 2 late days
Important notice
Write all the following functions in one file named
sentiment_analysis_[your student ID].py
For example, if your student ID is 260700000, your file name should be
sentiment_analysis_260700000.py
Make sure that all file names and function names are spelled exactly as described in this
document. Otherwise, a 50% penalty per question will be applied. You may make as many
submissions as you like prior to the deadline, but we will only grade your final submission
(all prior ones are automatically deleted).
Sentiment Analysis
Sentiment analysis is one of the challenges related to natural language processing. It is the
task of identifying if the sentiment behind a text, a social media message or voice message
is either positive, negative or neutral. It is widely used in different areas such as marketing,
entertainment or healthcare to evaluate the subjective information given by users, customers
or patients.
The basic idea is to analyse a given text (for example, a social media post) and identify the
sentiment that is behind it. For example:
• I am so happy the weather is amazing! ⇒ Positive sentiment
• This movie was the worst movie ever. ⇒ Negative sentiment
• This is food. ⇒ Neutral sentiment
There are many natural language processing libraries and a variety of algorithms in the
literature for sentiment analysis, based either on machine learning or on a rule-based
approach.
The objective of this assignment is to build a rule-based approach to identify the sentiment
for a given text. Keep the following in mind for the function descriptions below:
• Please read the entire A3 guidelines and this PDF before starting.
• You must do this assignment individually.
• This assignment includes two files: a pickle file "sentiment_dict.pkl" and
a text file named posts.txt
• For both fruitful and void functions, you should provide 3 examples in the docstrings
(make sure to also include an example for functions that raise an exception)
Questions

  1. is_list_char(char_list) [8 points]:
    • Input parameters:
    – char_list (a list of characters)
    • Output parameter: The function returns True if every element of the input list
    is a single character, and False otherwise
    • Description: The function traverses the list and checks whether any element is
    either not a string or not a string of length 1. Do not use the find function.
    • Examples:
    – if char_list = ['A','beb','f'] the function will return False
    – if char_list = ['A','1','f'] the function will return True
    – if char_list = ['A',1,'f'] the function will return False
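The check described above can be sketched as follows. This is only one possible reading of the description, not a reference solution; it uses isinstance to test for strings, which the handout does not prescribe.

```python
def is_list_char(char_list):
    """Return True if every element of char_list is a string of length 1."""
    for item in char_list:
        # Test the type first so len() is only ever applied to strings.
        if not isinstance(item, str) or len(item) != 1:
            return False
    return True


print(is_list_char(['A', 'beb', 'f']))  # False
print(is_list_char(['A', '1', 'f']))    # True
print(is_list_char(['A', 1, 'f']))      # False
```

The loop returns as soon as it meets an invalid element, so the rest of the list is not inspected.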
  2. character_position(text,char_list) [8 points]:
    • Input parameters:
    – text: string
    – char_list (a list of characters)
    • Output parameter: position, a list of the positions where the characters were
    found in the text
    • Description: The function starts by checking whether the input char_list is a valid
    list of single characters by calling the is_list_char function. If the list is not valid,
    the function raises a TypeError exception with the following message: "The
    input list should contain only characters". Otherwise, the function traverses the
    text to find all the characters indicated in the list and returns their positions
    (positive indices). If none of the characters were found, the function returns an
    empty list.
    • Note: Do not use the find or index functions.
    • Examples:
    – if text = "Hello. Is it me you're looking for?" and char_list = ['?','.'],
    then the function will return the list of positions [5,34]
    – if text = "Hello. Is it me you're looking for?" and char_list = ['H','i'],
    then the function will return the list of positions [0,10,27]
    – if text = "Hello. Is it me you're looking for?" and char_list = ['Hello','i'],
    then the function will raise a TypeError exception
    • NOTE: Since the quotation mark is a special character in a string, if you want
    to test the previous examples, you should write the text as follows:
    text='Hello. Is it me you\'re looking for?' (The backslash character "\" indicates
    that the quotation mark is part of the text and does not delimit the end of
    the string.)
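A minimal sketch of the traversal, assuming the validity check is done by a small is_list_char helper written as described in question 1. The first call assumes the character list is ['?', '.'], the pair whose positions in the sample text are 5 and 34.

```python
def is_list_char(char_list):
    """Return True if every element is a string of length 1."""
    for item in char_list:
        if not isinstance(item, str) or len(item) != 1:
            return False
    return True


def character_position(text, char_list):
    """Return the indices of text whose character belongs to char_list."""
    if not is_list_char(char_list):
        raise TypeError("The input list should contain only characters")
    positions = []
    # Walk the text by index; the "in" operator replaces find/index.
    for i in range(len(text)):
        if text[i] in char_list:
            positions.append(i)
    return positions


text = 'Hello. Is it me you\'re looking for?'
print(character_position(text, ['?', '.']))  # [5, 34]
print(character_position(text, ['H', 'i']))  # [0, 10, 27]
```

Because the text is scanned left to right, the positions come out in increasing order regardless of the order of the characters in the list.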
  3. remove_characters(text,char_list) [10 points]:
    • Input parameters:
    – text: string
    – char_list (a list of characters)
    • Output parameter: filtered_text, a string from which the characters in char_list
    have been removed
    • Description: The function starts by checking whether the input char_list is empty;
    if so, it returns the original text. Otherwise, the function calls
    character_position to find all the characters of char_list in the input text. It
    then constructs a new string from which the char_list characters have been
    removed.
    • Examples:
    – if text = "Hello. Is it me you're looking for?" and char_list = ['.','?'],
    then the function will return the new string "Hello Is it me you're looking
    for"
    – if text = "Hello. Is it me you're looking for?" and char_list = ['H','i'],
    then the function will return the new string "ello. Is t me you're lookng for?"
    • NOTE: Do not convert the text into a list and do not use the replace or translate
    functions.
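One way to build the filtered string without a list conversion, replace or translate is to copy every character whose index was not reported by character_position. The sketch below includes a compact character_position so that it runs on its own; the helper bodies are assumptions, not the graded solution.

```python
def character_position(text, char_list):
    """Compact stand-in for question 2: indices of the listed characters."""
    for item in char_list:
        if not isinstance(item, str) or len(item) != 1:
            raise TypeError("The input list should contain only characters")
    positions = []
    for i in range(len(text)):
        if text[i] in char_list:
            positions.append(i)
    return positions


def remove_characters(text, char_list):
    """Return text without the characters listed in char_list."""
    if len(char_list) == 0:
        return text
    positions = character_position(text, char_list)
    filtered_text = ''
    # Rebuild the string, skipping every index flagged for removal.
    for i in range(len(text)):
        if i not in positions:
            filtered_text += text[i]
    return filtered_text


text = 'Hello. Is it me you\'re looking for?'
print(remove_characters(text, ['.', '?']))  # Hello Is it me you're looking for
print(remove_characters(text, ['H', 'i']))  # ello. Is t me you're lookng for?
```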
  4. count_words_category(text,word_list) [10 points]:
    • Input parameters:
    – text: a string
    – word_list: a list of words
    • Output parameter: nb_occurrences, the total number of times the words in the
    list of words appear in the input text
    • Description: The function computes the total number of times the words in the list
    of words appear in the input text. You may use the count method here.
    • Examples:
    – if text = 'I love this movie! It is great and the adventure scenes are fun. I
    highly recommend it! It is really great!' and
    word_list = ['great','love','recommend','laugh','happy','brilliant'] then the
    function will return the value 4 because the words 'love' and 'recommend' appear
    once and the word 'great' appears twice.
    – if text = 'My pizza was awful and cold. This is really a bad place!' and
    word_list = ['terrible','awful','hideous','sad','cry','bad'] then the function
    will return the value 2 because both the words 'awful' and 'bad' appear in
    the text.
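Since the count method is allowed here, the function reduces to a running sum. A minimal sketch:

```python
def count_words_category(text, word_list):
    """Return the total number of occurrences in text of the words in word_list."""
    nb_occurrences = 0
    for word in word_list:
        # str.count returns the number of non-overlapping occurrences.
        nb_occurrences += text.count(word)
    return nb_occurrences


review = ('I love this movie! It is great and the adventure scenes are fun. '
          'I highly recommend it! It is really great!')
positive = ['great', 'love', 'recommend', 'laugh', 'happy', 'brilliant']
print(count_words_category(review, positive))  # 4
```

Note that str.count matches substrings, so a word such as 'sad' would also be counted inside 'sadly'. The handout's examples do not exercise that case.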
  5. count_number_words(text) [5 points]:
    • Input parameters:
    – text (string)
    • Output parameter: number_words
    • Description: The function counts the number of words in the input string. It
    calls character_position to find the positions of the space character.
    Do not convert the input text into a list.
    • Examples:
    – if text = 'I love this movie! It is great and the adventure scenes are fun. I
    highly recommend it! But the theatre was terrible and there was an awful
    smell', then the function will return the value 28 (there are 28 words in the
    sentence)
    – if text = 'Hello! How are you?', then the function will return 4.
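Under the assumption that the text is non-empty and that words are separated by exactly one space, the word count is the number of spaces plus one. The sketch below relies on that assumption and on a compact character_position stand-in (validity check omitted for brevity).

```python
def character_position(text, char_list):
    """Compact stand-in for question 2: indices of the listed characters."""
    positions = []
    for i in range(len(text)):
        if text[i] in char_list:
            positions.append(i)
    return positions


def count_number_words(text):
    """Return the number of words, assuming single spaces between words."""
    space_positions = character_position(text, [' '])
    # n spaces separate n + 1 words.
    return len(space_positions) + 1


print(count_number_words('Hello! How are you?'))  # 4
```

Texts with leading, trailing or doubled spaces would be over-counted by this rule; the handout does not say how such input should be treated.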
  6. term_frequencies(text,dictionary_word) [15 points]:
    • Input parameters:
    – text: string
    – dictionary_word: a dictionary that maps each sentiment category to a list
    of words
    • Output parameter: dict_frequency, a dictionary that contains, for each category,
    the frequency of that category's words in the text.
    • Description: The function starts by computing the number of words in the text using
    count_number_words. Then, for each key in dictionary_word, it calls
    count_words_category. For each category, the output dictionary holds the
    number of category words found, divided by the total number of words in the text
    and rounded to 2 decimals.
    • Example: If we have the following text = 'I love this movie! It is great and the
    adventure scenes are fun. I highly recommend it! But the theatre was terrible
    and there was an awful smell' and the dictionary is:
    {'POSITIVE':['great','love','recommend','laugh','happy','brilliant'],
    'NEGATIVE':['terrible','awful','hideous','sad','cry','bad'],
    'NEUTRAL':['meh','indifferent','ignore']}
    The function will count 28 words in total, with 3 positive words, 2 negative
    words and 0 neutral words. After dividing by the total number of words, the
    function will return the following dictionary:
    {'POSITIVE': 0.11,'NEGATIVE': 0.07,'NEUTRAL': 0.0}
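The arithmetic in the example (3/28 ≈ 0.11, 2/28 ≈ 0.07, 0/28 = 0.0) can be reproduced with the sketch below. The two helpers are simplified stand-ins so that the block runs on its own: in particular, count_number_words here counts spaces directly instead of going through character_position.

```python
def count_number_words(text):
    """Stand-in for question 5: number of spaces plus one."""
    nb_spaces = 0
    for char in text:
        if char == ' ':
            nb_spaces += 1
    return nb_spaces + 1


def count_words_category(text, word_list):
    """Stand-in for question 4: total occurrences of the listed words."""
    total = 0
    for word in word_list:
        total += text.count(word)
    return total


def term_frequencies(text, dictionary_word):
    """Return, per category, occurrences / total words, rounded to 2 decimals."""
    number_words = count_number_words(text)
    dict_frequency = {}
    for category in dictionary_word:
        nb_category = count_words_category(text, dictionary_word[category])
        dict_frequency[category] = round(nb_category / number_words, 2)
    return dict_frequency


text = ('I love this movie! It is great and the adventure scenes are fun. I '
        'highly recommend it! But the theatre was terrible and there was an awful '
        'smell')
sentiments = {
    'POSITIVE': ['great', 'love', 'recommend', 'laugh', 'happy', 'brilliant'],
    'NEGATIVE': ['terrible', 'awful', 'hideous', 'sad', 'cry', 'bad'],
    'NEUTRAL': ['meh', 'indifferent', 'ignore'],
}
print(term_frequencies(text, sentiments))
# {'POSITIVE': 0.11, 'NEGATIVE': 0.07, 'NEUTRAL': 0.0}
```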
  7. compute_polarity(dict_frequency) [15 points]:
    • Input parameters:
    – dict_frequency: the dictionary of frequencies as computed by the
    term_frequencies function.
    • Output parameter: polarity (string), the most prominent sentiment observed
    in the given frequencies
    • Description: The function traverses all the dictionary keys and returns
    the one that has the highest frequency. You should not use the built-in max
    function.
    • Examples:
    – If we have the following dictionary {'POSITIVE': 0.11,'NEGATIVE': 0.07,
    'NEUTRAL': 0.0}, then the function will return the string 'POSITIVE' since
    it has the maximum corresponding frequency
    – If we have the following dictionary {'POSITIVE': 0.11,'NEGATIVE': 0.07,
    'NEUTRAL': 0.5}, then the function will return the string 'NEUTRAL'
    – If we have the following dictionary {'POSITIVE': 0.07,'NEGATIVE': 0.07,
    'NEUTRAL': 0.01}, then the function will return the string 'POSITIVE', the
    first maximum value found.
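A manual maximum search that keeps the first key on ties can be sketched as below. It assumes Python 3.7 or later, where dictionaries iterate in insertion order, so that "first" is well defined.

```python
def compute_polarity(dict_frequency):
    """Return the key with the highest frequency (first one on ties)."""
    polarity = None
    highest = None
    for category in dict_frequency:
        frequency = dict_frequency[category]
        # Strict ">" keeps the earliest key when two frequencies are equal.
        if highest is None or frequency > highest:
            highest = frequency
            polarity = category
    return polarity


print(compute_polarity({'POSITIVE': 0.11, 'NEGATIVE': 0.07, 'NEUTRAL': 0.0}))   # POSITIVE
print(compute_polarity({'POSITIVE': 0.11, 'NEGATIVE': 0.07, 'NEUTRAL': 0.5}))   # NEUTRAL
print(compute_polarity({'POSITIVE': 0.07, 'NEGATIVE': 0.07, 'NEUTRAL': 0.01}))  # POSITIVE
```

For an empty dictionary this sketch returns None; the handout does not specify that case.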
  8. read_text(text_path) [10 points]:
    • Input parameters:
    – text_path: a string containing the file name.
    • Output parameter: text_list, a list of strings
    • Description: The function opens and reads the text file located at the given
    path. It reads the text line by line. Each line in the text follows the structure
    "user pseudonym,comment", with the two fields separated by ','. The function
    ignores the user pseudonym and adds only the comment to text_list. Each element
    in the list is one comment from one line of the text. You should not use the
    readline or readlines functions.
    • Example: If we have a text file named 'text.txt' with the following content:
    user1,Hello
    user2,How are you today?
    user3,Good I hope!
    user4,Ok take care!
    Then the function will return the list of strings:
    ['Hello \n','How are you today? \n','Good I hope! \n','Ok take care! \n']
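Iterating directly over the file object yields one line at a time without readline or readlines. The sketch below splits each line on the first comma only, an assumption made so that commas inside a comment are preserved; the handout does not say how such lines should be handled. The demo writes its own small file to a temporary folder.

```python
import os
import tempfile


def read_text(text_path):
    """Return the comment part of every 'pseudonym,comment' line."""
    text_list = []
    with open(text_path, 'r') as text_file:
        for line in text_file:  # the file object yields one line per iteration
            parts = line.split(',', 1)  # split on the first comma only
            if len(parts) == 2:
                text_list.append(parts[1])
    return text_list


# Demo with a throwaway file; the file name and content are illustrative.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, 'text.txt')
    with open(demo_path, 'w') as demo_file:
        demo_file.write('user1,Hello\nuser2,How are you today?\n')
    comments = read_text(demo_path)

print(comments)  # ['Hello\n', 'How are you today?\n']
```

Each comment keeps its trailing newline, as in the handout's expected output; lines without a comma are skipped in this sketch.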
  9. read_pickle [5 points]:
    • Input parameters:
    – a pickle file
    • Output parameter: word_dict, the content of the pickle file
    • Description: The function loads the object stored in any pickle file and returns
    that object
    • Examples:
    – If the function reads the provided pickle file "sentiment_dict.pkl", it
    will return the following dictionary:
    {'POSITIVE': ['great','love','recommend','laugh','happy','brilliant'],
    'NEGATIVE': ['terrible','awful','hideous','sad','cry','bad'],
    'NEUTRAL': ['meh','indifferent','ignore']}
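Loading a pickled object takes a binary-mode open and pickle.load. In the sketch below the parameter name pickle_path is an assumption (the handout does not name it), and the demo pickles its own small dictionary rather than relying on the provided file.

```python
import os
import pickle
import tempfile


def read_pickle(pickle_path):
    """Return the object stored in the pickle file at pickle_path."""
    # Pickle files are binary, hence the 'rb' mode.
    with open(pickle_path, 'rb') as pickle_file:
        word_dict = pickle.load(pickle_file)
    return word_dict


# Demo: write an illustrative dictionary, then read it back.
sample = {'POSITIVE': ['great', 'love'], 'NEGATIVE': ['awful', 'bad']}
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, 'demo_dict.pkl')
    with open(demo_path, 'wb') as demo_file:
        pickle.dump(sample, demo_file)
    loaded = read_pickle(demo_path)

print(loaded == sample)  # True
```

As a general precaution, pickle.load should only be used on files from a trusted source, since unpickling can execute arbitrary code.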
  10. analyse_text(text_path,dict_path) [14 points]:
    • Input parameters:
    – text_path: the path to a given text file
    – dict_path: the path to the given dictionary saved in a pickle file
    • Output parameter: list_polarity, a list of the computed polarity for each line in
    the text file
    • Description: The function reads the text file and the pickle file. For each line
    in the text, the function will: convert the text to lower case, remove leading and
    trailing whitespace, remove the stop words from the text using this list of stop
    words ['!','.','?',';','\n'], compute the term frequencies and the polarity, and add
    the computed polarity value to list_polarity
    • Examples:
    – If we test the function with the provided text and pickle file it will return the
    following list: