关于机器学习:CSE30-Pollution-Lookup

49次阅读

共计 12509 个字符,预计需要花费 32 分钟才能阅读完成。

Assignment 7: Pollution Lookup
Revisited
CSE30: Computer Organization and Systems Fall 2021
Instructors: Bryan Chin and George Obaido
Due: Monday, March 7th, 2022 @ 11:59PM
Please read over the entire assignment before starting to get a sense of what you will need to
get done in the next week. REMEMBER: Everyone procrastinates but it is important to know
that you are procrastinating and still leave yourself enough time to finish. Start early. You
MUST run the assignment on the pi cluster. You HAVE to SSH: You will not be able to
compile or run the assignment otherwise.
Please read the FAQ and search the existing questions on Edstem before asking for help.
This reduces the load on the teaching staff, and clutter/duplicate questions on Edstem.
Additionally, staff support is not guaranteed to be available after 9pm on Wednesdays.
Table of Contents

  1. Table of Contents
  2. Choose your Teammate
  3. Learning Goals
  4. Assignment Overview
  5. Getting Started
    a. Developing Your Program
    b. Running Your Program
  6. How the Program Works
    a. Pollution Lookup Arguments
    b. Output Examples
  7. Program Implementation
    a. Functions and Behaviors to Implement
    i. node* node_lookup()
    ii. void print_info()
  8. Midpoint
  9. Submission and Grading
    a. Submitting
    b. Grading Breakdown [5 + 5 + 40 pts]
    i. Checking For Exact Output Match
    Choose your Teammate
    This HW may be on the longer side compared to previous HWs and you will be optionally
    allowed to choose one teammate with whom you will work on this HW. There will be no extra
    credit if you work alone, but at the same time we will not be responsible if your teammate
    doesn’t do as much work as you’d want them to do. Choose your teammate wisely and divide
    your work carefully. Changing teammates will not be allowed once you have submitted the
    form. Both members of the team receive the same grade. No discussion is allowed with
    members not in your team with regard to AI policy.
    If you decide to work with a teammate, ONE of you has to submit this Form before Sunday 2/27
    11.59 PM PST.
    Learning Goals
    ● Programming in ARM assembly
    ● Passing parameters to assembly functions
    ● Iterating through linked lists and arrays in assembly
    ● Working with a teammate
    Assignment Overview
    The animals in the jungle were able to get their pollution statistics with your help in A5.
    However, humans learnt C and were able to intercept their communications. Now, the animals
    have no choice but to look up the pollution statistics using ARM. They need your help to do it.
    For this assignment, you will be given the pollution data of a city provided in the form of a CSV
    (comma separated value) file. The database will contain the following fields. All the fields are
    integers this time (notice the float iws field is no longer there)
    year month day hour pm2.5 TEMP
    Your job is to implement the functions print_info() and node_lookup() from A5 in ARM.
    For a reminder of how hashed chaining works, watch this 5 minute video from Prof. Leo Porter
    about hash tables. Video on Google Drive
    Getting Started
    Developing Your Program
    For this class, you MUST compile and run your programs on the pi-cluster. To access the
    server, you will need your cs30wi22xx student account, and know how to SSH into the cluster.
    Need help or instructions? See Edstem FAQ. (Do NOT wait until the end to try this. There will
    be limited or no ETS support on the weekends!)
    We’ve provided you with the starter code at the following link:
    https://github.com/cse30-wi22…
  10. Download the files in the repository.
    a. You can either use
    git clone https://github.com/cse30-wi22…
    directly from the pi-cluster, or download the repo locally and scp the folder to
    pi-cluster if that doesn’t work.
  11. Fill out the fields in the README before turning in.
    Running Your Program
    We’ve provided you with a Makefile so compiling your program should be easy!
    Additionally, we’ve placed the reference solution binary at:
    /home/linux/ieng6/cs30wi22/public/bin/poll_rv-a7-ref
    You can directly use poll_rv-a7-ref as a command.
    Makefile: The Makefile provided will create a poll_lookup executable from the source files
    provided. Compile it by typing make into the terminal. By default, it will run with warnings turned
    on. To compile without warnings, run make WARN= instead. Run make clean to remove files
    generated by make.
    How the Program Works
    This section is the same as A5 (except that the iws column is removed – so there are 6
    columns now). The program shall take a date to query, and the filename of the CSV as
    arguments. It also takes some optional arguments: a flag to indicate printing of hash table
    metadata, and a hash table size.
    Inputs:
    ● Date
    ● Flag to print stats corresponding to a date
    ● Filename of the CSV
    ● Optional: Flag to print metadata
    ● Optional: Hash table size
    ● Optional: Flag which indicates if you need to remove all entries corresponding to a date
    Outputs:
    ● The minimum, maximum, and average parameters of the query date, OR an error
    message if the date couldn’t be found if the flag indicates you to print stats (The error
    message is already handled for you).
    ● Optional: Hash table metadata
    Pollution Lookup Arguments
    Format for calling this executable with arguments (the flags cannot be grouped with each other,
    e.g. -itc):
    ./poll_lookup [-i] [-t tablesize] <-d date> [-r date] <filename>
    Argument(s) Description
    -i
    (User-optional flag)
    Prints descriptive metadata about the hash table and its linked lists chains
    after the query (as shown in the output examples).
    -t tablesize
    (User-optional flag)
    tablesize (hash table size) is an integer specifying the size to be used to
    create the hash table. If -t is not specified, the default is the constant
    TABLE_SIZE.
    If the argument to -t is smaller than the constant MIN_TABLE_SIZE, an
    error message is printed with the usage message and the program quits.
    -d date date is a string specifying the date in yyyy-mm-dd format. You need to
    calculate the stats corresponding to date
    -r date date is a string specifying the date in yyyy-mm-dd format. You need to
    remove the hash table entries for date
    filename This is the path to the input CSV. It must have six columns, in this order:
    Year, month, day, hour, pm2.5, TEMP
    We will not give malformed input CSVs to your program. The input path will
    always be valid and be a properly-formatted CSV.
    You do not have to write the option parsing: It is done for you in the provided function
    parse_opts(). Do not edit parse_opts()!
    If the mandatory arguments are not provided, their error messages are printed, followed by the
    usage message. You can assume that a valid CSV will be included in any test command.
    Output Examples
    $ ./poll_lookup -i -d 2013-01-01 pollution.csv
    Minimum pm2.5: 7 Maximum pm2.5: 35 Average pm2.5: 16
    Minimum temp: -12 Maximum temp: -3 Average temp: -7
    Total size: 1873
    Total entries: 43824
    Longest chain: 168
    Shortest chain: 24
    Empty buckets: 881
    $ ./poll_lookup -i -d 2013-01-01 -r 2013-02-02 pollution.csv
    Minimum pm2.5: 7 Maximum pm2.5: 35 Average pm2.5: 16
    Minimum temp: -12 Maximum temp: -3 Average temp: -7
    Total size: 1873
    Total entries: 43800
    Longest chain: 168
    Shortest chain: 24
    Empty buckets: 881
    $ ./poll_lookup -i -t 13 -d 2013-01-01 pollution.csv
    Minimum pm2.5: 7 Maximum pm2.5: 35 Average pm2.5: 16
    Minimum temp: -12 Maximum temp: -3 Average temp: -7
    Total size: 13
    Total entries: 43824
    Longest chain: 3552
    Shortest chain: 3240
    Empty buckets: 0
    $ ./poll_lookup -i -t 13 -d 2013-01-01 -r 2013-02-02 pollution.csv
    Minimum pm2.5: 7 Maximum pm2.5: 35 Average pm2.5: 16
    Minimum temp: -12 Maximum temp: -3 Average temp: -7
    Total size: 13
    Total entries: 43800
    Longest chain: 3552
    Shortest chain: 3240
    Empty buckets: 0
    $ ./poll_lookup -i -t 1331 -d 2015-01-01 pollution.csv
    Unable to find any data for the date 2015-1-1.
    Total size: 1331
    Total entries: 43824
    Longest chain: 192
    Shortest chain: 24
    Empty buckets: 510
    Program Implementation
    Functions and Behaviors to Implement
    node* node_lookup()
    This has the same signature as the node_lookup() function in A5.
    Arguments:
    node* front → The current head of the linked list chain
    int year → year
    int month → month
    int day → day
    int hour → hour
    Operation:
    ● Searches the linked list chain for a node that matches the year year, the month
    month, the day day and hour hour.
    ● Returns the pointer to the node with matching data, otherwise, return NULL.
    Simplifying node_lookup():
    ● The function node_lookup() takes 5 parameters. The first four parameters will be in
    R0 to R3. The fifth parameter is stored by the caller just above your stackframe (fp). So
    you can access the fifth parameter as [fp, NCOLS_OFFSET]. What will be the value
    of NCOLS_OFFSET?
    Stack Frame with some addresses for illustration purposes:
    0x7fff_fff4
    +4 5th parameter
    node_lookup
    stack frame
    FP->
    0x7fff_fff0
  12. caller’s FP
    0x7fff_ffec
    -4 caller’s LR
    0x7fff_ffe8
    -8 callee saved registers
    -12 callee saved registers
    … local variables + parameter saved
    local variables + parameter saved
    SP->
    0x7fff_ffxx
    local variables + parameter saved
    0x7fff_ffxx
    ● If you run out of registers for variables, you can declare local variables and store and
    load their values from the stack as needed. Define a fixed offset from the frame pointer
    with .equ to do this.
    ● Here is a sample struct to make you understand how structures are stored in memory.
    The values in red indicate sample memory address.
    struct pride_lands
    {
    char c;
    int x;
    };
    struct pride_lands p[10];
  13. 2001 2002 2003 2004 2005 2006 2007 2008
    p[0].c unused unused unused p[0].x p[1].c
    Each element of the struct is allocated 4 bytes of memory. However, as char takes 1
    byte, 3 bytes are unused. However, int uses all 4 bytes. Notice where the next element
    in the array starts. How is it related to the size of the struct?
    Can you use the above example to answer the following questions?
    a. Given node* front, how would you load front->year, front->month,
    front->day and front->hour from memory?
    b. How are they stored in memory relative to front?
    c. How would you do front = front->next? What is the size of the structure?
    ● What will be the loop condition? What is the value of NULL?
    void print_info()
    This has the same signature as the print_info() function in A5.
    Arguments:
    node** table → pointer to hash table
    unsigned long size → hash table size
    Operation:
    ● Walk the hash table chain by chain.
    ● Output the following:
    ○ The size of the table (Table size)
    ○ The total number of nodes in the table (Total entries)
    ○ The length of longest and shortest non-empty chains (Longest chain, Shortest
    chain)
    ○ The number of empty buckets in the table (Empty buckets)
    ● Prints all this information to stdout using the following format strings, in this order:
    “Table size: %lu\n”
    “Total entries: %lu\n”
    “Longest chain: %lu\n”
    “Shortest chain: %lu\n”
    “Empty buckets: %lu\n”
    Simplifying print_info():
    ● You have to iterate through the table for each entry, iterate through the linked list for that
    entry. Given table and i, how would you find table[i] in ARM?
    ● How will you initialize the variables for minimum chain length and maximum chain
    length?
    ● How would you call printf() in ARM to print to standard output?
    ● Use the same logic as you used in node_lookup() for iterating through a linked list.
    Allowed ARM Instructions
    You are only allowed to use the instructions provided in the ARM ISA Green Card. Failure to
    comply will result in a score of 0 on the assignment.
    Style Requirements
    Points WILL be given for style, and teaching staff won’t be able to provide assistance or
    regrades unless code is readable. Please follow these Style Guidelines for ARM assembly.
    Midpoint
    This part of the assignment is due earlier than the full assignment, on
    Wednesday 3/2 at 11:59 pm PST. There are no late submissions on
    the Midpoint.
    Complete the Gradescope assignment“A7: Midpoint”, an Online Assignment that is done
    entirely through Gradescope. This assignment consists of a few quiz questions and a
    free-response question where you will document the pseudocode or C code of both functions.
    This is a planning document and does not need to reflect your final implementation, although
    you are encouraged to keep it up to date. Answers in plain English will receive 0 points.
    Discuss your implementations of the following functions: node_lookup and print_info.
    Submission and Grading
    Submitting
  14. Submit your files to Gradescope under the assignment titled“A7: Pollution Lookup
    Revisited”. As this is a group submission, you need to make only one submission per
    group from any one of your gradescope accounts. After submitting, add your
    teammate’s email id to your submission.
    You will submit the following files:
    node_lookup.s
    print_info.s
    README.md
    You can upload multiple files to Gradescope by holding CTRL (⌘ on a Mac) while you
    are clicking the files. You can also hold SHIFT to select all files between a start point
    and an endpoint. Alternatively, you can place all files in a folder and upload the folder to
    the assignment. Gradescope will upload all files in the folder. You can also zip all of the
    files and upload the .zip to the assignment. Ensure that the files you submit are not in
    a nested folder.
  15. After submitting, the autograder will run a few tests:
    a. Check that all required files were submitted.
    b. Check that your files compile without warnings.
    c. Runs some tests on the resulting poll_lookup executable.
    Grading Breakdown [5 + 5 + 40 pts]
    Make sure to check the autograder output after submitting! We will be running
    additional tests after the deadline passes to determine your final grade. Also, throughout this
    course, make sure to write your own test cases. It is bad practice to rely on the minimal
    public autograder tests.
    To encourage you to write your own tests, we are not providing any public tests that have
    not already been detailed in this writeup.
    The assignment will be graded out of 50 points and will be allocated as follows:
    ● Midpoint writeup: 5 points. This part of the assignment is due earlier than the full
    assignment, on Wednesday 3/2 at 11:59 pm PST. Complete the Gradescope
    assignment“A7: Midpoint”.
    ● Style: 5 points
    ● Public tests with the provided examples.
    ● Private tests with hidden test cases.
    NOTE: The end to end tests expect an EXACT output match with the reference binary.
    There will be NO partial credit for any differences in the output. Test your code – DO NOT
    RELY ON THE AUTOGRADER FOR PROGRAM VALIDATION (READ THIS SENTENCE
    AGAIN).
    Make sure your assignment compiles correctly through the provided Makefile on the
    pi-cluster without warnings. Any assignment that does not compile will receive 0 credit.
    Checking For Exact Output Match
    A common technique is to redirect the outputs to files and compare against the reference
    solution
    1
    :
    ./your-program args > output
    our-reference args > ref
    diff output ref
    If the second command outputs nothing, there is no difference. Otherwise, it will output lines that
    differ with a < or > in front to tell which file the line came from.
    END OF INSTRUCTIONS, PLEASE RE-READ!
  16. You might want to check out vimdiff on the pi-cluster (https://www.tutorialspoint.co…).
正文完
 0