乐趣区

关于算法:COMPSCI-340-合并算法技巧

COMPSCI 340 Assignment 1 10% of your grade
Due date: 11:59 pm Friday 12th August

Introduction

We no longer live in a world where computing speeds increase along with Moore’s Law.
Instead we maintain increased throughput in our computers by running more cores and
increasing the amounts of parallelism. Parallelism on a multi-core (multi-processor)machine
is looked after by the operating system.
In this assignment you have to parallelise a simple algorithm in a number of different ways and
answer questions summarising your findings.
This assignment is to be done on a Unix based operating system.We recommend
the flexit.auckland.ac.nz Ubuntu image (called Ubuntu2004). The distributed code runs on
Linux, but you can modify it to run on any Unix based operating system such as MacOS. It
should run on the Windows subsystem for Linux. To get the results for this assignment you
will need a machine (real or VM) with 4 or more cores. When you come to time your
programs run them all on the same machine.
Markers will use the Ubuntu image provided on flexit.auckland.ac.nz. So please ensure that
your submitted programs run in that environment. The FlexIT Ubuntu image has at least 4
cores.

The merge sort

The algorithm you have to parallelise isthe merge sort. The reason the merge sort was chosen
is that it is almost the perfect algorithm to parallelise. Just to remind you, the merge sort
breaks the data to be sorted into two halves, sorts the halves recursively and merges the two
sorted halves back together to give the answer.
The version of the merge sort you have to use is written in C and available on the assignment
Canvas page as a1.0.c.
In all of the work which follows do NOT optimise theC compilations. Always compile
using:
cc filename.c -o filename (you will need to use -pthread for some programs).
Do all of the timings on the same (or very similar) machine and don’t watch videos or do
similar things while you are taking the timings.Take the timings at least 3 times and average
them.

Thingsto do

Step 0
Read through and understand the code in the file a1.0.c.
Compile and run the program. An important part of the program is the is_sorted
function before the end. You will always need to call this to ensure that any changes you
make to the implementation do in fact return the sorted data.
If you run the program without any command line parameters e.g.
./a1.0
it will use a default array of only two values.
Run it with a larger amount of random data by including the size parameter on the command
line e.g.
./a1.0 1000
runs the program with an array of 1000 random values.
Determine how large an array can be dealt with by this program before you get a
segmentation fault.
Segmentation faults happen when something goes wrong with memory accesses or
allocations(a segment is an area of memory). In our case the program allocatesspace for the
array of values, and extra working space, on the stack, and as the size of the array increases
eventually we run out of stack space.
Question 1 [1 mark]: What environment did you run the assignment on (e.g., operating
system, number of cores, amount of real memory)? Hint: man uname, man free and
man lscpu. The output of uname -a provides some of this information, free provides
information on the amount of memory, and lscpu provides information on the number of
CPUs. Also mention whether you were using a virtual machine and if so say which one. [0.5
marks]
What approximate size for the array can you get to in the original program before a
segmentation error occurs? [0.5 marks]
You don’t need to submit a1.0.c
Step 1
Modify the program (call it a1.1.c) to increase the amount of stack space so that the
program can at least deal with 100,000,000 numbers. Hint: man getrlimit and
setrlimit.
Time how long it takes the program to sort 100,000,000 numbers and record the result.
time ./a1.1 100000000
Submit this program as a1.1.c
Page 3 of 5
Question 2 [1 mark]: Run“ulimit -a”. Explain why the limit on the stack size is
considerably smaller than the amount of memory available to other sections of memory which
are marked as“unlimited”?
Step 2
Modify the program (call it a1.2.c) to use two threadsto perform the sort.
You will need to make sure you are running on a machine with at least 2 cores. If you are
using a virtual machine you may need to change the configuration to use at least 2 cores. The
FlexIT Ubuntu image has at least 4 cores.
Hint: man pthread_create, pthread_join
Each pthread has its own stack and the standard pthread stack size is very limited (why do
you think this is so?). So you will need to increase the stack size. You need to change the
pthread attributes to do this.
Hint: man pthread_attr_init, pthread_attr_setstacksize
Time how long it takes the program to sort 100,000,000 numbers and record the result.
Submit this program as a1.2.c
Question 3 [1 mark]: Explain how you parallelized the sort and compare the times you
observe with the times from Step 1. Explain why the times are different.
Step 3
Modify the program (call it a1.3.c) to use as many threads as there are cores to perform the
sort, and no more.
You will need to make sure you are running on a machine with at least 4 cores. If you are
using a virtual machine you may need to change the configuration to use at least 4 cores (if
your computer can do this).The FlexIT Ubuntu image has at least 4 cores.
First you need a way to determine from within the program how many cores you have in the
environment you are using. Hint: man sysconf or man get_nprocs_conf
You don’t have to worry about the other work going on in the computer just proceed as if all
cores are available for your sorting program.
For this step you MUST use some shared state to keep track of how many threads are
currently active.Every time a new thread is started you should add one to a counter and every
time a thread stops running you should reduce that counter. Whenever you are about to call
merge_sort you either start a new thread (if you haven’t reached the maximum number of
cores yet) or use an ordinary function call to merge_sort from within the current thread).
The problem with this approach is that multiple threads can be accessing the shared state at
the same time, so you will need to provide mutual exclusion over the thread counter. Hint:
man pthread_mutex_init, pthread_mutex_lock,
pthread_mutex_unlock.
Time how long it takesthe program to sort 100,000,000 numbers and record the result. You
Page 4 of 5
should also take a screen shot of the System Monitor program showing the
Resources tab as your program runs. This will prove that all cores are being used.
Submit this program as a1.3.c
Question 4 [2 mark]: Explain how you parallelized the sort and compare the times you
observe with the times from Step 2. Explain why the times are different.
Step 4
Go back to step 2 and modify the program (call it a1.4.c) to use two processes rather than
two threads.
Processes normally don’t share memory with each other and so there will have to be some
communication between the processes. Hint: man fork, pipe.
One of the interesting thingsis that the fork system call copies the data in the parent process
so that the child can see the data from the parent (at the time of the fork). This means the
child process does not need to copy data from the parent to the child. However, after the child
has sorted the data the resulting sorted values have to be sent back to the parent in order for it
to do the merge.
Time how long it takes the program to sort 100,000,000 numbers and record the result.
Submit this program as a1.4.c
Question 5 [3 mark]: Explain how you set up the pipes and send the sorted data back to the
parent. In your answer include how the parent process knows when a child process has
completed its sorting, and how it puts the received sorted data into the correct place in the
main array.
Question 6 [3 mark]: Compare the times and memory usage of this program with the earlier
steps. Explain the results.
Step 5 Same as step 4 but rather than passing information back to the parent process we share the
memory to be sorted in the processes. Hint: man mmap, wait.
Time how long it takes the program to sort 100,000,000 numbers and record the result. You
should also take a screen shot of the System Monitor program showing the Resources tab as
your program runs.
Submit this program as a1.5.c
Question 7 [2 mark]: Explain how the parent process sets up the shared memory for the
child processes. In particular say whether or not the parent is sharing the same data with all
the children, and why you did it that way.
Question 8 [2 mark]: Explain how the parent process knows when a child process has
completed it sorting.
Page 5 of 5
Question 9 [2 mark]: Compare the times and memory usage of this program with the
earlier steps. Explain the results.
Question 10 [3 marks]: Which of the steps do you consider gives the best solution? You
may order the techniques from slowest to fastest and include a brief explanation of what is
happening in each of them and how that relatesto their performance. You may include
relevant timing information and screen shots from the System Monitor.
Submission

  1. Enter your zipped source files in the “Assignment 1 Programs” Assignment on Canvas.
    Remember to include your login in each file.
  2. Enter your answers to the questions in the “Assignment 1 Questions” Canvas quiz. As this
    is set up as a quiz, I recommend you write your answers in a file and prepare them before
    running the quiz and pasting the answers into the corresponding question field.
    ? Questions from each step will be marked only if you have written and submitted the
    program from that step.
    ? The submitted source code files must include your name and UPI.
    By submitting a file, you are testifying that you and you alone wrote the contents of that file
退出移动版