关于后端:Comp-4510

Comp 4510 – Assignment 3 (Part A)
Due: Due anytime before October 21, 2021 (11:59pm)
Instructions:
This assignment should be completed on the Mercury machine using MPI. The MPI Programming Environment
document on the course website describes how to log into these machines and use them – please read it before
you begin.
You will be expected to follow the Assignment Guidelines, which are available in the assignments folder on the
course website. This document describes how to organize and hand in your code. Please review it before you
begin. Note that a failure to follow these standards will result in a loss of marks. Always make sure your program
works for any input and optimize your code (do not add unnecessary statements to make your program workmarks
will be deducted.). Please place the answers to the written part of the programming questions (I) and
written questions (II) in a single word document. Submit all programs that you have written or made
modifications to. Also submit all output files. Give appropriate names to the output files: for example,
question1_datasize_processsize.out.) You need not submit the original files provided by the instructor if not
requested.
This assignment is to learn MPI programming through an interesting maths application.
Total Marks: [50] This assignment is worth 10% of the total assignment marks.
I. PROGRAMMING QUESTIONS [35]

[10] Consider the example we discussed in class from LAM-MPI manual (page 12). The example
was to illustrate dynamic load balancing using an on-demand approach. We will now look at
converting this program into a static load balancing single program multiple data approach with some
modifications. [Note: Think about how you will send and receive: batches, non-batches, point-topoint,
collective, etc. Choose the MPI routines that best fits the architecture to reduce
communication/synchronization latencies.]
a. Assume there are N (NUM_WORK_REQs) exam papers, q markers. Assume N is divisible
by q.
b. Each marker is assigned N/q exam papers to mark. Assume one of the markers (Process 0
as in the manual) is also the manager who collects the results of the exam papers.
c. Assume an array (A) of size N, where the 𝑗𝑗𝑡𝑡ℎ row in A is the ID of the exam paper [Or, you
may choose another data structure such as a linked list and add arguments to the list as
needed.]
d. Each exam paper (j) stores two values: ID of the marker and mark [“struct”(record) data
type would be appropriate].
e. Each marker 𝑞𝑞𝑖𝑖 (similar to the slave() procedure that computes result) computes the
mark of paper j as follows: mark = (𝑞𝑞𝑖𝑖 + 1)* j +N/q.
f. Each marker sends the ID and mark of the exam papers it was assigned, to the manager.
g. The manager should print out for each exam paper in sequence, the marker ID, paper ID and
mark on each line. That is“Marker ID = …, Paper ID = …, Mark = …”)
[25] In this question, you will use MPI to parallelize a program that generates Julia Set Fractals (a
type of mathematically defined image). Several sample pictures of Julia Set Fractals are shown on
the following page. The sequential program that generated these images is available in the
assignment folder on the course website.
The Sequential Program
Download the sequential code (and upload it to Mercury) and try running it. Look at the output image
you get (it’s written to the file out.bmp when the program finishes). You will need to transfer the
image back to your own local machine to view it.
Take a look at the main() and compute_row() functions. The main() function contains a loop
that calls compute_row() for each row in the output image. The compute_row() function, in turn,
contains a loop that goes through each column (pixel) in the row, does some calculations to figure out
what colour it should be, then stores the result in the data array. It also updates the hist array, which
contains the counts (frequencies) of each of the distinct values in the data array.
Parallelizing the Code
Your task is to make modifications to the sequential code to parallelize it using MPI function calls. Each
of the modifications you will need to make are described below. To run your parallel program, you will
need a parallel myjob script (the one in the Julia example code only launches one process). You can
use a copy of the parallel myjob script from the hello MPI example provided on the course website.
Your parallel MPI version of the code will launch Q processes. The root process will act as a
“supervisor”(master), and the others will be the“workers”(slaves).
We will parallelize the code by partitioning the work of computing the rows (currently done by the
loop in main()) among the Q-1 worker processes. Each of the Q–1 workers should compute exactly
HEIGHT / (Q-1) rows of the output image. You may assume that
HEIGHT % (Q-1) == 0 so that each worker gets exactly the same number of rows with no
leftovers (or shortages).
In order to do this, each worker process will need its own copy of a data array and a hist array (each
with enough space to process its own chunk of rows). When a worker is finished its computation (which
it can do by simply calling compute_row() – though you may have to make some small modifications
to the indexing of the arrays inside the function), it should send these arrays back to the supervisor
process.
The supervisor process’s job is to wait to receive the data and hist arrays from each of the worker
processes. When a worker sends its results, it should store the chunk of the data array that is received
in the appropriate location in its own (full size) data array, and add the values in the received hist
array into its own hist array. Finally, once all workers have sent back their results, the supervisor
should call the functions hist_eq() and write_bmp() to output the image, as main() does at the
end of the existing sequential code.
When you are finished, generate a test image using your program. To do this, change the SCALE and C
constants in the julia_iters.h file to these values:

define SCALE 4.0

define C -0.839 + 0.4I

This will generate a computationally intensive fractal. Run your program with 41 processes (1
supervisor and 40 workers.1 Note that this may take some time (usually between 5 and 10 minutes).
Include the resulting image (out.bmp) with your code when you submit your assignment.
Notes:
• The only functions you’ll need to modify here are main() and compute_row() – though
you are free (and encouraged) to add your own new functions if you want.
• While you do not need to understand the math that goes into computing the fractal to complete
this question (i.e. the stuff that the other functions in the code do), if you are curious, a brief
description is provided below.
• The provided code uses a simple BMP library (“Quick and Dirty BMP”) to write the image
files. This is what the“libqdbmp.a”file (and corresponding header file) is for. The existing
makefile links it with your program.
• You can use PSCP for secure file transfer between the mercury server (MPI programming
environment) and your local machine. Details on how to use it is given in the link
https://it.cornell.edu/manage…
If you’re interested, here are the settings for generating each of the images below (feel free to try out
your own as well!):
SCALE = 3.0 SCALE = 3.0 SCALE = 3.0 SCALE = 3.0
C = 1.0 + 0.5I C = 1.0 + 0.05I C = 0.55 + 0.85I C = 0.05 + 1.0
Although it is not necessary to understand the Julia set algorithm to do this question, if you are
interested, a brief description is given below:
The math
The Julia set is a mathematically defined set of complex numbers. Suppose we have a complex number
𝑧𝑧0. To test whether or not 𝑧𝑧0is in the Julia set, we can use the following process:
Generate a new complex number 𝑧𝑧1 by setting i = 1 in the following equation:
𝑧𝑧𝑖𝑖 = 𝐶𝐶𝐶𝐶𝐶𝐶𝐶𝐶(𝑧𝑧𝑖𝑖−1)
Repeat the above process (setting i = 2, 3, … n), to generate a sequence of complex numbers
𝑧𝑧0, 𝑧𝑧1, 𝑧𝑧2, . . . 𝑧𝑧𝑛𝑛. If the imaginary part of the values in the sequence tend toward infinity, then 𝑧𝑧0is not in
the set. If they do not, then 𝑧𝑧0is in the set.
Generating the image
Note: do NOT hardcode the number of processes in your program – use the MPI_Comm_size()
function as described in class. You can find an example of this in the hello_world example code
available on the course website.
To generate an image using the Julia set, we go through every pixel (x, y) and combine them in some
way (how doesn’t really matter, so long as it’s consistent) to create our initial complex number 𝑧𝑧0. We
then apply steps 1 and 2 of the above process in a loop, stopping when either:
a. the imaginary part of the number exceeds 50 (indicating that it is tending toward infinity), or
b. we exceed 216 iterations (indicating that it is not).
When the loop stops, we record the number of iterations it took in an array called data.
After we have completed the steps from above paragraph for every pixel in the output image, the data
array is full of the iteration counts for each pixel in the output image. We can now generate the image
by interpreting each of these counts as a 24-bit RGB colour value (with the lower 8 bits representing
blue, the next 8 bits green, and the upper 8 representing red), and writing those colour values into an
BMP file.
However, there is a problem with this process. If all of the iteration counts are very similar (say
between 100 and 110), they will all generate very similar colours, making it very hard to see the
patterns in the image. To prevent this, before we interpret the data array as RGB values, we use a
technique called histogram equalization to normalize the distribution of the iteration counts it contains.
This involves the use of another array called hist (used to store the frequencies of each of the values
in data), which you’ll see in the code. After we have tweaked the data values to fix the colours, we
convert them to RGB colour values and write them to the image file.
II. WRITTEN QUESTIONS [15]:
[11] Suppose MPI_COMM_WORLD consists of the three processes 0,1, and 2, and suppose the
following code is executed:
int x, y, z;
switch(my_rank) {
case 0: x=0; y=1; z=2;
MPI_Bcast(&x, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Send(&y, 1, MPI_INT, 2, 43, MPI_COMM_WORLD);
MPI_Bcast(&z, 1, MPI_INT, 1, MPI_COMM_WORLD);
break;
case 1: x=3; y=8; z=5;
MPI_Bcast(&x, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Bcast(&y, 1, MPI_INT, 1, MPI_COMM_WORLD);
break;
case 2: x=6; y=7; z=8;
MPI_Bcast(&z, 1, MPI_INT, 0, MPI_COMM_WORLD);
MPI_Recv(&x, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD,
&status);
MPI_Bcast(&y, 1, MPI_INT, 1, MPI_COMM_WORLD);
break;
}
Indicate in the table below, the output of the values (x, y, z) after execution of each of the
instructions by each of the processes. Show all your work.
Process 0 Process 1 Process 2
Bcast, x = ? Bcast, x = ? Bcast, z = ?
Send, y = ? Bcast, y = ? Recv, x = ?
Bcast, z = ? z = ? Bcast, y = ?
(x,y,z) = ? (x,y,z) = ? (x,y,z) = ?
[2] Consider the following code for two processes:
if (myrank == 0) then {
compute (); %this is a function
MPI_Barrier();
else
donothing();%this is a function;
Is this a safe program? Explain your answer.
[2] Consider the following code for two processes:
if (rank == 0) {
X = 5;
MPI_ISend(&X, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &request1);
MPI_Test(&request1, &flag, &stat1);
X= X +2;
Dosomething(); //This function is NOT dependent on X
}
else
MPI_RECV(&X, 1, MPI_INT, 0, tag, MPI_COMM_WORLD,&status);
The intended behaviour is that Process 1 should receive X = 5. Explain why this is not guaranteed.
Rewrite the code so this is guaranteed.

define SCALE 4.0

define C -0.839 + 0.4I