CSCI 5105 (Spring 2021)
Introduction to Distributed Systems
(Instructor: Anand Tripathi)
Assignment 5 (100 points)
Due March 21, 2021
This assignment can be done in a group of up to two students.
See the last page for clarifications on HALT request processing. (March 9)
This is a programming assignment building upon the work you did on Assignment 3. You will be
implementing the server as a group of replicated processes.
Submission Instructions: Put all your files with program code and other documentation in a
single file zipped (.zip or .tar or .tar.gz or .tgz) file and submit it using the Homework Submission
link on the Canvas page for the course.
You are allowed to change or modify your submission until the due date, so submit early and
often. Verify that all your files are in the submission. Failure to submit necessary files will result in
a score of zero for all missing parts.
General guidelines for all submissions are as follows:
• Include your name(s) and x500 at the top of each submitted file.
o For a group assignment, both partners should submit the work. All files MUST
include the names and x500 IDs of both partners. The submission with the latest
timestamp will be graded. Name your zip or tar or tgz file to contain the
lastnames of both partners: For example, when Smith and Jones are submitting it
as a team, name it as:
Jones-Smith-Assignment-5.zip.
• Your solutions to non-programming portions of the assignment must be in a PDF format.
• If there is a programming component, please make sure your submission contains all the
necessary files to compile and run the programming portion of the assignment. Your
program must compile and run correctly on a CSE-Lab Linux machine as grading will be
performed on these machines.
o Include a README to list any known issues and or assumptions about your
programs. Also list the machine(s) you used to test your code.
Grace Day usage
If you are using your grace days, indicate how many you are using at the top of each submitted
file in your submission, e.g. .pdf or .c files. The following is a sufficient enough indication of
grace days being used:
Using <#> grace days
Late submissions which omit the indication will be graded with the following penalties:
1-day late 10% penalty
2-days late 20% penalty
3-days late 30% penalty
Submission after 3 days, no credit unless you are using any grace days.
Showing your work
Please show all of your work for questions that ask you to derive a formula or compute a certain
quantity. This can include proofs, diagrams, or step-by-step derivations/calculations. Answers,
correct or not, without sufficient proof of work may be penalized. On the other hand, answers that
are incorrect may earn partial credit if the work was correct but had minor mistakes. All programs
in the programming part must be commented with proper instructions for compiling and executing
the code. If your code does not work for any of the testcase, please indicate so.
Include a README file, and a makefile when necessary.
2
CSCI 5105 (Spring 2021)
Introduction to Distributed Systems
(Instructor: Anand Tripathi)
Assignment 5 (100 points)
Due March 21, 2017
This assignment can be done in a group of up to two students.
Goals: The goal of this assignment is to learn how to implement a replicated service using the
State Machine Model. Implemented using Lamport clocks.
Problem Statement: In this assignment you will build upon the server program which you
implemented in Assignment 3. You will create a group of replicated server processes, and each
process in the group will execute the same server code. All processes will maintain exactly the
same data about the bank accounts and they will execute exactly the same sequence of
operations.
• All these processes will be required to execute the same sequence of operations, using
the State Machine Model. (Discussed in my Lecture Notes 8 and in the paper on logical
clocks by Lamport.)
• Each process in the server group will be assigned a unique integer ID, which will be used
for total ordering of events using timestamps based on Lamport clock.
You must use the Java RMI-based server from Assignment 3 for this assignment.
Options in designing your program:
• You have the option to implement the communication between server process group
members using Java RMI or Thrift RPC, if you want. It is your choice.
A typical execution environment will have multiple clients, and a client will connect to any one of
the server processes in the replicated server group to request an operation. The set of allowed
operations will be the same as defined for Assignment 3.
For implementing the State Machine Model, you will need to implement Lamport clock (logical
clock) at each server process. All message communication and significant events in a process
will need to advance the Lamport clock value appropriately. The client processes do not need to
maintain the Lamport clock.
State Machine Model Protocol Description:
Terminology used in the following outline of the State Machine protocol:
• Server Process Group: refers to the group of processes executing the server program
code.
• Server Process Replica: refers to a member process in the Server Process Group.
The clients and Server Process Group members will execute as follows:
- A client process will send its request to one of the members of the Server Process Group.
- A Server Process Replica receiving a request from a client will multicast the request
message to all other members in the Server Process Group. Before multicasting the
request message, it will be marked with a timestamp based on the Lamport clock of the
server process replica. The multicast will be implemented by process-to-process RMI
call or Thrift RPC.
3 - On receiving such a multicast request message, a Server Process Replica will
appropriately advance its Lamport clock and send an acknowledgement with its current
(i.e. advanced) Lamport clock value to the sender Server Process Replica. - Each Server Process Replica will maintain a queue of all requests received either directly
from clients or indirectly from other members of the Server Process Group through
multicast request messages. The requests in the queue will be ordered in the increasing
values of their timestamps. The request at the head of the queue will be the one with the
smallest timestamp value. - When a server process replica finds that at the head of the queue is a request which it
had originally received from a client and timestamped it for multicasting, and if it is
certain that there will be no future request in the system with any smaller timestamp
value, then:
• It will remove that request from the queue and execute it locally, and it will also send
an execute command to all other Server Process Group members to execute it.
(Remember that each request is uniquely identified by its Lamport timestamp.)
• On receiving an execute command from another member of the Server Process
Group, a server process replica will remove from its local queue the specified request
and execute it locally, but without sending any response message to the client or any
Server Process Replica.
• The Server Process Replica which originally received the request from a client will
send a response message to that client after executing the request according to the
State Machine model described above.
Configuration and Testing of Replicated Servers:
If you are using the RMI-based server, then you may consider having two different interfaces –
one for communication between the clients and the servers, and the other for communication
between peer server replicas.
You have the option to use either Java RMI or Thrift for peer communication between server
replicas.
In your design, if you are using Thrift RPC for peer-to-peer communication between server
replicas, then your server will use a specified port number for Thrift RPC. Please follow a suitable
format for the configuration file and clearly describe it in your README file.
You should be able to run the server processes either on one or multiple computers. You will
create a configuration file to indicate the server process ID and the host on which it will be
running, and the rmiregistry port number.
In the example below a configfile is shown for a system with three server processes, which are to
be executed on three different hosts with names alpha, beta, and delta. For each host, the serverID
and the rmiregistry port number are specified. In this configuration, server replica with id 0 will
be running on alpha.cselabs.umn.edu, and the rmiregistry on that host will be running on port
5000.
Note: The fourth column will be needed only if you are using Thrift RPC for server-to-server
communication. Otherwise, you do not need the fourth column at all.
Example of a configuration file data:
Hostname SERVER-ID RMI registry Thrift RPC Port
alpha.cselabs.umn.edu 0 5000 4000
beta.cselabs.umn.eedu 1 5001 4001
delta.cselabs.umn.edu 2 5002 4002
4
Note: If you are running all these three processes on the same host
• Then there will be only one rmiregistry on that host and the same hostname and
rmiregistry port number will be specified for all three server processes. However, the
server processes must register themselves with the local rmiregistry with distinctly
different names such as Server_0, Server_1, Server_2 etc.
• You will need to make sure that you choose different Thrift port numbers for them if you
are executing the server processes on the same host.
You will start a server process on each one of the hosts as specified in your configfile and you will
give as command line arguments the server ID and the configuration filename.
java server server-ID configFile // start server with the specified ID
Note: Make sure that the host where are you are running this and the server-ID are consistent
with the contents of the configFile.
Server Data Initialization:
When a server process is started, it will first create 20 accounts. These will be sequentially given
integer account ID values, starting with 1. It will also initialize the balance of each account to
- After that the server will print to the console (screen) a message indicating that the
initialization is complete and it is ready to get requests.
Client Process Structure:
When all servers are initialized and ready, you will start your client processes. You will write a
multithreaded client, and the number of threads will be specified as an argument to the client
process. You will also pass as argument the config filename to the client process.
For example:
Java client Client-ID 24 configFile // will create the client process with 24 threads
Each client process will be assigned a unique ID, which will be communicated to the server along
with the operation request. A client thread will randomly pick one of the servers and send a
request (as noted below) to that server and wait for the response before sending the next
request. It will repeat this step for 200 times, and in each iteration it will randomly pick one of the
Sever Process Replicas.
Each client thread will perform the following two operations 200 times and terminate.
- It will randomly pick two accounts and transfer 10 from one account to the other.
- It will write to the client logfile a record indicating the operation request and server process
ID. It will also write a log record when a response is received.
After all client threads have terminated, the main thread will send a“HALT”command to the
server process with ID equal to 0. Note the HALT should be communicated and executed by each
server process replica using the State Machine Model. Each server process will execute this
command as follows: - The server process will write to the logfile the current balance in each of the 20 accounts and
the sum of the balance in all 20 accounts, which should be 20000 if your system works
correctly. The server processes will also write to the logfile the pending requests in the
request queue. It will also write the results of the performance measurement experiment
noted below. There should be none in the queue. After that the server process will terminate. - The server processes will close all the log files and terminate.
5 - The server with ID equal to 0 will send a response to the client indicating that it has executed
HALT.
The client process, after getting response for HALT, will write to its client logfile, close the log file
and terminate.
Logging requirements in your implementation:
Server and client processes will be writing all important events to their respective log files. Each
event will be recorded as new line appended to the log file. These will be helpful in tracing the
event logs for debugging and verification.
Each client process will be assigned a unique ID. You may choose any suitable scheme for these
unique ID (just an integer value or some string name). You may have to include the Client ID in
the requests which are sent by a client to the server group. A client process will write to its log file
the requests that it sends to the server, along with the ID of the server. All client threads in a
client process will write to the same single log file.
Each server process will also be maintaining a log file and it will write to its log file information
about the requests as follows. Each such record is written as one line in the log file, and it must
be in the order in which the events occur in the process.
Naming of Log files: You should make sure that the log file created by a server is appropriately
named so it is clearly associated with that server’s ID. Similarly, appropriately include the client ID
in a client’s log file name.
Server Logging of Events:
When a server receives a request from a client, the server process will log it as follows. The
Server-ID in the example log line below is the unique integer ID of the server.
Server-ID“CLIENT-REQ”Physical-clock-time Request-Timestamp Operation-name Parameters
For example:
Server-2 CLIENT-REQ 2021-03-04T11:50:41.011694 [678, 2] GetBalance
(To get physical clock time, you can use now() method of Java LocalDateTime class from java.time
package.)
The string“CLNT-REQ”is to indicate that the request came directly from a client. The request
timestamp of a request will be a pair of two integers: [Lamport clock, ID]. It will be assigned to the
request by the Server Process Replica that originally received the request. This will be used to
determine the order of the requests.
When a server replica receives a request multicast by another Server Process Replica, it will
write a log line:
Server-ID“SRV-REQ”Physical-clock-time Request-Timestamp Operation-name Parameters
The string“SRV-REQ”is to indicate that the request came indirectly from another peer server
process replica through a multicast.
When a server removes a request from the request queue for processing, it will write a log line:
Server-ID“REQ_PROCESSING”Physical-clock-time Request-Timestamp
6
Client Logging of Events:
The client process (thread) will write to the logfile an event record when it sends a request to a
server process. The structure of the log line will be as follows:
CLNT-ID SRV-ID“REQ”Physical-clock-time Operation-name Parameters
It will also write a log record to the file when a response is received.
CLNT-ID SRV-ID“RSP”Physical-clock-time Response status
Performance Measurement Experiment:
The goal of this experiment is to measure the overhead due to replication management using the
State Machine Model. At each server, measure the average value of the service processing time,
i.e. the time between receiving a request from a client to the time when a response is sent to the
client for that request.
For measuring elapsed time between two events, you should use System class’s getNano() method:
long t0 = System.nanoTime(); //event 1
…..
long t1 = System.nanoTime(); //event 2
Measure this value for three different configurations with the number of replicated server
processes set to 1, 3, and 5. For this experiment, use just one client process with 24 threads. All
server processes and the client process must be running on different hosts. Also compute the
average value of the response time observed by the clients.
Submission Requirements:
- Server code
- Client code
- Any other related code files
- Sample config file
- Results of your performance measurement data that you observed for the three system
sizes with the number of server processes set to 1, 3, and 5. In reporting the data for this
experiment, indicate the CSELab machines on which you tested your program. - README file with
a. Instructions on how to run and test your program.
b. Indicate what will be the names of the log files generated by running your
system.
c. Any known bugs in your code - A script file to compile your programs. You can also consider writing a script file to start
server processes on a specified list of nodes. Assume that all nodes have the same NFS
file system structure. Please include any script file you used to verify the correctness of
your program.
Submission Instructions:
• Each member in the group must submit the project work.
• Please include names and student-IDs of all group members.
• Submit one UNIX tar file containing parts listed under the requirements.
7
Grading Criteria: Grading will be based on the following allocation points:
a) Test Case 1: (Points 25): Correct operations of the State Machine Model at all
processes with a configuration of 5 server processes and one client process with 24
threads, when all are executed on a single host.
a. Correct order of request processing at each server process as verified using
server logs. (15%).
b. Identical final state of accounts database at all server processes. (10%)
b) Test Case 2: (Points 35): Correct operations of the State Machine Model at all
processes with a configuration of 5 server processes when server processes are
executed on a five different hosts and 3 client processes, each with 24 threads, are
executed on some other host.
a. Correct order of request processing at each server process as verified using
server logs. (25%)
b. Identical final state of accounts database at all server processes. (10%)
c) Performance experiment data. (10 %)
d) Correctly writing the log files at the server (10%)
e) Correctly writing the log files at the client (10%)
f) Documentation Items: (i) README file, (ii) code documentation, (iii) example config file,
(iv) detailed instructions on how to test your program, (iv) Indicate on which CSELab
machines you tested your program. (10%)
Clarifications on HALT request processing:
When we have multiple client processes, should Server-0 wait to receive HALT request
from all client processes before stopping the system or should it stop the system after
getting the first HALT command?
In out testcases it is better that the system is halted only after getting the HALT command from all
client processes.
This means Server-0 should know how many client processes are running in our testcase
execution. For that purpose, just add one additional command line argument indicating the
number of client processes in the test run.
java server server-ID configFile numClients
Server-0 will initiate system shutdown only when it has received HALT requests from all of the
clients. When that happens, there cannot be any pending request in of the server’s queue.
Therefore, Server-0 on receiving HALT requests from all clients can simply send a command to
all other server processes to execute HALT. There would not be any need to process the HALT
request using the State Machine protocol.
For example, when we are running the system with 3 clients, when the Server-0 receives the first
and the second HALT requests, it can either keep them pending or just send a reply with OK
response and let that client process terminate. Only upon receiving the third HALT request, it
would initiate server shutdown.