Program Assignment #2
Due date: Nov. 16, 2021
Problem 1: Matrix-Matrix Multiplication
This first hands-on lab introduces matrix-matrix multiplication, a well-known and
widely used example application in parallel programming. You will complete key
portions of a CUDA program that computes this widely applicable kernel.
In this lab you will learn:
‧ How to allocate and free memory on GPU.
‧ How to copy data from CPU to GPU.
‧ How to copy data from GPU to CPU.
‧ How to measure the execution times of memory access and of computation
separately.
‧ How to invoke GPU kernels.
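The steps listed above can be sketched in one host function. This is a minimal illustration, not the lab's reference solution: the names `matMulKernel` and `multiplyOnGpu`, the 16 x 16 block size, and the choice to report memory access time as the sum of both copy directions are all assumptions made for this example. Square, row-major matrices are assumed as well.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Naive kernel: each thread computes one element of P = M * N.
// Matrices are square (width x width) and stored row-major.
__global__ void matMulKernel(const float* M, const float* N, float* P, int width) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < width && col < width) {
        float sum = 0.0f;
        for (int k = 0; k < width; ++k)
            sum += M[row * width + k] * N[k * width + col];
        P[row * width + col] = sum;
    }
}

// Hypothetical host wrapper (name and signature are illustrative only).
// Returns 0 on success. memMs receives the host<->device copy time and
// kernelMs the kernel execution time, both in milliseconds.
int multiplyOnGpu(const float* hM, const float* hN, float* hP, int width,
                  float* memMs, float* kernelMs) {
    assert(width > 0);
    size_t bytes = (size_t)width * width * sizeof(float);
    float *dM = nullptr, *dN = nullptr, *dP = nullptr;

    // Allocate device memory for the inputs and the result.
    if (cudaMalloc(&dM, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&dN, bytes) != cudaSuccess) { cudaFree(dM); return 1; }
    if (cudaMalloc(&dP, bytes) != cudaSuccess) { cudaFree(dM); cudaFree(dN); return 1; }

    // Events mark the boundaries between the copy and compute phases.
    cudaEvent_t t0, t1, t2, t3;
    cudaEventCreate(&t0); cudaEventCreate(&t1);
    cudaEventCreate(&t2); cudaEventCreate(&t3);

    // Copy host data to the device.
    cudaEventRecord(t0);
    cudaMemcpy(dM, hM, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dN, hN, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(t1);

    // Set up execution parameters and launch the kernel.
    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (width + block.y - 1) / block.y);
    matMulKernel<<<grid, block>>>(dM, dN, dP, width);
    cudaError_t launchErr = cudaGetLastError();
    cudaEventRecord(t2);

    // Copy the result back to the host (this call also waits for the kernel).
    cudaError_t copyErr = cudaMemcpy(hP, dP, bytes, cudaMemcpyDeviceToHost);
    cudaEventRecord(t3);
    cudaEventSynchronize(t3);

    float h2dMs = 0.0f, d2hMs = 0.0f;
    cudaEventElapsedTime(&h2dMs, t0, t1);
    cudaEventElapsedTime(kernelMs, t1, t2);
    cudaEventElapsedTime(&d2hMs, t2, t3);
    *memMs = h2dMs + d2hMs;

    // Release events and device memory.
    cudaEventDestroy(t0); cudaEventDestroy(t1);
    cudaEventDestroy(t2); cudaEventDestroy(t3);
    cudaFree(dM); cudaFree(dN); cudaFree(dP);
    return (launchErr == cudaSuccess && copyErr == cudaSuccess) ? 0 : 1;
}
```

A caller would pass three host arrays of `width * width` floats and read the two timings back. Under the assumption used here, their sum corresponds to the total GPU processing time.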
Your output should look like this:
Input matrix file name:
Setup host side environment and launch kernel:
Allocate host memory for matrices M and N.
M:
N:
Allocate memory for the result on host side.
Initialize the input matrices.
Allocate device memory.
Copy host memory data to device.
Allocate device memory for results.
Setup kernel execution parameters.
# of threads in a block:
# of blocks in a grid :
Executing the kernel…
Copy result from device to host.
GPU memory access time:
GPU computation time :
GPU processing time :
Check results with those computed by CPU.
Computing reference solution.
CPU Processing time :
CPU checksum:
GPU checksum:
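The result check at the end of the sample output can be sketched with host-only code. The handout does not define the checksum, so this example assumes it is the plain sum of all result elements. The function names and the relative tolerance parameter are likewise hypothetical.

```cuda
#include <cassert>
#include <cmath>

// CPU reference: P = M * N for square, row-major matrices.
void matMulCpu(const float* M, const float* N, float* P, int width) {
    for (int row = 0; row < width; ++row) {
        for (int col = 0; col < width; ++col) {
            float sum = 0.0f;
            for (int k = 0; k < width; ++k)
                sum += M[row * width + k] * N[k * width + col];
            P[row * width + col] = sum;
        }
    }
}

// Assumed checksum definition: sum of all elements, accumulated in double.
double checksum(const float* A, int count) {
    double total = 0.0;
    for (int i = 0; i < count; ++i)
        total += A[i];
    return total;
}

// Compare the two checksums with a relative tolerance, since the CPU and GPU
// may round floating-point sums differently.
bool checksumsMatch(double cpu, double gpu, double relTol) {
    return std::fabs(cpu - gpu) <= relTol * std::fmax(1.0, std::fabs(cpu));
}
```

A relative tolerance is used rather than exact equality because the order of floating-point additions can differ between the CPU loop and the GPU threads.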
Record your runtime with respect to different input matrix sizes as follows:
Matrix Size | GPU Memory Access Time (ms) | GPU Computation Time (ms) | GPU Processing Time (ms) | Ratio of Computation Time as compared with matrix 128 x 128
8 x 8       |                             |                           |                          |
128 x 128   |                             |                           |                          | 1
512 x 512   |                             |                           |                          |
3072 x 3072 |                             |                           |                          |
4096 x 4096 |                             |                           |                          |
What do you see from these numbers?
Problem 2: Matrix-Matrix Multiplication with Tiling and Shared Memory
This lab enhances the matrix-matrix multiplication of Problem 1 by using shared
memory and synchronization between the threads in a block. Device shared memory is
allocated to store the sub-matrix (tile) data used in the calculation, so the threads in
a block share each loaded tile instead of re-reading it from global memory. This relieves
the memory bandwidth that was overtaxed in the previous matrix-matrix multiplication lab.
In this lab you will learn:
‧ How to apply tiling on matrix-matrix multiplication.
‧ How to use shared memory on the GPU.
‧ How to apply thread synchronization in a block.
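The three points above can be illustrated with a tiled kernel. This is a sketch under stated assumptions, not the lab's reference solution: the tile width of 16, the names `matMulTiled` and `multiplyTiledOnGpu`, and the zero-padding of partial tiles are choices made for this example. Square, row-major matrices are assumed.

```cuda
#include <cassert>
#include <cuda_runtime.h>

#define TILE_WIDTH 16  // assumed tile size; the block is TILE_WIDTH x TILE_WIDTH

// Tiled kernel: each block computes one TILE_WIDTH x TILE_WIDTH tile of P.
__global__ void matMulTiled(const float* M, const float* N, float* P, int width) {
    // Shared-memory tiles, visible to every thread in the block.
    __shared__ float Ms[TILE_WIDTH][TILE_WIDTH];
    __shared__ float Ns[TILE_WIDTH][TILE_WIDTH];

    int row = blockIdx.y * TILE_WIDTH + threadIdx.y;
    int col = blockIdx.x * TILE_WIDTH + threadIdx.x;
    float sum = 0.0f;

    int numTiles = (width + TILE_WIDTH - 1) / TILE_WIDTH;
    for (int t = 0; t < numTiles; ++t) {
        int mCol = t * TILE_WIDTH + threadIdx.x;
        int nRow = t * TILE_WIDTH + threadIdx.y;
        // Each thread loads one element of each tile; out-of-range slots get 0.
        Ms[threadIdx.y][threadIdx.x] =
            (row < width && mCol < width) ? M[row * width + mCol] : 0.0f;
        Ns[threadIdx.y][threadIdx.x] =
            (nRow < width && col < width) ? N[nRow * width + col] : 0.0f;
        __syncthreads();  // wait until both tiles are fully loaded

        for (int k = 0; k < TILE_WIDTH; ++k)
            sum += Ms[threadIdx.y][k] * Ns[k][threadIdx.x];
        __syncthreads();  // wait before the tiles are overwritten
    }
    if (row < width && col < width)
        P[row * width + col] = sum;
}

// Hypothetical host wrapper (illustrative name). Returns 0 on success.
int multiplyTiledOnGpu(const float* hM, const float* hN, float* hP, int width) {
    assert(width > 0);
    size_t bytes = (size_t)width * width * sizeof(float);
    float *dM = nullptr, *dN = nullptr, *dP = nullptr;
    if (cudaMalloc(&dM, bytes) != cudaSuccess) return 1;
    if (cudaMalloc(&dN, bytes) != cudaSuccess) { cudaFree(dM); return 1; }
    if (cudaMalloc(&dP, bytes) != cudaSuccess) { cudaFree(dM); cudaFree(dN); return 1; }

    cudaMemcpy(dM, hM, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dN, hN, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE_WIDTH, TILE_WIDTH);
    int blocksPerSide = (width + TILE_WIDTH - 1) / TILE_WIDTH;
    dim3 grid(blocksPerSide, blocksPerSide);
    matMulTiled<<<grid, block>>>(dM, dN, dP, width);
    cudaError_t launchErr = cudaGetLastError();

    cudaError_t copyErr = cudaMemcpy(hP, dP, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dM); cudaFree(dN); cudaFree(dP);
    return (launchErr == cudaSuccess && copyErr == cudaSuccess) ? 0 : 1;
}
```

With this layout each element of M and N is read from global memory once per tile by a single thread and then reused TILE_WIDTH times from shared memory. Both `__syncthreads()` calls sit outside any divergent branch, so every thread in the block reaches them.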
Your output should look like this:
Input matrix file name:
Setup host side environment and launch kernel:
Allocate host memory for matrices M and N.
M: