CMSC 421 Assignment One
Neural Networks and Optimization
September 12, 2023

General Instructions. Please submit TWO (2) files to ELMS:
(1) a PDF file that is the report of your experimental results and answers to the questions.
(2) a codebase submission in the form of a zip file including only the code folders/files you modified and the Questions folder. Please do not submit the Data folder we provided. The code should contain your implementations of the experiments and the code for producing visualizations of the results.

The project is due at 11:59 pm on September 26 (Monday), 2023.

Please read through this document before starting your implementation and experiments. Your score will depend mostly on the completion of the experiments, the effectiveness of the reported results, visualizations, the consistency between the experimental results and the analysis, and the clarity of the report. Neatness and clarity count! Good visualization helps!

Because you will need PyTorch for the second half of the programming assignment (Convolutional Neural Networks – 15 Points), we have included links to some tutorials and documentation to help you get started with PyTorch:
• Official PyTorch Documentation
• Quickstart Guide
• Tensors
• Data Loading
• Building Models in PyTorch

Implementation Details
For each problem, you’ll need to code both the training and application phases of the neural network. During training, you’ll adjust the network’s weights and biases using gradient descent. Use a single parameter, η, to control the step size during gradient descent. The updated weights and biases are the old values minus the gradient multiplied by the step size.

We will provide code snippets and datasets for some parts of the assignment. You are required to read the comments in each code file and fill in the missing pieces so that the files execute correctly. Please make sure that you read through all the code files we provide. They will be available in the CMSC421 – Fall2023 GitHub repository.

Part 1: Programming Task – (50 Points)
1. Objective
The goal of this assignment is to build a neural network from scratch, focusing on implementing the backpropagation algorithm. You’ll apply your neural network to simple, synthetic datasets to gain hands-on experience in tuning network parameters.

Language and Libraries
Python is mandatory for this assignment. Use numpy for all linear algebra operations. Do not use machine learning libraries like PyTorch or TensorFlow for Questions 1, & 3; only numpy, matplotlib, and Python built-in libraries are permitted.
1 Simple Linear Regression Model – (10 Points)
1.1 Network Architecture
The network consists of an input layer, a hidden layer with one unit, a bias layer, and an output layer with one unit.
• The output is a linear combination of the input, represented as $a_1 = X w_0 + a_0 + b_1$.
1.2 Loss Function
Use a regression loss for training, defined as
1.3 Implementation
Using the template_for_solitions file, write code to train this network and apply it to both 1D data (as q1_a) and higher-dimensional data (as q1_b).
• Data Preparation: Use the q1_<a/b> function from the Data.generator module to generate training and testing data. The module provides both a and b, so use the appropriate function call to fetch the right data for each experiment.
• Network Setup: Use the net_setup method in the Trainer class to initialize the network, loss layer, and optimizer.
• Training: Use the train method in the Trainer class to train the network. Plot the training loss over iterations.
• Testing: Use the test data to evaluate the model’s performance. Plot the actual vs. predicted values and compute evaluation metrics.

Tests and Experiments
1.4 Hyperparameters
• The main hyperparameters are the step size (η) and the number of gradient descent iterations.
• You may also have implicit hyperparameters like weight and bias initialization.

Hyperparameter Tuning
Discuss the difficulty level in finding an appropriate set of hyperparameters.
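
The training procedure for this problem (gradient descent with step size η over a fixed number of iterations) can be sketched in numpy as follows. This is a minimal illustration, not the assignment's template. The function name train_linear_regression, the mean squared error loss, the initialization scale, and the synthetic data are all assumptions made for the example.

```python
import numpy as np

def train_linear_regression(X, y, eta=0.1, n_iters=500, seed=0):
    """Fit y ~ X @ w + b by full-batch gradient descent.

    Mean squared error is an assumption here; substitute the loss
    specified for the assignment.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = rng.normal(scale=0.1, size=d)   # implicit hyperparameter: initialization
    b = 0.0
    losses = []
    for _ in range(n_iters):
        residual = X @ w + b - y            # shape (n,)
        losses.append(np.mean(residual ** 2))
        grad_w = 2.0 * X.T @ residual / n   # dL/dw
        grad_b = 2.0 * residual.mean()      # dL/db
        w = w - eta * grad_w                # new value = old value - eta * gradient
        b = b - eta * grad_b
    return w, b, losses

# Hypothetical synthetic data: y = 3x - 1 plus a little noise
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
y = 3.0 * X[:, 0] - 1.0 + 0.01 * rng.normal(size=200)
w, b, losses = train_linear_regression(X, y)
```

Re-running with different values of eta and n_iters shows how the two hyperparameters interact: a step that is too large can make the loss diverge, while a step that is too small needs many more iterations to reach the same loss.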
2 A Shallow Network
The goal of this assignment is to implement a fully connected neural network with a single hidden layer and a ReLU (Rectified Linear Unit) activation function. The network should be flexible enough to accommodate any number of units in the hidden layer and any size of input, while having just one output unit.
2.1 Network Architecture
The network consists of an input layer, a hidden layer with one unit, a bias layer, and an output layer with one unit.
2.2 Loss Function
Continue to use a regression loss for training the network, defined as
2.3 Implementation
Using the template_for_solitions file, write code to train this network and apply it to both 1D data (as q2_a.py) and higher-dimensional data (as q2_b.py).
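
A single-hidden-layer ReLU network of the kind described in this problem can be sketched in numpy as below. This is only an illustration under stated assumptions. The function names (init_params, forward, backward, train), the mean squared error loss, and the initialization scales are hypothetical choices, not part of the provided template.

```python
import numpy as np

def init_params(n_in, n_hidden, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(scale=np.sqrt(1.0 / n_hidden), size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    z1 = X @ params["W1"] + params["b1"]              # (n, n_hidden)
    h1 = np.maximum(z1, 0.0)                          # ReLU
    y_hat = (h1 @ params["W2"] + params["b2"])[:, 0]  # (n,), linear output unit
    return y_hat, (X, z1, h1)

def backward(params, cache, y_hat, y):
    """Gradients of the mean squared error (an assumed loss) w.r.t. all parameters."""
    X, z1, h1 = cache
    n = X.shape[0]
    d_out = (2.0 * (y_hat - y) / n)[:, None]   # dL/dy_hat, shape (n, 1)
    d_h1 = d_out @ params["W2"].T              # (n, n_hidden)
    d_z1 = d_h1 * (z1 > 0)                     # ReLU passes gradient only where z1 > 0
    return {
        "W1": X.T @ d_z1,
        "b1": d_z1.sum(axis=0),
        "W2": h1.T @ d_out,
        "b2": d_out.sum(axis=0),
    }

def train(X, y, n_hidden=16, eta=0.02, n_iters=1000):
    params = init_params(X.shape[1], n_hidden)
    losses = []
    for _ in range(n_iters):
        y_hat, cache = forward(params, X)
        losses.append(np.mean((y_hat - y) ** 2))
        grads = backward(params, cache, y_hat, y)
        for name in params:
            params[name] -= eta * grads[name]  # gradient descent step
    return params, losses

# Hypothetical 1D data with a target that a ReLU network can represent
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = np.abs(X[:, 0])
params, losses = train(X, y)
```

Comparing the analytic gradients from backward against finite differences of the loss is a cheap way to catch backpropagation bugs before tuning any hyperparameters.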
3 General Deep Learning
The goal of this section of the assignment is to extend your neural network to handle fully-connected networks of arbitrary depth. It will be just like the network in Problem 2, but with more layers. Each layer will use a ReLU activation function, except for the final layer.

Tests and Experiments
• Test your network with the same training data that you used in Problem 2 A Shallow Network - (10 Points), using both 1D and higher dimensional data. Experiment with using 3 and 5 hidden layers. Evaluate the accuracy of your solutions in the same way as Problem 2 A Shallow Network - (10 Points).
• Conduct and report on experiments to determine whether the depth of a network has any significant effect on how quickly your network can converge to a good solution. Include at least one plot to justify your conclusions. Again, ensure your files are saved as q3_a.py and q3_b.py.

EXTRA CREDIT (EC): Cross Entropy Loss (10 Points)
Modify your network from General Deep Learning - (15 Points) to perform classification tasks using a cross-entropy loss and a logistic activation function in the output layer. If you are submitting the EC, save the code files as qec_a.py and qec_b.py.
3.1 Network Architecture
Note on Numerical Stability
Be cautious when exponentiating numbers in the sigmoid function to avoid overflow. Utilize np.maximum and np.minimum for a concise implementation.

Tests and Experiments
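
The note on numerical stability can be illustrated with a short numpy sketch. Clamping the argument with np.maximum and np.minimum keeps np.exp finite. The function names, the clamp bound of 500, and the eps used to guard the logarithm are assumptions chosen for illustration.

```python
import numpy as np

def stable_sigmoid(z, clip=500.0):
    """Logistic function with its argument clamped to [-clip, clip].

    np.exp overflows float64 for arguments above roughly 709, so clamping
    keeps the exponential finite. The bound of 500 is an arbitrary assumption.
    """
    z = np.minimum(np.maximum(z, -clip), clip)
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(p, y, eps=1e-12):
    """Mean cross-entropy between probabilities p and labels y in {0, 1}."""
    p = np.minimum(np.maximum(p, eps), 1.0 - eps)  # keep log() away from 0
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

z = np.array([-1000.0, 0.0, 1000.0])
p = stable_sigmoid(z)                       # finite everywhere, no overflow
loss = binary_cross_entropy(p, np.array([0.0, 1.0, 1.0]))
```

Without the clamp, np.exp(1000.0) overflows to inf and emits a runtime warning; with it, the outputs simply saturate very close to 0 and 1.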
4 Convolutional Neural Networks
In this section, you are required to implement a Convolutional Neural Network (CNN) using PyTorch to classify images from the provided CINIC-10 dataset.

Requirements
Your CNN model should meet the following criteria:
(A) Utilize dropout for regularization. Mathematically, dropout sets a fraction p of the input units to 0 at each update during training time, which helps to prevent overfitting.
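(B) Be trained using the RMSprop and ADAM optimizers separately. The update rule for RMSprop is $\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{v_t} + \epsilon}\, g_t$, where θ are the parameters, η is the learning rate, $v_t$ is the moving average of the squared gradients, $g_t$ is the current gradient, and ε is a small constant for numerical stability.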
(C) Include at least 3 convolutional layers and 2 fully connected layers. The convolution operation can be represented as:
(D) Use wandb for visualization of the training loss L, which could be the cross-entropy loss for classification:
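
To make requirement (B) more concrete, here is a minimal numpy sketch of one common form of the RMSprop update. The function name rmsprop_step, the decay rate beta, the constant eps, and the toy objective are assumptions for illustration. In the actual assignment you would use PyTorch's built-in optimizers (torch.optim.RMSprop and torch.optim.Adam) rather than hand-written updates.

```python
import numpy as np

def rmsprop_step(theta, grad, v, eta=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update; returns the new parameters and the new moving average."""
    v = beta * v + (1.0 - beta) * grad ** 2          # moving average of squared gradients
    theta = theta - eta * grad / (np.sqrt(v) + eps)  # per-coordinate scaled step
    return theta, v

# Single step from theta = 1 with gradient 2 and an empty moving average
theta_one, v_one = rmsprop_step(np.array([1.0]), np.array([2.0]), np.zeros(1))

# Toy problem (an assumption): minimize f(theta) = sum(theta ** 2)
theta = np.array([5.0, -3.0])
v = np.zeros_like(theta)
for _ in range(2000):
    grad = 2.0 * theta
    theta, v = rmsprop_step(theta, grad, v)
```

Because each coordinate's step is divided by the root of its own running average, both coordinates move at a similar pace even though their gradients differ in magnitude.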
5. Convolutional to Multi-layer Perceptron
A convolution operation is a linear operation, and therefore convolutional layers can be represented in the form of matrix multiplication, or in other words, represented by a multi-layer perceptron. More precisely, if we denote the convolution operation as $c(x, \theta_w, \theta_b, \gamma)$, where $\theta_w$ are the filter weights, $\theta_b$ are the filter biases, and $\gamma$ are the padding and stride parameters, we want to convert the filters to a weight matrix so that

$$\mathrm{flatten}(c(x, \theta_w, \theta_b, \gamma)) = W\,\mathrm{flatten}(x) + b, \tag{1}$$

where flatten(·) takes in a tensor of size $(d_1, d_2, d_3)$ and outputs a 1-D vector of size $(d_1 \times d_2 \times d_3)$. For example,

$$\mathrm{flatten}(\mathrm{Filter}_1) = (i_{1,1}, i_{1,2}, i_{1,3}, i_{2,1}, i_{2,2}, i_{2,3}, i_{3,1}, i_{3,2}, i_{3,3}, j_{1,1}, j_{1,2}, j_{1,3}, j_{2,1}, j_{2,2}, j_{2,3}, j_{3,1}, j_{3,2}, j_{3,3}).$$

The converted weights and biases W and b depend on the convolution filters $\theta_w$, $\theta_b$ and also $\gamma$ (paddings
a dot product of a weight vector and the flattened input image, where the non-zero entries of the weight vector should have exactly the same values as the filter, and their positions depend on the sliding window. When we get the weight vector for each sliding window, we can simply stack them together to get the converted weight matrix W. The bias part is simple: for one filter, we are adding the same bias to every sliding window output. Write out the weight matrix W and bias b in terms of the filter weights and biases. Convince yourself that you get exactly the same output (flattened) as the original convolution.
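
The equivalence can be checked numerically on a deliberately simplified case. The sketch below assumes a single input channel, a single filter, stride 1, and no padding, and it uses the deep-learning convention in which the filter is not flipped (cross-correlation). The function names are hypothetical. Multiple channels, padding, and other strides are left out.

```python
import numpy as np

def conv2d_valid(x, k, bias):
    """Single-channel 2-D cross-correlation with stride 1 and no padding."""
    height, width = x.shape
    kh, kw = k.shape
    out = np.empty((height - kh + 1, width - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            out[r, c] = np.sum(x[r:r + kh, c:c + kw] * k) + bias
    return out

def conv_to_matrix(k, bias, in_shape):
    """Build (W_mat, b_vec) so that flatten(conv(x)) == W_mat @ flatten(x) + b_vec."""
    height, width = in_shape
    kh, kw = k.shape
    out_h, out_w = height - kh + 1, width - kw + 1
    W_mat = np.zeros((out_h * out_w, height * width))
    for r in range(out_h):
        for c in range(out_w):
            window = np.zeros((height, width))
            window[r:r + kh, c:c + kw] = k          # filter values at this window's position
            W_mat[r * out_w + c] = window.ravel()   # one row per sliding window
    b_vec = np.full(out_h * out_w, bias)            # same bias for every window
    return W_mat, b_vec

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4))
k = rng.normal(size=(3, 3))
W_mat, b_vec = conv_to_matrix(k, 0.5, x.shape)
same = np.allclose(conv2d_valid(x, k, 0.5).ravel(), W_mat @ x.ravel() + b_vec)
```

Each row of W_mat is mostly zeros, with the nine filter values placed where that sliding window overlaps the flattened image, which is why the matrix form is far larger than the filter it encodes.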