共计 10927 个字符,预计需要花费 28 分钟才能阅读完成。
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
HOMEWORK ASSIGNMENT 3:
STOCHASTIC OPTIMISATION FOR BINARY SVM
Guidelines
Assessment The assignment will contribute 20% to the final mark for the course. For each task, marks
will be given for correctly using Matlab to carry out the task and for your interpretation and comments
about the results. Each task will have the same weight in the assignment. Additional 2 marks will be
given for respecting the guidelines.
General style The report should be a pdf file. There should be a header page that includes your name
and matriculation number.
Length Your assignment should be NOT more than four sides of A4 including title, graphs, tables,
references, but not including the Appendix. Use a standard font that can be read when printing the report
(be also careful with the font of figure’labels and captions).
Content The main part of your assignment should give the results for each task that you carry out and
your comments on and conclusions from these results. Figures and tables must be included in the main
report.
Appendix The report should not contain Matlab codes, but they must be included in an Appendix. The
codes can be provided as pdf and/or snapshots, and must be properly commented
Submission The assignment should be submitted through Vision. It should not be submitted by email.
Collaboration Students are encouraged to discuss the methods used with other students, but your
submitted assignment must be all your own work. In particular, copying a section of your assignment,
either plots or commentary, from another student is a serious disciplinary offence. It is also an offence to
allow another student to copy your work, so your assignment files should not be made available to other
students.
Deadline The deadline for submission is 26 March 2021; assignments may be submitted early.
Late submissions Any assignment that is submitted after the submission deadline but within 5 working
days of the deadline will be penalised by a 30% reduction in the mark. Assignments that are submitted
more than 5 working days past the deadline will not be marked.
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 1/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
Figure 1: Sample of the MNIST dataset (http://yann.lecun.com/exdb/mn…).
Figure 2: Sample of the MNIST dataset showing handwritten digits 0 and 1.
Outline
• Stochastic proximal algorithms
• Machine Learning
• Binary classifier
1 Project description
This project aims to illustrate stochastic proximal methods in the context of machine learning. You will
need to implement a stochastic gradient forward-backward (FB) algorithm with Matlab, to train a binary
support vector machine (SVM) classifier. To help with this task, you will also implement the standard
FB algorithm.
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 2/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
Binary SVM
The objective in binary regression is to be able to correctly predict a binary label (e.g. yes vs. no, or 0
vs. 1, etc.). Binary classification aims to pair an input x ∈ R
N to a binary output y ∈ {−1, +1}, via a
model of the form
y = dw(x) = sign
x
w
, (1)
where > denotes the transpose operation, and w ∈ R
N are weights that need to be learned. Precisely,
the supervised learning task consists in finding w such that for a given input, the output will be correctly
predicted by the classifier. The classifier is trained on a training set of pairs
S =(x`
, y)16
6L |(∀∈ {1, . . . , L}) x
∈ R
N , y` ∈ {−1, +1}
. (2)
A sample of the first 21 images from the MNIST test set are shown in Figure 1. The full MNIST
dataset can be found at http://yann.lecun.com/exdb/mn…, it contains 60000 training images of size
28 × 28, and 10000 testing images of size 28 × 28, all representing handwritten digits between 0 and 9.
A binary classifier will aim to provide a binary answer to a question (e.g.“yes”vs.“no”, or 0 vs. 1).
For this data set, the classifier could be trained, e.g., to recognise the digit 0 among the other digits. A
simplified version would be to only look at 0 and 1 digits, and classify the digits depending if they are 0
or 1. A sub-sample of the MNIST test set is given in Figure 2, only showing digits 0 and 1.
Minimisation problem
A classical approach for learning the vector of weights w is to define it as a solution to
minimise
w∈RN
g(w) + h(w), (3)
where g ∈ Γ0(R
N ) is a regularisation function, and h ∈ Γ0(R
N ) is the loss function. In the context of
sparse learning, g can be chosen to be an `1 norm:
(∀w ∈ R
N ) g(w) = λkwk1, (4)
where λ > 0. Regarding h, a classical choice is the Huber loss. Let
(∀w ∈ R
N ) h(w) = 1
L
X
L
`=1
φ(y` − x
` w), (5)
with φ ∈ Γ0(R) is the Huber function defined as
(∀u ∈ R) φ(u) = (
1
2
u
2
if |u| 6 δ,
δ
|u| − 1
2
δ
otherwise.
(6)
Matlab toolbox
The file Main.m contains the main commands to implement the binary classifier.
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 3/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
Datasets. In this project, you will handle two datasets, both obtained from the MNIST dataset.
• dataset subMNIST.mat: Dataset that only contains digits 0 and 1.
• dataset MNIST.mat: Full MNIST dataset.
The variables X test and X train are tensors of dimension Nx × Ny × I and Nx × Ny × L, respectively,
where Nx = Ny = 28 and I (resp. L) is the total number of images contained in the testing set (resp.
training set). For instance, the i-th image in X test is given by X test(:,:,i). The variables Y test and
Y train are vectors of dimension L containing labels in {−1, +1}. The datasets have been preprocessed
such that the labels correspond to +1 if the associated the digit is 0, and the labels are −1 if the associated
digit is not 0. For instance, for the dataset dataset subMNIST.mat, the first digit in X test (i.e.
X test(:,:,1)) is 1, so the the first coefficient in Y test (i.e. Y test(1)) is −1. The second digit X test(:,:,2)
is 0, so the the second coefficient Y test(2) is +1.
Binary classifier model. You will aim to train a classifier to determine whether a hand-written digit is
a 0 or not. You will use a linear model of the form (1). This model in Matlab is implemented as follows:
where x is the image to be tested to check whether it is the digit 0 or not, and w is the variable that
needs to be trained.
In order to test the full training set with the learned variable w, the Matlab function d test will be
used, defined as follows:
To determine the percentage of errors done by the classifier, one can compute
E(w) = 100 ×
1
2
PI
i=1 |dw(xi) − yi
|
I
, (7)
where (xi)16i6I are the testing images, and (yi)16i6I are the associated labels (= +1 if the associated
digit is a 0, and = −1 if it is not a 0).
Additional information. Further details are provided below.
• op norm.m: val = op norm(A, At, im size)
– Compute the squared spectral norm of operator A
– A and At: operator and its adjoint. Must be defined as functions.
– im size: size of the input, e.g. if A applies to a variable of size [Nx,Ny], then im size=[Nx,Ny].
• huber.m: h = huber(x, delta)
– Compute the Huber function φ defined in (6).
– x can be either a scalar or a vector.
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 4/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
– delta: scalar value corresponding to the parameter δ in (6).
– If x is a vector of size L, then the output of the function is a vector of size L as well.
• huber grad: g=huber grad(x,delta)
– Compute the gradient of the Huber function φ defined in (6).
– x can be either a scalar or a vector.
– delta: scalar value corresponding to the parameter δ in (6).
– If x is a vector of size L, then the output of the function is a vector of size L as well.
– This function must be updated.
• TO BE COMPLETED: The parts of the code to be completed (or changed) are highlighted with
comments, e.g.:
2 Tasks
Before designing a stochastic gradient forward-backward to train a classifier, you will first build
a classical forward-backward algorithm to solve (3) (without stochastic gradient). The questions
below will drive you for this task.
(a) [1 Mark]
Let X = (x)16
6L ∈ R
N×L be the matrix containing in the-th column the input x
. Let
Y = (y)16
6L ∈ R
L.
Show that
(∀w ∈ R
N ) h(w) = 1
L
ϕ(Y − Xw) (8)
where ϕ: R
L → R is defined as
(∀u = (u)16
6L ∈ R
L
) ϕ(u) = X
L
`=1
φ(u`), (9)
with φ defined in (6).
(b) [2 Mark]
What is the gradient of function h? What is the Lipschitz constant β of its gradient?
(c) [1 Mark]
What is the proximity operator of function g?
(d) [1 Mark]
Give the FB iterations to solve (3).
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 5/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)- [2 Marks]
You will now implement the FB algorithm with MATLAB.
• In file main.m, compute compute the Lipschitz constant of h:
• In file FB.m update the first lines corresponding to the different parameter, functions and
operators involved to run the FB algorithm to solve (3):
– gamma: step-size for FB
– g and h: compute functions g and h given in equations (4) and (5), respectively.
– grad =@(w): gradient of the smooth function, computed at w. To compute this gradient,
you also need to update the file huber grad.m, with function g=huber grad(x,delta).
– prox =@(w, T): proximity operator of the proximable function, computed at w, with
parameter T.
• In file FB.m update the FB steps: - [2 Marks]
Run the FB algorithm to solve problem (3), with both the two MNIST datasets. For each dataset,
comment on the results, and give a figure showing the percentage of error of the classifier obtained
during the training process, as a function of time. - For this task, you will aim to build a stochastic gradient forward-backward (SGFB) algorithm to
solve (3).
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 6/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
(a) [1 Mark]
For every k ∈ N, let Lk ⊂ {1, . . . , D}, and Lk = ] Lk. Let XLk = (x)
∈Lk ∈ R
Lk×N and
YLk = (y)
∈Lk ∈ R
Lk .
In the context of SGFB, give the approximated gradient of h, only involving the functions
and operators associated with indices Lk.
(b) [1 Mark]
Give the SGFB iterations to solve problem (3), assuming that, at each iteration k ∈ N, a
subset Dk ⊂ {1, . . . , D} of the D functions is randomly selected.
(c) [1 Mark]
State clearly the convergence conditions. - [2 Marks]
You will now implement the SGFB algorithm with MATLAB.
• In file FB sto.m, update the first lines corresponding to the different parameters, functions,
and operators involved to run the SGFB algorithm to solve (3):
They are all the same as for FB, apart from the computation of the approximated gradient,
that should only involve the functions/operators associated with the selected indices Ind.
• In file FB sto.m update the FB steps:
Additional information:
• In file Main.m, you will need to choose the parameters p and theta:
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 7/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
– p: parameter with values in ]0, 1], corresponding to the proportion of indices d ∈
{1, . . . , D} that are randomly selected at each iteration.
– theta: parameter to control convergence speed that must appear in the choice of the
step-size.
Remark: In this task, the main difficulty is to build the approximated gradient, hence finding a
suitable Matlab syntax is part of the question. - [2 Marks]
Run the SGFB algorithm for the MNIST dataset only containing 0 and 1 digits, considering the
following parameters: p ∈ {1‰, 1%, 10%, 50%} and θ ∈ {0.6, 0.8, 1}.
Give tables (see below) with (i) the percentage error when applying the learned classifier to the test
sets, (ii) the total time necessary to train the classifier, and (iii) the final objective value of h + g.
You may want to run the algorithm multiple times to take an average of both percentage error and
convergence time.
Remark: If you prefer, you can provide figures.
Percentage of error
θ \ p 1‰ 1% 10% 50%
0.6
0.8
1
Convergence time
θ \ p 1‰ 1% 10% 50%
0.6
0.8
1
Final objective value
θ \ p 1‰ 1% 10% 50%
0.6
0.8
1 - [2 Marks]
Comment on the Matlab results obtained for the previous questions.
Based on your comments, choose the best values for p and θ and train the classifier with the SGFB
algorithm on the full MNIST dataset. Comment your results.
Remark: The problem being convex, the minimum objective value should be the same for all
methods. It may happen that you observe that it is not the case. This might be due to the considered
stopping criterion.
Keeping this in mind, you can use the results obtained from the classical FB algorithm as ground
truth results.
Do not change the stopping criterion, only acknowledge this fact if you observe it in your simulations.
[Additional 2 Marks] If guidelines are respected.
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 8/9
Dr Audrey Repetti
HOMEWORK ASSIGNMENT 3
(Heriot-Watt MSc Course B31XN)
Matlab help
To obtain average values in Task 6, one can use loops in Matlab to automatically change parameters and
run multiple times an algorithm:
HOMEWORK ASSIGNMENT 3: STOCHASTIC OPTIMISATION FOR BINARY SVM 9/9