Machine Learning Programming Assignment 4 (Programming Exercise 4: Neural Networks Learning)



In this exercise, we will implement the backpropagation algorithm for neural
networks and apply it to the task of hand-written digit recognition.

1 Visualizing the data

As in the previous exercise, the first step is to display the data on a 2-dimensional plot. The dataset is the same one we used there, so we can skip this step.

2 Model representation of Neural Networks

The neural network has 25 units in the second layer and 10 output units (corresponding to the 10 digit classes).

The parameters $\Theta^{(1)} \in \mathbb{R}^{25 \times 401}$ and $\Theta^{(2)} \in \mathbb{R}^{10 \times 26}$ are provided in ex4weights.mat.
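To make these shapes concrete, here is a small sketch of the forward pass through a network with the same dimensions. It uses random stand-in values for X and the weights rather than the contents of ex4weights.mat, and it defines `sigmoid` inline (an assumption, so that the snippet runs on its own).

```octave
% Shape walk-through for a 400-25-10 network (random stand-in values,
% not the weights stored in ex4weights.mat).
sigmoid = @(z) 1 ./ (1 + exp(-z));

m = 5;                                     % hypothetical number of examples
X = rand(m, 400);                          % 400 input features per example
Theta1 = rand(25, 401) - 0.5;              % layer 1 -> layer 2, bias column included
Theta2 = rand(10, 26) - 0.5;               % layer 2 -> layer 3, bias column included

a1 = [ones(m, 1), X];                      % m x 401 (bias unit prepended)
a2 = [ones(m, 1), sigmoid(a1 * Theta1')];  % m x 26  (bias unit prepended)
a3 = sigmoid(a2 * Theta2');                % m x 10, one column per digit class

disp(size(a1)); disp(size(a2)); disp(size(a3));
```

The extra column in each weight matrix (401 instead of 400, 26 instead of 25) is the weight of the bias unit.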

3 Feedforward and cost function

First, complete the code in nnCostFunction.m to return the cost.

Recall that the cost function for the neural network (without regularization) is:
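Written out in the usual notation, the unregularized cost is the cross-entropy summed over all examples and output units. The symbols K (the number of output units, 10 here) and $(h_\Theta(x^{(i)}))_k$ (the activation of the k-th output unit for example i) are the conventional ones and are assumed here:

```latex
J(\Theta) = \frac{1}{m} \sum_{i=1}^{m} \sum_{k=1}^{K}
  \left[ -y_k^{(i)} \log\left( (h_\Theta(x^{(i)}))_k \right)
         - \left( 1 - y_k^{(i)} \right) \log\left( 1 - (h_\Theta(x^{(i)}))_k \right) \right]
```

Here $y_k^{(i)}$ is 1 if example i belongs to class k and 0 otherwise.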

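My code is:

X = [ones(m,1), X];          % add the bias column
for i = 1:m
  a1 = X(i,:);               % one example
  z2 = a1*Theta1';
  a2 = sigmoid(z2);
  z3 = [1,a2]*Theta2';       % prepend the bias unit to a2
  a3 = sigmoid(z3);
  yk = zeros(1,num_labels);  % one-hot encoding of the label
  yk(1,y(i,1)) = 1;
  J_l = log(a3)*(-yk)';
  J_r = log(1-a3)*(1-yk)';
  J = J+(J_l - J_r);
end
J = J/m;                     % average over all examples, after the loop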
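The per-example loop can also be replaced by matrix operations. The sketch below compares the two on a small synthetic problem. The sizes, the random data and the inline `sigmoid` are assumptions made so that the snippet runs by itself; they are not part of the exercise files.

```octave
% Vectorized cost, checked against the per-example loop on synthetic data.
sigmoid = @(z) 1 ./ (1 + exp(-z));

m = 6; n = 4; hidden = 3; num_labels = 3;   % hypothetical small sizes
X = rand(m, n);
y = 1 + mod((1:m)', num_labels);            % labels in 1..num_labels
Theta1 = rand(hidden, n + 1) - 0.5;
Theta2 = rand(num_labels, hidden + 1) - 0.5;

% Loop version, one example at a time
Xb = [ones(m, 1), X];
J_loop = 0;
for i = 1:m
  a3 = sigmoid([1, sigmoid(Xb(i, :) * Theta1')] * Theta2');
  yk = zeros(1, num_labels);
  yk(y(i)) = 1;
  J_loop = J_loop + (-log(a3) * yk' - log(1 - a3) * (1 - yk)');
end
J_loop = J_loop / m;

% Vectorized version: Y is the m x num_labels one-hot label matrix
I = eye(num_labels);
Y = I(y, :);
A2 = [ones(m, 1), sigmoid(Xb * Theta1')];
A3 = sigmoid(A2 * Theta2');
J_vec = sum(sum(-Y .* log(A3) - (1 - Y) .* log(1 - A3))) / m;

fprintf('loop: %.10f  vectorized: %.10f\n', J_loop, J_vec);
```

Both versions compute the same number. The vectorized form is usually faster on the full training set, while the loop is easier to follow step by step.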

4 Regularized cost function

The cost function for neural networks with regularization is:
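With regularization, a penalty on the squared weights is added to the unregularized cost. In the formula below, $J_{\text{unreg}}(\Theta)$ is an assumed name for the unregularized cost. The layer sizes 400, 25 and 10 are those of this network, and the sums skip the bias column of each matrix:

```latex
J(\Theta) = J_{\text{unreg}}(\Theta) + \frac{\lambda}{2m}
  \left[ \sum_{j=1}^{25} \sum_{k=1}^{400} \left( \Theta_{j,k}^{(1)} \right)^2
       + \sum_{j=1}^{10} \sum_{k=1}^{25} \left( \Theta_{j,k}^{(2)} \right)^2 \right]
```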

My code is:

temp1 = Theta1;
temp1(:,1) = 0;                      % do not regularize the bias column
sum_1 = sum(sum(temp1.^2));
temp2 = Theta2;
temp2(:,1) = 0;                      % do not regularize the bias column
sum_2 = sum(sum(temp2.^2));
reg = (sum_1+sum_2)*lambda/(2*m);
J = J+reg;


5 Backpropagation

Next, implement the backpropagation algorithm to compute the gradients of the cost function with respect to the parameters of the neural network.

Recall that the intuition behind the backpropagation algorithm is as follows. Given a training example $(x^{(t)}, y^{(t)})$, we first run a "forward pass" to compute all the activations throughout the network, including the output value of the hypothesis $h_\Theta(x)$. Then, for each node j in layer l, we would like to compute an "error term" $\delta_j^{(l)}$ that measures how much that node was "responsible" for any errors in our output.

You should implement steps 1 to 4 in a loop that processes one example at a time.


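step1: set the input layer's values $a^{(1)}$ to the t-th training example $x^{(t)}$ and run a forward pass, computing $z^{(2)}$, $a^{(2)}$, $z^{(3)}$ and $a^{(3)}$.

step2: for each output unit k in layer 3, set $\delta_k^{(3)} = a_k^{(3)} - y_k$, where $y_k$ is 1 if the example belongs to class k and 0 otherwise.

step3: for the hidden layer (l = 2), set $\delta^{(2)} = (\Theta^{(2)})^T \delta^{(3)} \odot g'(z^{(2)})$, where $\odot$ is the element-wise product.

When computing this step in the row-vector form used in the code, $\Theta^{(2)}$ is 10*26 and $\delta^{(3)}$ is 1*10, so their product is 1*26 while $z^{(2)}$ is only 1*25. To make the dimensions match, a bias entry 1 is prepended to $z^{(2)}$, and the first (bias) component of the result is dropped afterwards.

step4: accumulate the gradient from this example: set $\Delta^{(l)} = \Delta^{(l)} + \delta^{(l+1)} (a^{(l)})^T$.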
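Padding $z^{(2)}$ with a bias entry in step 3 has an equivalent form: drop the bias column of $\Theta^{(2)}$ before multiplying. The sketch below compares the two on random stand-in values. The inline `sigmoid` and `sigmoidGradient` and all the numbers are assumptions for illustration only.

```octave
% Two equivalent ways to compute delta2 for one example (row-vector convention).
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

Theta2 = rand(10, 26) - 0.5;     % stand-in weights
z2 = randn(1, 25);               % hypothetical hidden-layer pre-activations
delta3 = randn(1, 10);           % hypothetical output-layer error

% (a) pad z2 with a bias entry, then drop the bias component of the result
d = (delta3 * Theta2) .* sigmoidGradient([1, z2]);              % 1 x 26
delta2_a = d(2:end);                                            % 1 x 25

% (b) drop the bias column of Theta2 up front
delta2_b = (delta3 * Theta2(:, 2:end)) .* sigmoidGradient(z2);  % 1 x 25

disp(max(abs(delta2_a - delta2_b)));   % difference is at rounding level
```

Either form works. Variant (b) avoids computing a bias component that is thrown away immediately.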


6 Regularized Neural Networks

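After you have computed $\Delta_{ij}^{(l)}$ using backpropagation, add regularization to the gradient: divide the accumulated $\Delta^{(l)}$ by m and, for every column except the bias column (j = 0), add $(\lambda/m)\,\Theta_{ij}^{(l)}$.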
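As a formula, with j = 0 denoting the bias column and $D_{ij}^{(l)}$ an assumed name for the final gradient entry, the regularized gradient reads:

```latex
\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = D_{ij}^{(l)} =
\begin{cases}
  \dfrac{1}{m} \Delta_{ij}^{(l)} & \text{for } j = 0 \\[1ex]
  \dfrac{1}{m} \Delta_{ij}^{(l)} + \dfrac{\lambda}{m} \Theta_{ij}^{(l)} & \text{for } j \ge 1
\end{cases}
```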

The complete code, combining the cost, backpropagation and regularization, is:

X = [ones(m,1), X];                  % add the bias column
for i = 1:m
  % forward pass for one example
  a1 = X(i,:);
  z2 = a1*Theta1';
  a2 = sigmoid(z2);
  z3 = [1,a2]*Theta2';
  a3 = sigmoid(z3);
  % cost contribution of this example
  yk = zeros(1,num_labels);
  yk(1,y(i,1)) = 1;
  J_l = log(a3)*(-yk)';
  J_r = log(1-a3)*(1-yk)';
  J = J+(J_l - J_r);
  % backpropagation algorithm
  delta3 = a3 - yk;
  delta2 = (delta3 * Theta2).*sigmoidGradient([1,z2]);
  delta2 = delta2(2:end);            % drop the bias component
  Theta1_grad = Theta1_grad + delta2'*a1;
  Theta2_grad = Theta2_grad + delta3'*[1,a2];
end
J = J/m;
% regularized cost (bias columns excluded)
temp1 = Theta1;
temp1(:,1) = 0;
sum_1 = sum(sum(temp1.^2));
temp2 = Theta2;
temp2(:,1) = 0;
sum_2 = sum(sum(temp2.^2));
reg = (sum_1+sum_2)*lambda/(2*m);
J = J+reg;
% regularized gradients (bias columns excluded)
Theta1_grad = Theta1_grad./m;
Theta1_grad(:, 2:end) = Theta1_grad(:,2:end) + (lambda/m)*Theta1(:,2:end);
Theta2_grad = Theta2_grad./m;
Theta2_grad(:, 2:end) = Theta2_grad(:,2:end) + (lambda/m)*Theta2(:,2:end);

7 Sigmoid gradient

The gradient for the sigmoid function can be computed as:

g = sigmoid(z).*(1-sigmoid(z));
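A quick sanity check is to compare this expression, $g'(z) = g(z)(1 - g(z))$, against a central-difference estimate of the derivative. The sketch below defines `sigmoid` and `sigmoidGradient` inline (an assumption, so that it runs without the exercise files). The test points and step size are arbitrary choices.

```octave
% Compare the analytic sigmoid gradient with a finite-difference estimate.
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

z = [-5, -1, 0, 1, 5];
g = sigmoidGradient(z);

% Central-difference estimate of the derivative at the same points
h = 1e-5;
g_num = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h);

disp(g);
disp(max(abs(g - g_num)));   % should be tiny
```

At z = 0 the sigmoid equals 0.5, so the gradient there is 0.5 * 0.5 = 0.25. For large positive or negative z the gradient approaches 0.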

8 Random initialization

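For a neural network, if all parameters are initialized to 0, every activation unit in the second layer computes the same value. The same happens if all parameters are initialized to the same nonzero value. To break this symmetry, the parameters are usually initialized to random values in $[-\varepsilon, \varepsilon]$.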
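A minimal sketch of such an initialization follows. The value `epsilon_init = 0.12` and the variable names are assumptions chosen for illustration. Any small positive value is used in the same way.

```octave
% Symmetry-breaking initialization (sketch).
L_in = 400;  L_out = 25;       % sizes of the layers joined by this weight matrix
epsilon_init = 0.12;           % assumed value, not prescribed by the text above

% rand returns values in (0, 1); rescale them to (-epsilon_init, epsilon_init)
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

disp(size(W));                      % L_out x (L_in + 1), bias column included
disp([min(W(:)), max(W(:))]);       % both within the chosen range
```

Because every entry is drawn independently, the hidden units start from different weights and therefore receive different gradient updates.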

9 Gradient checking


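The gradient is estimated numerically. Treating the cost as a function $J(\Theta)$ of $\Theta$, we pick two points very close to $\Theta$, one on each side ($\Theta - \epsilon$ and $\Theta + \epsilon$), and use the slope of the line through them, $(J(\Theta + \epsilon) - J(\Theta - \epsilon)) / (2\epsilon)$, as an estimate of the slope of the tangent, that is, of the gradient.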




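% small test network and data set
input_layer_size = 3;
hidden_layer_size = 5;
num_labels = 3;
m = 5;
lambda = 0;   % assumed here: no regularization during the check

Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size);
Theta2 = debugInitializeWeights(num_labels, hidden_layer_size);

X  = debugInitializeWeights(m, input_layer_size - 1);
y  = 1 + mod(1:m, num_labels)';

% unroll the parameters into a single vector
nn_params = [Theta1(:) ; Theta2(:)];

% shorthand for the cost function, so that it depends on the parameters only
costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);

[cost, grad] = costFunc(nn_params);

numgrad = computeNumericalGradient(costFunc, nn_params);

% relative difference; it should be very small if the gradients are correct
diff = norm(numgrad-grad)/norm(numgrad+grad);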

function numgrad = computeNumericalGradient(J, theta)
    numgrad = zeros(size(theta));
    perturb = zeros(size(theta));
    e = 1e-4;
    for p = 1:numel(theta)
        % Set perturbation vector: only the p-th parameter is changed
        perturb(p) = e;
        loss1 = J(theta - perturb);
        loss2 = J(theta + perturb);
        % Compute Numerical Gradient (central difference)
        numgrad(p) = (loss2 - loss1) / (2*e);
        % Reset the perturbation before moving to the next parameter
        perturb(p) = 0;
    end
end
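The same check can be tried on a function whose gradient is known in closed form, which makes it easy to see what a "small" relative difference looks like. The sketch below uses $J(\theta) = \sum_i \theta_i^2$, with exact gradient $2\theta$. The function, the test point and the variable names are assumptions for illustration only.

```octave
% Central-difference gradient check on a function with a known gradient:
% J(theta) = sum(theta.^2), whose exact gradient is 2*theta.
J = @(theta) sum(theta .^ 2);
theta = [0.5; -1.2; 3.0];
grad = 2 * theta;                  % analytic gradient

e = 1e-4;
numgrad = zeros(size(theta));
for p = 1:numel(theta)
  perturb = zeros(size(theta));
  perturb(p) = e;                  % perturb one coordinate at a time
  numgrad(p) = (J(theta + perturb) - J(theta - perturb)) / (2 * e);
end

rel_diff = norm(numgrad - grad) / norm(numgrad + grad);
fprintf('relative difference: %g\n', rel_diff);
```

A bug in the analytic gradient (for example, a missing factor of 2) would push the relative difference up by many orders of magnitude, which is what makes this check useful before training.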

10 Training NN




