Machine Learning Programming Exercise 4: Neural Networks Learning

 

 

In this exercise, we will implement the backpropagation algorithm for neural
networks and apply it to the task of hand-written digit recognition.

1 Visualizing the data

As before, the first step is to display the data on a 2-dimensional plot. The dataset is the same one used in the previous exercise, so we can skip this step.

2 Model representation of Neural Networks

The neural network has 400 input units (the 20x20 pixel images), 25 units in the second (hidden) layer, and 10 output units (corresponding to the 10 digit classes).

The weight matrices Θ_1 ∈ R^{25×401} and Θ_2 ∈ R^{10×26} have been provided in ex4weights.mat.
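A short sketch of loading the provided weights and unrolling them into a single parameter vector (the unrolling convention is the one expected by nnCostFunction):

load('ex4weights.mat');               % loads Theta1 (25x401) and Theta2 (10x26)
nn_params = [Theta1(:) ; Theta2(:)];  % unroll both matrices into one column vector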

3 Feedforward and cost function

First, complete the code in nnCostFunction.m to return the cost.

Recall that the cost function for the neural network (without regularization) is:
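J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\bigl((h_\Theta(x^{(i)}))_k\bigr) - (1-y_k^{(i)})\log\bigl(1-(h_\Theta(x^{(i)}))_k\bigr)\right]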

My code is:

X = [ones(m,1), X];          % add the bias column to the inputs
for i = 1:m
  a1 = X(i,:);               % one example (1 x 401), bias already included
  z2 = a1*Theta1';           % hidden-layer pre-activation (1 x 25)
  a2 = sigmoid(z2);          % hidden-layer activation
  z3 = [1,a2]*Theta2';       % output pre-activation, bias unit prepended (1 x 10)
  a3 = sigmoid(z3);          % hypothesis h(x) (1 x 10)
  yk = zeros(1,num_labels);  % one-hot encode the label y(i)
  yk(1,y(i,1)) = 1;

  J_l = log(a3)*(-yk)';      % -y*log(h)
  J_r = log(1-a3)*(1-yk)';   % (1-y)*log(1-h)
  J = J+(J_l - J_r);         % accumulate the cost over all examples
endfor
J = J/m;

4 Regularized cost function

The cost function for neural networks with regularization is:
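J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\bigl((h_\Theta(x^{(i)}))_k\bigr) - (1-y_k^{(i)})\log\bigl(1-(h_\Theta(x^{(i)}))_k\bigr)\right] + \frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\bigl(\Theta_{j,k}^{(1)}\bigr)^2 + \sum_{j=1}^{10}\sum_{k=1}^{25}\bigl(\Theta_{j,k}^{(2)}\bigr)^2\right]

Note that the bias terms (the first column of each Θ) are not regularized.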

My code is:

temp1 = Theta1;
temp1(:,1) = 0;                   % do not regularize the bias column
sum_1 = sum(sum(temp1.^2));

temp2 = Theta2;
temp2(:,1) = 0;                   % do not regularize the bias column
sum_2 = sum(sum(temp2.^2));

reg = (sum_1+sum_2)*lambda/2/m;   % lambda/(2m) times the sum of squared weights
J = J+reg;

5 Backpropagation

You will implement the backpropagation algorithm to compute the gradients of the neural network parameters.

Recall that the intuition behind the backpropagation algorithm is as follows. Given a training example (x^{(t)}, y^{(t)}), we first run a "forward pass" to compute all the activations throughout the network, including the output value of the hypothesis h_\Theta(x). Then, for each node j in layer l, we compute an "error term" \delta_j^{(l)} that measures how much that node was "responsible" for any errors in the output.

You should implement steps 1 to 4 in a loop that processes one example at a time.

step 1: Set the input layer's values to the t-th training example, a^{(1)} = x^{(t)}, and perform a forward pass to compute the activations z^{(2)}, a^{(2)}, z^{(3)}, a^{(3)} (remember to add the +1 bias terms).

step 2: For each output unit k in layer 3, set \delta_k^{(3)} = a_k^{(3)} - y_k, where y_k \in \{0,1\} indicates whether the current example belongs to class k.

step 3: For the hidden layer, set \delta^{(2)} = (\Theta^{(2)})^T\delta^{(3)} \; .* \; g'(z^{(2)}).

When computing this step, since \Theta^{(2)} is 10x26 and \delta^{(3)} is 1x10, you must add the bias term 1 to z^{(2)} so that the dimensions match (and afterwards drop the resulting \delta_0^{(2)}).

step 4: Accumulate the gradient: \Delta^{(l)} = \Delta^{(l)} + \delta^{(l+1)}(a^{(l)})^T.

Use a for loop to accumulate \Delta over all the examples, then obtain the (unregularized) gradient with the following formula: \frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = D_{ij}^{(l)} = \frac{1}{m}\Delta_{ij}^{(l)}.

6 Regularized Neural Networks

After you have computed \Delta_{ij}^{(l)} using backpropagation, you should add regularization using:
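\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = \frac{1}{m}\Delta_{ij}^{(l)} \qquad \text{for } j = 0

\frac{\partial}{\partial\Theta_{ij}^{(l)}}J(\Theta) = \frac{1}{m}\Delta_{ij}^{(l)} + \frac{\lambda}{m}\Theta_{ij}^{(l)} \qquad \text{for } j \ge 1

My complete code (feedforward cost, backpropagation, and regularization) is: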

X = [ones(m,1), X];              % add the bias column to the inputs
for i = 1:m
  a1 = X(i,:);                   % one example (1 x 401)
  z2 = a1*Theta1';               % hidden-layer pre-activation (1 x 25)
  a2 = sigmoid(z2);
  z3 = [1,a2]*Theta2';           % output pre-activation (1 x 10)
  a3 = sigmoid(z3);              % hypothesis h(x)
  yk = zeros(1,num_labels);      % one-hot encode the label
  yk(1,y(i,1)) = 1;

  J_l = log(a3)*(-yk)';
  J_r = log(1-a3)*(1-yk)';
  J = J+(J_l - J_r);             % accumulate the cost

  %backpropagation algorithm
  delta3 = a3 - yk;                                     % step 2: output-layer error (1 x 10)
  delta2 = (delta3 * Theta2).*sigmoidGradient([1,z2]);  % step 3: bias added to z2 so dimensions match
  delta2 = delta2(2:end);                               % drop the bias error term (1 x 25)
  Theta1_grad = Theta1_grad + delta2'*a1;               % step 4: accumulate Delta1 (25 x 401)
  Theta2_grad = Theta2_grad + delta3'*[1,a2];           % step 4: accumulate Delta2 (10 x 26)
endfor
J = J/m;

temp1 = Theta1;                  % regularization: exclude the bias columns
temp1(:,1) = 0;
sum_1 = sum(sum(temp1.^2));

temp2 = Theta2;
temp2(:,1) = 0;
sum_2 = sum(sum(temp2.^2));

reg = (sum_1+sum_2)*lambda/2/m;
J = J+reg;

Theta1_grad = Theta1_grad./m;    % unregularized gradient
Theta1_grad(:, 2:end) = Theta1_grad(:,2:end) +(lambda/m)*Theta1(:,2:end);  % regularize all but the first column
Theta2_grad = Theta2_grad./m;
Theta2_grad(:, 2:end) = Theta2_grad(:,2:end) +(lambda/m)*Theta2(:,2:end);

7 Sigmoid gradient

The gradient for the sigmoid function can be computed as:
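g'(z) = \frac{d}{dz}g(z) = g(z)\bigl(1 - g(z)\bigr), \qquad \text{where } g(z) = \frac{1}{1+e^{-z}}

In sigmoidGradient.m this is: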

g = sigmoid(z).*(1-sigmoid(z));

8 Random initialization

For a neural network, if all parameters are initialized to 0, then every activation unit in the second layer computes the same value. The same symmetry problem occurs if they are all initialized to the same non-zero value. To break the symmetry, we therefore initialize the parameters randomly in the range [-\varepsilon, \varepsilon].
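A minimal sketch of this random initialization, following the randInitializeWeights.m stub from the exercise (\varepsilon_{init} = 0.12 is the value suggested there):

function W = randInitializeWeights(L_in, L_out)
  % Randomly initialize the weights of a layer with L_in incoming
  % connections and L_out outgoing connections to break symmetry.
  epsilon_init = 0.12;
  W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;
end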

9 Gradient checking

The purpose of this method is to check, by numerically estimating the gradient, whether the derivatives computed by backpropagation are really the ones we want.

To estimate the gradient, treat the cost as a function J(\Theta) of the parameters. For each component, pick two points very close to \Theta on either side and use the slope between them to approximate the derivative:

\frac{\partial}{\partial\theta_i}J(\theta) \approx \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2\varepsilon}

where \theta^{(i\pm)} is \theta with its i-th component shifted by \pm\varepsilon.

             

 

When \Theta is a vector, we need to check each partial derivative; each check perturbs only one parameter at a time:

% create a small neural network for checking
input_layer_size = 3;
hidden_layer_size = 5;
num_labels = 3;
m = 5;
lambda = 0;   % no regularization while checking the basic gradients

% randomly initialize the network parameters, X and y
Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size);
Theta2 = debugInitializeWeights(num_labels, hidden_layer_size);

X  = debugInitializeWeights(m, input_layer_size - 1);
y  = 1 + mod(1:m, num_labels)';

% unroll the parameters
nn_params = [Theta1(:) ; Theta2(:)];

% short-hand for the cost function being checked (as in checkNNGradients.m)
costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                               num_labels, X, y, lambda);

% compute the analytical gradient via backpropagation
[cost, grad] = costFunc(nn_params);

% compute the numerical gradient estimate
numgrad = computeNumericalGradient(costFunc, nn_params);

% evaluate the difference between grad and numgrad
diff = norm(numgrad-grad)/norm(numgrad+grad);

function numgrad = computeNumericalGradient(J, theta)
    % Numerically estimate the gradient of J around theta using the
    % central difference (J(theta+e) - J(theta-e)) / (2e).
    numgrad = zeros(size(theta));
    perturb = zeros(size(theta));
    e = 1e-4;
    for p = 1:numel(theta)
        % Set perturbation vector
        perturb(p) = e;
        loss1 = J(theta - perturb);
        loss2 = J(theta + perturb);
        % Compute Numerical Gradient
        numgrad(p) = (loss2 - loss1) / (2*e);
        perturb(p) = 0;
    end
end
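If the backpropagation implementation is correct, grad and numgrad should agree closely; the exercise expects the relative difference diff to be less than 1e-9.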

10 Training NN

Now we can train the parameters of the neural network.

Using the randomly initialized parameter matrices and the fmincg function, we can learn the network parameters.

Try changing MaxIter, the regularization parameter \lambda, or the number of hidden units.
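A minimal sketch of the training call, following the pattern used in ex4.m (MaxIter = 50 and lambda = 1 are just example settings and can be changed as suggested above):

% full data set: input_layer_size = 400, hidden_layer_size = 25, num_labels = 10
initial_Theta1 = randInitializeWeights(input_layer_size, hidden_layer_size);
initial_Theta2 = randInitializeWeights(hidden_layer_size, num_labels);
initial_nn_params = [initial_Theta1(:) ; initial_Theta2(:)];

options = optimset('MaxIter', 50);
lambda = 1;

% short-hand for the cost function to be minimized
costFunction = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, ...
                                   num_labels, X, y, lambda);

% fmincg works like fminunc but handles a large number of parameters efficiently
[nn_params, cost] = fmincg(costFunction, initial_nn_params, options);

% reshape the learned vector back into Theta1 and Theta2
Theta1 = reshape(nn_params(1:hidden_layer_size * (input_layer_size + 1)), ...
                 hidden_layer_size, (input_layer_size + 1));
Theta2 = reshape(nn_params((1 + (hidden_layer_size * (input_layer_size + 1))):end), ...
                 num_labels, (hidden_layer_size + 1));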

 
