In this exercise, we will implement the backpropagation algorithm for neural
networks and apply it to the task of hand-written digit recognition.
1 Visualizing the data
The first step is to display the data on a 2-dimensional plot. The dataset is the same one used in the previous exercise, so we can skip this step.
2 Model representation of Neural Networks
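The neural network has 25 units in the second layer and 10 output units (corresponding to the 10 digit classes). The trained parameters $\Theta^{(1)} \in \mathbb{R}^{25 \times 401}$ and $\Theta^{(2)} \in \mathbb{R}^{10 \times 26}$ (Theta1 and Theta2 in the code) are provided in ex4weights.mat.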
3 Feedforward and cost function
First, complete the code in nnCostFunction.m to return the cost.
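Recall that the cost function for the neural network (without regularization) is:
$$J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\left(h_\Theta(x^{(i)})_k\right) - \left(1-y_k^{(i)}\right)\log\left(1-h_\Theta(x^{(i)})_k\right)\right]$$
where $K = 10$ is the number of classes, $h_\Theta(x^{(i)})_k$ is the activation of the $k$-th output unit (a3 in the code), and $y^{(i)}$ is the one-hot encoded label (yk in the code).
My code is: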
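% Add the bias column to the inputs
X = [ones(m,1), X];
for i = 1:m
  a1 = X(i,:);               % one example, including the bias unit
  z2 = a1*Theta1';
  a2 = sigmoid(z2);          % hidden-layer activations
  z3 = [1,a2]*Theta2';       % prepend the bias unit to a2
  a3 = sigmoid(z3);          % network output h(x)
  yk = zeros(1,num_labels);  % one-hot encoding of the label
  yk(1,y(i,1)) = 1;
  J_l = log(a3)*(-yk)';      % -sum_k y_k*log(h_k)
  J_r = log(1-a3)*(1-yk)';   % sum_k (1-y_k)*log(1-h_k)
  J = J+(J_l - J_r);
end
J = J/m;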
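As a cross-check on the loop above, the same cost can be computed for all examples at once. The following is only a sketch on a tiny made-up network (3 inputs, 4 hidden units, 3 classes, 5 examples); the sizes, the sin-based data, and the inline sigmoid handle are assumptions for illustration and are not part of the exercise files.

```octave
% Hypothetical tiny network, used only to compare the two formulations.
sigmoid = @(z) 1 ./ (1 + exp(-z));
m = 5; num_labels = 3;
X = reshape(sin(1:m*3), m, 3) / 10;        % made-up inputs, m x 3
y = 1 + mod(1:m, num_labels)';             % made-up labels in 1..3
Theta1 = reshape(sin(1:16), 4, 4) / 10;    % 4 x (3+1)
Theta2 = reshape(sin(1:15), 3, 5) / 10;    % 3 x (4+1)

% Vectorized forward pass over all examples.
A1 = [ones(m, 1), X];                      % m x 4
A2 = [ones(m, 1), sigmoid(A1 * Theta1')];  % m x 5
A3 = sigmoid(A2 * Theta2');                % m x 3
I = eye(num_labels);
Y = I(y, :);                               % one-hot rows, m x 3
J_vec = sum(sum(-Y .* log(A3) - (1 - Y) .* log(1 - A3))) / m;

% Same cost with a per-example loop, as in the code above.
J_loop = 0;
for i = 1:m
  a3 = sigmoid([1, sigmoid(A1(i, :) * Theta1')] * Theta2');
  yk = zeros(1, num_labels);
  yk(y(i)) = 1;
  J_loop = J_loop + log(a3) * (-yk)' - log(1 - a3) * (1 - yk)';
end
J_loop = J_loop / m;

fprintf('%g %g\n', J_vec, J_loop);         % the two values should agree
```

The loop version is easier to follow step by step, while the vectorized version avoids the per-example overhead; both implement the same formula.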
4 Regularized cost function
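The cost function for neural networks with regularization is:
$$J(\Theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\left[-y_k^{(i)}\log\left(h_\Theta(x^{(i)})_k\right) - \left(1-y_k^{(i)}\right)\log\left(1-h_\Theta(x^{(i)})_k\right)\right] + \frac{\lambda}{2m}\left[\sum_{j=1}^{25}\sum_{k=1}^{400}\left(\Theta^{(1)}_{j,k}\right)^2 + \sum_{j=1}^{10}\sum_{k=1}^{25}\left(\Theta^{(2)}_{j,k}\right)^2\right]$$
The regularization sums skip the bias terms, i.e. the first column of each parameter matrix (Theta1 and Theta2 in the code).
My code is: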
% Zero the bias column of each matrix so it is not regularized
temp1 = Theta1;
temp1(:,1) = 0;
sum_1 = sum(sum(temp1.^2));
temp2 = Theta2;
temp2(:,1) = 0;
sum_2 = sum(sum(temp2.^2));
reg = (sum_1+sum_2)*lambda/(2*m);
J = J+reg;
5 Backpropagation
You will implement the backpropagation algorithm to compute the gradients of the cost with respect to the parameters of the neural network.
Recall that the intuition behind the backpropagation algorithm is as follows. Given a training example $(x^{(t)}, y^{(t)})$, we will first run a "forward pass" to compute all the activations throughout the network, including the output value of the hypothesis $h_\Theta(x)$. Then, for each node $j$ in layer $l$, we would like to compute an "error term" $\delta_j^{(l)}$ that measures how much that node was "responsible" for any errors in our output.
You should implement steps 1 to 4 in a loop that processes one example at a time.
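Step 1: Set the input layer's values $a^{(1)}$ to the $t$-th training example $x^{(t)}$ (with the bias unit added) and run the forward pass to compute $z^{(2)}$, $a^{(2)}$, $z^{(3)}$ and $a^{(3)}$.
Step 2: For each output unit $k$ in layer 3, set
$$\delta_k^{(3)} = a_k^{(3)} - y_k$$
Step 3: For the hidden layer $l = 2$, set
$$\delta^{(2)} = \left(\delta^{(3)}\,\Theta^{(2)}\right) \circ g'\left([1,\ z^{(2)}]\right)$$
When computing this step, $\Theta^{(2)}$ is 10×26 and $\delta^{(3)}$ is 1×10, so their product is 1×26. To match dimensions, a bias entry of 1 must be prepended to $z^{(2)}$, which is 1×25. The first element of $\delta^{(2)}$ corresponds to the bias unit and is dropped afterwards.
Step 4: Accumulate the gradient from this example:
$$\Delta^{(l)} := \Delta^{(l)} + \left(\delta^{(l+1)}\right)^T a^{(l)}$$
Here the vectors are rows, as in the code, and $a^{(l)}$ includes the bias unit.
Use a for loop to accumulate this over all training examples, then obtain the (unregularized) gradient with:
$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = \frac{1}{m}\Delta_{ij}^{(l)}$$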
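6 Regularized Neural Networks
After you have computed $\Delta_{ij}^{(l)}$ using backpropagation, you should add regularization using:
$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = \frac{1}{m}\Delta_{ij}^{(l)} \quad \text{for } j = 0$$
$$\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = \frac{1}{m}\Delta_{ij}^{(l)} + \frac{\lambda}{m}\Theta_{ij}^{(l)} \quad \text{for } j \ge 1$$
Here $j = 0$ indexes the bias column, which is not regularized.
My complete code is: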
% J, Theta1_grad and Theta2_grad are assumed to start at zero
X = [ones(m,1), X];            % add the bias column to the inputs
for i = 1:m
  % forward pass for one example
  a1 = X(i,:);
  z2 = a1*Theta1';
  a2 = sigmoid(z2);
  z3 = [1,a2]*Theta2';
  a3 = sigmoid(z3);
  yk = zeros(1,num_labels);    % one-hot encoding of the label
  yk(1,y(i,1)) = 1;
  J_l = log(a3)*(-yk)';
  J_r = log(1-a3)*(1-yk)';
  J = J+(J_l - J_r);
  % backpropagation algorithm
  delta3 = a3 - yk;
  delta2 = (delta3 * Theta2).*sigmoidGradient([1,z2]);
  delta2 = delta2(2:end);      % drop the bias-unit entry
  Theta1_grad = Theta1_grad + delta2'*a1;
  Theta2_grad = Theta2_grad + delta3'*[1,a2];
end
J = J/m;
% regularized cost (bias columns excluded)
temp1 = Theta1;
temp1(:,1) = 0;
sum_1 = sum(sum(temp1.^2));
temp2 = Theta2;
temp2(:,1) = 0;
sum_2 = sum(sum(temp2.^2));
reg = (sum_1+sum_2)*lambda/(2*m);
J = J+reg;
% regularized gradients (bias columns excluded)
Theta1_grad = Theta1_grad./m;
Theta1_grad(:, 2:end) = Theta1_grad(:,2:end) +(lambda/m)*Theta1(:,2:end);
Theta2_grad = Theta2_grad./m;
Theta2_grad(:, 2:end) = Theta2_grad(:,2:end) +(lambda/m)*Theta2(:,2:end);
7 Sigmoid gradient
The gradient for the sigmoid function can be computed as:
$$g'(z) = \frac{d}{dz}g(z) = g(z)\,(1 - g(z)), \qquad g(z) = \frac{1}{1 + e^{-z}}$$
g = sigmoid(z).*(1-sigmoid(z));
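A quick sanity check of the expression above, written as a small self-contained sketch. The anonymous function handles here are stand-ins for the exercise's sigmoid.m and sigmoidGradient.m files, and the test inputs are arbitrary.

```octave
% Stand-in handles for sigmoid.m and sigmoidGradient.m (assumed equivalent).
sigmoid = @(z) 1 ./ (1 + exp(-z));
sigmoidGradient = @(z) sigmoid(z) .* (1 - sigmoid(z));

g0 = sigmoidGradient(0);            % 0.25, the largest value g' can take
g = sigmoidGradient([-10 0 10]);    % close to 0 at both tails, works element-wise
disp(g0);
disp(g);
```

Because the expression uses element-wise operators, it works on scalars, vectors and matrices alike.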
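8 Random initialization
For a neural network, if all parameters are initialized to 0, all activation units in the second layer will have the same value. The same happens if all parameters are initialized to the same nonzero value. Therefore we usually initialize the parameters to small random values, for example drawn uniformly from a range $[-\epsilon, \epsilon]$.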
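A minimal sketch of this idea, assuming a uniform range with epsilon_init = 0.12 (an illustrative value, not taken from the text above) and the 25 x 401 shape of the first weight matrix. The constant matrix W_const is hypothetical and only demonstrates the symmetry problem.

```octave
L_in = 400; L_out = 25;      % sizes of the layers connected by the matrix
epsilon_init = 0.12;         % assumed range; any small positive value works similarly

% Uniform random weights in [-epsilon_init, epsilon_init], bias column included.
W = rand(L_out, 1 + L_in) * 2 * epsilon_init - epsilon_init;

% Symmetry problem: with identical weights every hidden unit computes the same value.
W_const = 0.5 * ones(L_out, 1 + L_in);
x = [1, rand(1, L_in)];                    % one made-up example with bias
a_const = 1 ./ (1 + exp(-x * W_const'));   % all 25 activations are equal
a_rand = 1 ./ (1 + exp(-x * W'));          % activations now differ
```

With W_const, every hidden unit also receives the same gradient, so the units would stay identical throughout training; random values break that symmetry.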
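9 Gradient checking
The purpose of this method is to verify, by numerically estimating the gradient, that the values computed by backpropagation are really the derivatives we want.
To estimate the gradient, treat the cost function $J$ as a function of $\theta$. Around $\theta$, pick two very close points $\theta - \epsilon$ and $\theta + \epsilon$ and use the slope of the line through them as the estimate:
$$\frac{d}{d\theta}J(\theta) \approx \frac{J(\theta+\epsilon) - J(\theta-\epsilon)}{2\epsilon}$$
When $\theta$ is a vector, each partial derivative has to be checked. The check for a partial derivative perturbs only one parameter at a time:
$$\frac{\partial}{\partial\theta_p}J(\theta) \approx \frac{J(\theta + \epsilon e_p) - J(\theta - \epsilon e_p)}{2\epsilon}$$
where $e_p$ is the unit vector with a 1 in position $p$.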
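% Create a small neural network
input_layer_size = 3;
hidden_layer_size = 5;
num_labels = 3;
m = 5;
% Randomly initialize the network parameters and X, y
Theta1 = debugInitializeWeights(hidden_layer_size, input_layer_size);
Theta2 = debugInitializeWeights(num_labels, hidden_layer_size);
X = debugInitializeWeights(m, input_layer_size - 1);
y = 1 + mod(1:m, num_labels)';
% Unroll the parameters
nn_params = [Theta1(:) ; Theta2(:)];
% Cost function handle (argument order assumed to follow the exercise's
% nnCostFunction.m; lambda = 0 is an assumption for the unregularized check)
lambda = 0;
costFunc = @(p) nnCostFunction(p, input_layer_size, hidden_layer_size, num_labels, X, y, lambda);
% Compute the gradient with backpropagation
[cost, grad] = costFunc(nn_params);
% Compute the numerical estimate of the gradient
numgrad = computeNumericalGradient(costFunc, nn_params);
% Evaluate the difference between grad and numgrad (it should be very small)
diff = norm(numgrad-grad)/norm(numgrad+grad);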
function numgrad = computeNumericalGradient(J, theta)
  % Central-difference estimate of the gradient of J at theta
  numgrad = zeros(size(theta));
  perturb = zeros(size(theta));
  e = 1e-4;
  for p = 1:numel(theta)
    % Set perturbation vector
    perturb(p) = e;
    loss1 = J(theta - perturb);
    loss2 = J(theta + perturb);
    % Compute Numerical Gradient
    numgrad(p) = (loss2 - loss1) / (2*e);
    perturb(p) = 0;
  end
end
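To see the check in isolation, here is a small sketch that applies the same central-difference loop to a made-up cost whose gradient is known in closed form. The cost J(theta) = sum(theta.^3) and the test point are assumptions chosen only for illustration; they have nothing to do with the neural network cost.

```octave
% Hypothetical cost with a known gradient: d/dtheta sum(theta.^3) = 3*theta.^2
J = @(t) sum(t .^ 3);
theta = [1; -2; 3];
grad = 3 * theta .^ 2;                 % analytic gradient to be verified

% Same central-difference loop as computeNumericalGradient, inlined here
e = 1e-4;
numgrad = zeros(size(theta));
perturb = zeros(size(theta));
for p = 1:numel(theta)
  perturb(p) = e;                      % perturb one parameter at a time
  numgrad(p) = (J(theta + perturb) - J(theta - perturb)) / (2 * e);
  perturb(p) = 0;
end

% Relative difference, same formula as above (named rel_diff to avoid
% shadowing the built-in diff function)
rel_diff = norm(numgrad - grad) / norm(numgrad + grad);
disp(rel_diff);                        % a very small number if grad is correct
```

A wrong analytic gradient, for example 2*theta.^2, would make rel_diff jump by many orders of magnitude, which is what makes the check useful for catching backpropagation bugs.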
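10 Training NN
Now we can train the parameters of the neural network.
Starting from the randomly initialized parameter matrices, the fmincg function can be used to train the network parameters.
Try changing MaxIter, the regularization coefficient, or the number of hidden units.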