Derivation of Forward and Backward Propagation in a Neural Network
$x_1$ and $x_2$ denote the inputs
$w_{ij}$ denote the weights
$b_{ij}$ denote the biases
$\sigma$ denotes the activation function; the sigmoid function is used here
$out$ denotes the network output
$y$ denotes the ground-truth value
$\eta$ denotes the learning rate
Forward propagation
$$h_1 = w_{11}x_1 + w_{13}x_2 + b_{11},\qquad \alpha_1 = \sigma(h_1) = \frac{1}{1+e^{-h_1}}$$
$$h_2 = w_{12}x_1 + w_{14}x_2 + b_{12},\qquad \alpha_2 = \sigma(h_2) = \frac{1}{1+e^{-h_2}}$$
$$z = w_{21}\alpha_1 + w_{22}\alpha_2 + b_{21},\qquad out = \sigma(z) = \frac{1}{1+e^{-z}}$$
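The forward pass above can be sketched directly in code. All numeric values below (inputs, weights, biases) are illustrative assumptions for the sketch, not values from the derivation:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Illustrative values (assumptions, not from the text)
x1, x2 = 0.5, 1.0
w11, w12, w13, w14 = 0.1, 0.2, 0.3, 0.4
b11, b12 = 0.1, 0.1
w21, w22, b21 = 0.5, 0.6, 0.2

# Hidden layer: h_i is the pre-activation, alpha_i = sigma(h_i)
h1 = w11 * x1 + w13 * x2 + b11
h2 = w12 * x1 + w14 * x2 + b12
a1, a2 = sigmoid(h1), sigmoid(h2)

# Output layer
z = w21 * a1 + w22 * a2 + b21
out = sigmoid(z)
```

Because the output goes through a sigmoid, $out$ always lands in $(0, 1)$ regardless of the weights.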
Loss function
$$E = \frac{1}{2}(out - y)^2$$
Backward propagation
Compute the gradients of the output-layer parameters:
$$\Delta w_{21} = \frac{\partial E}{\partial w_{21}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial w_{21}} = (out - y)\,\sigma(z)\bigl(1 - \sigma(z)\bigr)\,\alpha_1$$
$$\Delta w_{22} = \frac{\partial E}{\partial w_{22}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial w_{22}} = (out - y)\,\sigma(z)\bigl(1 - \sigma(z)\bigr)\,\alpha_2$$
$$\Delta b_{21} = \frac{\partial E}{\partial b_{21}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial b_{21}} = (out - y)\,\sigma(z)\bigl(1 - \sigma(z)\bigr)$$
Update $w_{21}$, $w_{22}$, $b_{21}$:
$$w_{21} = w_{21} - \eta\,\Delta w_{21}$$
$$w_{22} = w_{22} - \eta\,\Delta w_{22}$$
$$b_{21} = b_{21} - \eta\,\Delta b_{21}$$
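The output-layer gradients can be verified against a central-difference approximation; a minimal sketch, where the hidden activations, weights, target, and learning rate are made-up values, not from the text:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Illustrative values (assumptions, not from the text)
a1, a2 = 0.61, 0.65        # hidden activations alpha_1, alpha_2
w21, w22, b21 = 0.5, 0.6, 0.2
y, eta = 1.0, 0.1          # target and learning rate

z = w21 * a1 + w22 * a2 + b21
out = sigmoid(z)

# delta = dE/dz = (out - y) * sigma'(z), with sigma'(z) = sigma(z)(1 - sigma(z))
delta = (out - y) * out * (1.0 - out)
dw21 = delta * a1
dw22 = delta * a2
db21 = delta

# Numerical check of dE/dw21 with a central difference
eps = 1e-6
E = lambda w: 0.5 * (sigmoid(w * a1 + w22 * a2 + b21) - y) ** 2
num = (E(w21 + eps) - E(w21 - eps)) / (2 * eps)
assert abs(num - dw21) < 1e-8

# Gradient-descent update
w21 -= eta * dw21
w22 -= eta * dw22
b21 -= eta * db21
```

The assertion passing confirms that the analytic chain-rule expression matches the numerical slope of $E$ with respect to $w_{21}$.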
Compute the gradients of the hidden-layer parameters:
$$\Delta w_{12} = \frac{\partial E}{\partial w_{12}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_2}\frac{\partial \alpha_2}{\partial h_2}\frac{\partial h_2}{\partial w_{12}} = (out - y)\,\sigma(z)\bigl(1 - \sigma(z)\bigr)\,w_{22}\,\sigma(h_2)\bigl(1 - \sigma(h_2)\bigr)\,x_1$$
$$\Delta w_{14} = \frac{\partial E}{\partial w_{14}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_2}\frac{\partial \alpha_2}{\partial h_2}\frac{\partial h_2}{\partial w_{14}} = (out - y)\,\sigma(z)\bigl(1 - \sigma(z)\bigr)\,w_{22}\,\sigma(h_2)\bigl(1 - \sigma(h_2)\bigr)\,x_2$$
$$\Delta b_{12} = \frac{\partial E}{\partial b_{12}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_2}\frac{\partial \alpha_2}{\partial h_2}\frac{\partial h_2}{\partial b_{12}} = (out - y)\,\sigma(z)\bigl(1 - \sigma(z)\bigr)\,w_{22}\,\sigma(h_2)\bigl(1 - \sigma(h_2)\bigr)$$
$$\Delta w_{11} = \frac{\partial E}{\partial w_{11}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_1}\frac{\partial \alpha_1}{\partial h_1}\frac{\partial h_1}{\partial w_{11}} = (out - y)\,\sigma(z)\bigl(1 - \sigma(z)\bigr)\,w_{21}\,\sigma(h_1)\bigl(1 - \sigma(h_1)\bigr)\,x_1$$
$$\Delta w_{13} = \frac{\partial E}{\partial w_{13}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_1}\frac{\partial \alpha_1}{\partial h_1}\frac{\partial h_1}{\partial w_{13}} = (out - y)\,\sigma(z)\bigl(1 - \sigma(z)\bigr)\,w_{21}\,\sigma(h_1)\bigl(1 - \sigma(h_1)\bigr)\,x_2$$
$$\Delta b_{11} = \frac{\partial E}{\partial b_{11}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_1}\frac{\partial \alpha_1}{\partial h_1}\frac{\partial h_1}{\partial b_{11}} = (out - y)\,\sigma(z)\bigl(1 - \sigma(z)\bigr)\,w_{21}\,\sigma(h_1)\bigl(1 - \sigma(h_1)\bigr)$$
Update $w_{12}$, $w_{14}$, $b_{12}$:
$$w_{12} = w_{12} - \eta\,\Delta w_{12}$$
$$w_{14} = w_{14} - \eta\,\Delta w_{14}$$
$$b_{12} = b_{12} - \eta\,\Delta b_{12}$$
Update $w_{11}$, $w_{13}$, $b_{11}$:
$$w_{11} = w_{11} - \eta\,\Delta w_{11}$$
$$w_{13} = w_{13} - \eta\,\Delta w_{13}$$
$$b_{11} = b_{11} - \eta\,\Delta b_{11}$$
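One full backward step through the 2-2-1 network can be sketched end to end, again with a numerical gradient check. Every numeric value below is an illustrative assumption, not from the text:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Illustrative values (assumptions, not from the text)
x1, x2, y, eta = 0.5, 1.0, 1.0, 0.1
w11, w12, w13, w14 = 0.1, 0.2, 0.3, 0.4
b11, b12 = 0.1, 0.1
w21, w22, b21 = 0.5, 0.6, 0.2

def forward(w11, w12, w13, w14, b11, b12, w21, w22, b21):
    h1 = w11 * x1 + w13 * x2 + b11
    h2 = w12 * x1 + w14 * x2 + b12
    a1, a2 = sigmoid(h1), sigmoid(h2)
    z = w21 * a1 + w22 * a2 + b21
    return h1, h2, a1, a2, z, sigmoid(z)

h1, h2, a1, a2, z, out = forward(w11, w12, w13, w14, b11, b12, w21, w22, b21)

# Output-layer error term, then propagate it back through each hidden unit
delta = (out - y) * out * (1.0 - out)    # dE/dz
d1 = delta * w21 * a1 * (1.0 - a1)       # dE/dh1, using sigma'(h1) = a1(1 - a1)
d2 = delta * w22 * a2 * (1.0 - a2)       # dE/dh2

dw11, dw13, db11 = d1 * x1, d1 * x2, d1
dw12, dw14, db12 = d2 * x1, d2 * x2, d2

# Numerical check of dE/dw12 with a central difference
eps = 1e-6
def loss(w12_):
    *_, out_ = forward(w11, w12_, w13, w14, b11, b12, w21, w22, b21)
    return 0.5 * (out_ - y) ** 2
num = (loss(w12 + eps) - loss(w12 - eps)) / (2 * eps)
assert abs(num - dw12) < 1e-8

# Gradient-descent updates
w11 -= eta * dw11; w13 -= eta * dw13; b11 -= eta * db11
w12 -= eta * dw12; w14 -= eta * dw14; b12 -= eta * db12
```

Note how the hidden-layer gradients reuse the output-layer term `delta`; this reuse of downstream error terms is what makes backpropagation efficient.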
The same idea, vectorized in NumPy for a two-layer network (here with a ReLU hidden layer and a linear output) trained on random data:

import matplotlib.pyplot as plt
import numpy as np

# Batch size, input dim, hidden dim, output dim
N, D_in, H, D_out = 64, 1000, 100, 10

# Random data: columns are samples
x = np.random.randn(D_in, N)
y = np.random.randn(D_out, N)

# Parameters; biases are shared across samples and broadcast over columns
w1 = np.random.randn(D_in, H)
b1 = np.zeros((H, 1))
w2 = np.random.randn(H, D_out)
b2 = np.zeros((D_out, 1))

learning_rate = 1e-6
loss_list = []
n_iter = 500

for i in range(n_iter):
    # Forward pass: ReLU hidden layer, linear output
    h = np.matmul(w1.T, x) + b1
    a = np.maximum(h, 0)
    y_pred = np.matmul(w2.T, a) + b2

    # Sum-of-squares loss over the whole batch
    loss = np.square(y_pred - y).sum()
    loss_list.append(loss)

    # Backward pass
    grad_y_pred = 2 * (y_pred - y)
    grad_w2 = np.matmul(a, grad_y_pred.T)
    grad_b2 = grad_y_pred.sum(axis=1, keepdims=True)
    grad_a = np.matmul(w2, grad_y_pred)
    grad_a[h < 0] = 0  # ReLU gradient: mask on h, since a = max(h, 0) is never negative
    grad_w1 = np.matmul(x, grad_a.T)
    grad_b1 = grad_a.sum(axis=1, keepdims=True)

    # Gradient-descent updates
    w1 -= learning_rate * grad_w1
    b1 -= learning_rate * grad_b1
    w2 -= learning_rate * grad_w2
    b2 -= learning_rate * grad_b2

plt.plot(range(n_iter), loss_list)
plt.ylabel('loss')
plt.xlabel('iter')
plt.show()