Derivation of Forward and Backward Propagation in Neural Networks


[Figure: a 2-2-1 fully connected network with inputs $x_1, x_2$, two hidden sigmoid units $h_1, h_2$, and a single sigmoid output $out$]
$x_1, x_2$ denote the inputs
$w_{ij}$ denote the weights
$b_{ij}$ denote the biases
$\sigma$ denotes the activation function; sigmoid is used here
$out$ denotes the network output
$y$ denotes the ground-truth value
$\eta$ denotes the learning rate

Forward Propagation
$h_1 = w_{11}x_1 + w_{13}x_2 + b_{11}, \qquad \alpha_1 = \sigma(h_1) = \frac{1}{1+e^{-h_1}}$

$h_2 = w_{12}x_1 + w_{14}x_2 + b_{12}, \qquad \alpha_2 = \sigma(h_2) = \frac{1}{1+e^{-h_2}}$

$z = w_{21}\alpha_1 + w_{22}\alpha_2 + b_{21}, \qquad out = \sigma(z) = \frac{1}{1+e^{-z}}$
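
As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The input and parameter values are arbitrary placeholders chosen for the example, not values from the figure.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Arbitrary example inputs and parameters (placeholders)
x1, x2 = 0.5, -0.3
w11, w12, w13, w14 = 0.1, 0.2, 0.3, 0.4
b11, b12 = 0.1, 0.1
w21, w22 = 0.5, 0.6
b21 = 0.1

# Hidden layer
h1 = w11 * x1 + w13 * x2 + b11
h2 = w12 * x1 + w14 * x2 + b12
a1, a2 = sigmoid(h1), sigmoid(h2)

# Output layer
z = w21 * a1 + w22 * a2 + b21
out = sigmoid(z)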

Loss Function

$E = \frac{1}{2}(out - y)^2$
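
Two standard facts are used repeatedly in the gradients below: the derivative of this quadratic loss with respect to the output, and the derivative of the sigmoid, which can be written in terms of the sigmoid itself.

$\frac{\partial E}{\partial out} = out - y, \qquad \sigma'(u) = \frac{e^{-u}}{(1+e^{-u})^{2}} = \sigma(u)\bigl(1-\sigma(u)\bigr)$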

Backward Propagation
Compute the gradients for the output layer:
$\Delta w_{21} = \frac{\partial E}{\partial w_{21}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial w_{21}} = (out - y)\,\sigma(z)(1-\sigma(z))\,\alpha_1$

$\Delta w_{22} = \frac{\partial E}{\partial w_{22}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial w_{22}} = (out - y)\,\sigma(z)(1-\sigma(z))\,\alpha_2$

$\Delta b_{21} = \frac{\partial E}{\partial b_{21}} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial b_{21}} = (out - y)\,\sigma(z)(1-\sigma(z))$
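
Continuing the NumPy sketch from the forward pass, these three gradients take one line each; y is an assumed target value introduced for the example.

y = 1.0  # assumed ground-truth target, for illustration only

# Common factor (out - y) * sigma'(z); note that sigma(z) = out
delta_out = (out - y) * out * (1 - out)

d_w21 = delta_out * a1  # corresponds to Delta w21
d_w22 = delta_out * a2  # corresponds to Delta w22
d_b21 = delta_out       # corresponds to Delta b21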

Update $w_{21}$, $w_{22}$, $b_{21}$:

$w_{21} = w_{21} - \eta\,\Delta w_{21}$

$w_{22} = w_{22} - \eta\,\Delta w_{22}$

$b_{21} = b_{21} - \eta\,\Delta b_{21}$

Compute the gradients for the hidden layer:

$\Delta w_{12} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_2}\frac{\partial \alpha_2}{\partial h_2}\frac{\partial h_2}{\partial w_{12}} = (out - y)\,\sigma(z)(1-\sigma(z))\,w_{22}\,\sigma(h_2)(1-\sigma(h_2))\,x_1$

$\Delta w_{14} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_2}\frac{\partial \alpha_2}{\partial h_2}\frac{\partial h_2}{\partial w_{14}} = (out - y)\,\sigma(z)(1-\sigma(z))\,w_{22}\,\sigma(h_2)(1-\sigma(h_2))\,x_2$

$\Delta b_{12} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_2}\frac{\partial \alpha_2}{\partial h_2}\frac{\partial h_2}{\partial b_{12}} = (out - y)\,\sigma(z)(1-\sigma(z))\,w_{22}\,\sigma(h_2)(1-\sigma(h_2))$

$\Delta w_{11} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_1}\frac{\partial \alpha_1}{\partial h_1}\frac{\partial h_1}{\partial w_{11}} = (out - y)\,\sigma(z)(1-\sigma(z))\,w_{21}\,\sigma(h_1)(1-\sigma(h_1))\,x_1$

$\Delta w_{13} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_1}\frac{\partial \alpha_1}{\partial h_1}\frac{\partial h_1}{\partial w_{13}} = (out - y)\,\sigma(z)(1-\sigma(z))\,w_{21}\,\sigma(h_1)(1-\sigma(h_1))\,x_2$

$\Delta b_{11} = \frac{\partial E}{\partial out}\frac{\partial out}{\partial z}\frac{\partial z}{\partial \alpha_1}\frac{\partial \alpha_1}{\partial h_1}\frac{\partial h_1}{\partial b_{11}} = (out - y)\,\sigma(z)(1-\sigma(z))\,w_{21}\,\sigma(h_1)(1-\sigma(h_1))$
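
In the same sketch, the hidden-layer gradients reuse delta_out and push it through each hidden unit; since $\sigma(h_1)=a_1$ and $\sigma(h_2)=a_2$, the sigmoid derivatives come from the cached activations.

# Backpropagate delta_out through each hidden unit
delta_h1 = delta_out * w21 * a1 * (1 - a1)
delta_h2 = delta_out * w22 * a2 * (1 - a2)

# Gradients for the input-to-hidden weights and biases
d_w11, d_w13, d_b11 = delta_h1 * x1, delta_h1 * x2, delta_h1
d_w12, d_w14, d_b12 = delta_h2 * x1, delta_h2 * x2, delta_h2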

Update $w_{12}$, $w_{14}$, $b_{12}$:

$w_{12} = w_{12} - \eta\,\Delta w_{12}$

$w_{14} = w_{14} - \eta\,\Delta w_{14}$

$b_{12} = b_{12} - \eta\,\Delta b_{12}$

Update $w_{11}$, $w_{13}$, $b_{11}$:

$w_{11} = w_{11} - \eta\,\Delta w_{11}$

$w_{13} = w_{13} - \eta\,\Delta w_{13}$

$b_{11} = b_{11} - \eta\,\Delta b_{11}$
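
A quick way to sanity-check the whole derivation is to compare one analytic gradient against a centered finite difference. The sketch below reuses the variables from the snippets above; eps is a small step size chosen for the example.

# Finite-difference check of Delta w11 (any parameter works the same way)
eps = 1e-6

def loss_at(w11_val):
    # Recompute only the quantities that depend on w11
    h1_ = w11_val * x1 + w13 * x2 + b11
    a1_ = sigmoid(h1_)
    z_ = w21 * a1_ + w22 * a2 + b21
    return 0.5 * (sigmoid(z_) - y) ** 2

numeric = (loss_at(w11 + eps) - loss_at(w11 - eps)) / (2 * eps)
print(abs(numeric - d_w11))  # should be vanishingly small, around 1e-10

The full NumPy script below applies the same mechanics to a larger network, using a ReLU activation and an unscaled squared-error loss.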

import matplotlib.pyplot as plt
import numpy as np

# Define dimensions
# N: number of samples
# D_in: input dimension
# H: number of hidden units
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10

# Generate random data
x = np.random.randn(D_in, N)
y = np.random.randn(D_out, N)

# Initialize parameters
# (note: b1 and b2 carry one column per sample, i.e. a per-sample bias;
# a shared bias vector broadcast over the batch would be more conventional)
w1 = np.random.randn(D_in, H)
b1 = np.zeros((H, N))
w2 = np.random.randn(H, D_out)
b2 = np.zeros((D_out, N))

# Learning rate
learning_rate = 1e-6

loss_list = []

# Maximum number of iterations
n_iters = 500

for i in range(n_iters):
    # Forward pass
    h = np.matmul(w1.T, x)+b1 # (100, 64)
    a = np.maximum(h, 0) # (100, 64), ReLU activation
    y_pred = np.matmul(w2.T, a)+b2 # (10, 64)
    
    # Loss: sum of squared errors
    loss = np.square(y_pred-y).sum()
    
    loss_list.append(loss)
    
    # Backward pass
    grad_y_pred = 2*(y_pred-y) # (10, 64)
    grad_w2 = np.matmul(a, grad_y_pred.T) # (100, 10)
    grad_b2 = grad_y_pred # (10, 64)
    grad_a = np.matmul(w2, grad_y_pred) # (100, 64)
    grad_a[h < 0] = 0 # ReLU gradient: zero where the pre-activation is negative
    grad_w1 = np.matmul(x, grad_a.T) # (1000, 100)
    grad_b1 = grad_a # (100, 64)
    
    # Update parameters
    w1 -= learning_rate*grad_w1
    b1 -= learning_rate*grad_b1
    w2 -= learning_rate*grad_w2
    b2 -= learning_rate*grad_b2
    
plt.plot(range(n_iters), loss_list)
plt.ylabel('loss')
plt.xlabel('iter')
plt.show()

[Figure: loss versus iteration, as produced by the script above]
