Forward Propagation and Backpropagation: A Summary of the Derivations

I have recently been reading *Dive into Deep Learning* (《動手學深度學習》) and learned a lot from it, so this post summarizes the forward- and backward-propagation derivations for future reference. I recommend the book as well.

1. Forward propagation (bias terms b are omitted here)

Intermediate variable between the input layer and the hidden layer:

z=W^{(1)}x

Hidden layer:

h=\phi(z)

Output layer:

o=W^{(2)}h

Loss term:

L=\ell (o, y)

L_2 regularization term:

s=\frac{\lambda}{2}\left(\|W^{(1)}\|_{F}^{2}+\|W^{(2)}\|_{F}^{2}\right)

Objective function:

J=L+s
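
To make these formulas concrete, here is a minimal NumPy sketch of the forward pass. The activation \phi (tanh) and the loss \ell (squared error) are illustrative assumptions, not choices made above; the derivation holds for any differentiable \phi and \ell, and all the names (W1, W2, lam, ...) are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions (illustrative): input d, hidden m, output q
d, m, q = 4, 5, 3
x = rng.standard_normal((d, 1))   # input, as a column vector
y = rng.standard_normal((q, 1))   # target
W1 = rng.standard_normal((m, d))  # input -> hidden weights
W2 = rng.standard_normal((q, m))  # hidden -> output weights
lam = 0.1                         # L2 regularization strength lambda

phi = np.tanh                     # activation phi (assumed: tanh)

z = W1 @ x                        # intermediate variable z = W1 x
h = phi(z)                        # hidden layer h = phi(z)
o = W2 @ h                        # output layer o = W2 h
L = 0.5 * np.sum((o - y) ** 2)    # loss ell(o, y) (assumed: squared error)
s = lam / 2 * (np.sum(W1 ** 2) + np.sum(W2 ** 2))  # L2 penalty (Frobenius norms)
J = L + s                         # objective
```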

Forward propagation computational graph: (figure not reproduced here)

2. Backpropagation

First, fix the targets; we want to compute:

\frac{\partial J}{\partial W^{(1)}},\quad \frac{\partial J}{\partial W^{(2)}}

Start with the gradient of the parameters closest to the output layer:

\begin{aligned}
\frac{\partial J}{\partial W^{(2)}} &= \text{prod}\!\left(\frac{\partial J}{\partial o},\frac{\partial o}{\partial W^{(2)}}\right)+\text{prod}\!\left(\frac{\partial J}{\partial s},\frac{\partial s}{\partial W^{(2)}}\right) \\
&= \frac{\partial J}{\partial o}h^{\top}+\lambda W^{(2)} \\
&= \text{prod}\!\left(\frac{\partial J}{\partial L},\frac{\partial L}{\partial o}\right)h^{\top}+\lambda W^{(2)} \\
&= \frac{\partial L}{\partial o}h^{\top}+\lambda W^{(2)}
\end{aligned}
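
Continuing the sketch above (with squared loss, so \partial L/\partial o = o - y), this gradient is two lines:

```python
# With squared loss, the gradient of L w.r.t. the output is dL/do = o - y.
do = o - y
# dJ/dW2 = (dL/do) h^T + lam * W2; the lam * W2 term comes from the L2 penalty s.
dW2 = do @ h.T + lam * W2
```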

Next, the gradient of the parameters closest to the input layer (parentheses added to make the order of operations explicit):

\begin{aligned}
\frac{\partial J}{\partial W^{(1)}} &= \text{prod}\!\left(\frac{\partial J}{\partial z},\frac{\partial z}{\partial W^{(1)}}\right)+\text{prod}\!\left(\frac{\partial J}{\partial s},\frac{\partial s}{\partial W^{(1)}}\right) \\
&= \frac{\partial J}{\partial z}x^{\top}+\lambda W^{(1)} \\
&= \text{prod}\!\left(\frac{\partial J}{\partial h},\frac{\partial h}{\partial z}\right)x^{\top}+\lambda W^{(1)} \\
&= \left(\frac{\partial J}{\partial h}\odot \phi'(z)\right)x^{\top}+\lambda W^{(1)} \\
&= \left(\text{prod}\!\left(\frac{\partial J}{\partial o},\frac{\partial o}{\partial h}\right)\odot \phi'(z)\right)x^{\top}+\lambda W^{(1)} \\
&= \left(W^{(2)\top}\frac{\partial J}{\partial o}\odot \phi'(z)\right)x^{\top}+\lambda W^{(1)} \\
&= \left(W^{(2)\top}\,\text{prod}\!\left(\frac{\partial J}{\partial L},\frac{\partial L}{\partial o}\right)\odot \phi'(z)\right)x^{\top}+\lambda W^{(1)} \\
&= \left(W^{(2)\top}\frac{\partial L}{\partial o}\odot \phi'(z)\right)x^{\top}+\lambda W^{(1)}
\end{aligned}
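
And the corresponding sketch for W^{(1)}, using the assumed tanh activation, whose derivative is \phi'(z) = 1 - \tanh^2(z) = 1 - h^2:

```python
# Backpropagate through the output layer: dJ/dh = W2^T (dL/do)
dh = W2.T @ do
# Through the activation: dJ/dz = dJ/dh ⊙ phi'(z); for tanh, phi'(z) = 1 - h**2
dz = dh * (1 - h ** 2)
# dJ/dW1 = (dJ/dz) x^T + lam * W1
dW1 = dz @ x.T + lam * W1
```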

where: \frac{\partial J}{\partial L}=1,\quad \frac{\partial J}{\partial s}=1,\quad \frac{\partial s}{\partial W^{(1)}}=\lambda W^{(1)},\quad \frac{\partial s}{\partial W^{(2)}}=\lambda W^{(2)}
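
A central-difference gradient check is a cheap way to confirm the two closed forms above; continuing the same sketch (numerical_grad and objective are my own helper names):

```python
def numerical_grad(f, W, eps=1e-6):
    """Central-difference estimate of dJ/dW, one element at a time."""
    g = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        j_plus = f()
        W[idx] = old - eps
        j_minus = f()
        W[idx] = old
        g[idx] = (j_plus - j_minus) / (2 * eps)
    return g

def objective():
    """Recompute J = L + s from scratch with the current weights."""
    h = np.tanh(W1 @ x)
    o = W2 @ h
    L = 0.5 * np.sum((o - y) ** 2)
    s = lam / 2 * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return L + s

# Both differences should be tiny (around 1e-8 or smaller).
print(np.max(np.abs(dW2 - numerical_grad(objective, W2))))
print(np.max(np.abs(dW1 - numerical_grad(objective, W1))))
```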
