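Forward Propagation Derivation
For the i-th sample
The neural network model we are going to build is shown in the figure below:
Dimensions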
This is a 2-layer neural network. Layer 0 is the input layer with $n_x$ features; layer 1 is the hidden layer with $n_h=4$ hidden units; layer 2 is the output layer with $n_y=1$ output unit. The superscript $^{[0](i)}$ denotes layer 0 and the i-th sample. Let $a^{[0](i)}=x^{[0](i)}$. $x^{[0](i)}$ has dimension $2\times 1$ (so $n_x=2$ here); $w^{[1](i)}$ has dimension $n_h\times n_x=4\times 2$; $b^{[1](i)}$ has dimension $n_h\times 1=4\times 1$; $w^{[2](i)}$ has dimension $n_y\times n_h=1\times 4$; $b^{[2](i)}$ has dimension $n_y\times 1=1\times 1$. The weights and biases are shared by all samples; the $(i)$ on them only marks that the equation is written for the i-th sample.
Compute $z^{[1](i)}$, $a^{[1](i)}$, $z^{[2](i)}$, $a^{[2](i)}$
$$w_1^{[1](i)}=\begin{pmatrix} w_{11}^{[1](i)} \\ w_{12}^{[1](i)} \end{pmatrix};\quad w_2^{[1](i)}=\begin{pmatrix} w_{21}^{[1](i)} \\ w_{22}^{[1](i)} \end{pmatrix};\quad w_3^{[1](i)}=\begin{pmatrix} w_{31}^{[1](i)} \\ w_{32}^{[1](i)} \end{pmatrix};\quad w_4^{[1](i)}=\begin{pmatrix} w_{41}^{[1](i)} \\ w_{42}^{[1](i)} \end{pmatrix}$$
$$w^{[1](i)}=\begin{pmatrix} w_{11}^{[1](i)} & w_{12}^{[1](i)} \\ w_{21}^{[1](i)} & w_{22}^{[1](i)} \\ w_{31}^{[1](i)} & w_{32}^{[1](i)} \\ w_{41}^{[1](i)} & w_{42}^{[1](i)} \end{pmatrix}=\begin{pmatrix} w_1^{[1](i)T} \\ w_2^{[1](i)T} \\ w_3^{[1](i)T} \\ w_4^{[1](i)T} \end{pmatrix}$$
$$z^{[1](i)}=\begin{pmatrix} z_1^{[1](i)} \\ z_2^{[1](i)} \\ z_3^{[1](i)} \\ z_4^{[1](i)} \end{pmatrix}=w^{[1](i)}x^{[0](i)}+b^{[1](i)}$$
$$a^{[1](i)}=g^{[1]}(z^{[1](i)})=\tanh(z^{[1](i)})$$
where $z^{[1](i)}$ has dimension $4\times 1$ and $a^{[1](i)}$ has dimension $4\times 1$.
$$z^{[2](i)}=w^{[2](i)}a^{[1](i)}+b^{[2](i)}$$
$$\hat y^{(i)}=a^{[2](i)}=g^{[2]}(z^{[2](i)})=\mathrm{sigmoid}(z^{[2](i)})$$
where $z^{[2](i)}$ has dimension $1\times 1$ and $a^{[2](i)}$ has dimension $1\times 1$.
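The single-sample forward pass above can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function names `sigmoid` and `forward_single`, the random initialization, and the seed are not from the original text.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward_single(x, w1, b1, w2, b2):
    """Forward pass for one sample x of shape (n_x, 1)."""
    z1 = w1 @ x + b1      # (n_h, 1)
    a1 = np.tanh(z1)      # (n_h, 1), hidden activation g1 = tanh
    z2 = w2 @ a1 + b2     # (n_y, 1)
    a2 = sigmoid(z2)      # (n_y, 1), output activation g2 = sigmoid
    return z1, a1, z2, a2

# Illustrative values only (assumed, not from the text).
n_x, n_h, n_y = 2, 4, 1
rng = np.random.default_rng(0)
x = rng.standard_normal((n_x, 1))
w1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
w2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

z1, a1, z2, a2 = forward_single(x, w1, b1, w2, b2)
print(z1.shape, a1.shape, z2.shape, a2.shape)
```

The printed shapes should match the dimensions listed above: $4\times 1$ for the hidden layer and $1\times 1$ for the output.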
Computing the loss
$$J=-\frac{1}{m}\sum_{i=1}^{m}\left(y^{(i)}\log a^{[2](i)}+(1-y^{(i)})\log\left(1-a^{[2](i)}\right)\right)$$
Here $\log$ is the natural logarithm, and the sum runs over the $m$ samples $i=1,\dots,m$.
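A minimal sketch of this cost in NumPy, assuming the natural logarithm and labels stored as a $1\times m$ row. The function name `compute_cost` and the clipping constant `eps` are assumptions added here for numerical safety; they are not part of the formula.

```python
import numpy as np

def compute_cost(A2, Y, eps=1e-12):
    """Cross-entropy cost averaged over m samples.

    A2, Y: arrays of shape (1, m). eps is an assumed safeguard
    that keeps log() away from 0; it is not in the formula.
    """
    m = Y.shape[1]
    A2 = np.clip(A2, eps, 1.0 - eps)
    losses = Y * np.log(A2) + (1.0 - Y) * np.log(1.0 - A2)
    return float(-np.sum(losses) / m)

# Illustrative predictions and labels (assumed values).
A2 = np.array([[0.9, 0.2, 0.5]])
Y = np.array([[1.0, 0.0, 1.0]])
cost = compute_cost(A2, Y)
print(cost)
```

For these three samples the cost reduces to $-\frac{1}{3}\left(\ln 0.9+\ln 0.8+\ln 0.5\right)$.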
Vectorization
Dimensions
Let $A^{[0]}=X^{[0]}$. The input $X^{[0]}$ has dimension $n_x\times m$, with $n_x$ features and $m$ samples; $W^{[1]}$ has dimension $n_h\times n_x=4\times n_x$; $b^{[1]}$ has dimension $n_h\times 1=4\times 1$; $W^{[2]}$ has dimension $n_y\times n_h=1\times 4$; $b^{[2]}$ has dimension $n_y\times 1=1\times 1$.
Compute $Z^{[1]}$, $A^{[1]}$, $Z^{[2]}$, $A^{[2]}$
$$Z^{[1]}=W^{[1]}X^{[0]}+b^{[1]}=W^{[1]}A^{[0]}+b^{[1]}$$
$$A^{[1]}=g^{[1]}(Z^{[1]})$$
where $Z^{[1]}$ has dimension $4\times m$ and $A^{[1]}$ has dimension $4\times m$; $b^{[1]}$ is broadcast across the $m$ columns.
$$Z^{[2]}=W^{[2]}A^{[1]}+b^{[2]}$$
$$A^{[2]}=g^{[2]}(Z^{[2]})$$
where $Z^{[2]}$ has dimension $1\times m$ and $A^{[2]}$ has dimension $1\times m$.
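The vectorized forward pass can be sketched the same way. The code below is an illustration under assumed names (`forward`, `sigmoid`) and random data; it relies on NumPy broadcasting to add the $4\times 1$ bias to every column, and it checks that column 0 of the batch result equals the single-sample computation.

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Vectorized forward pass; X has shape (n_x, m)."""
    Z1 = W1 @ X + b1      # (n_h, m); b1 is broadcast over columns
    A1 = np.tanh(Z1)      # (n_h, m)
    Z2 = W2 @ A1 + b2     # (n_y, m)
    A2 = sigmoid(Z2)      # (n_y, m)
    return Z1, A1, Z2, A2

# Illustrative data (assumed): n_x = 2 features, m = 5 samples.
rng = np.random.default_rng(0)
n_x, n_h, n_y, m = 2, 4, 1, 5
X = rng.standard_normal((n_x, m))
W1 = rng.standard_normal((n_h, n_x))
b1 = rng.standard_normal((n_h, 1))
W2 = rng.standard_normal((n_y, n_h))
b2 = rng.standard_normal((n_y, 1))

Z1, A1, Z2, A2 = forward(X, W1, b1, W2, b2)

# Column 0 of the batch result should equal the result for sample 0 alone.
_, _, _, a2_first = forward(X[:, [0]], W1, b1, W2, b2)
print(A2.shape, np.allclose(A2[:, [0]], a2_first))
```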
Backpropagation Derivation
The parameters are fitted with gradient descent; the gradients it needs are derived below:
For the i-th sample
Dimensions
From forward propagation we have the dimensions: $x^{[0](i)}$ is $2\times 1$; $w^{[1](i)}$ is $n_h\times n_x=4\times 2$; $b^{[1](i)}$ is $n_h\times 1=4\times 1$; $w^{[2](i)}$ is $n_y\times n_h=1\times 4$; $b^{[2](i)}$ is $n_y\times 1=1\times 1$. $z^{[1](i)}$ is $4\times 1$ and $a^{[1](i)}$ is $4\times 1$. $z^{[2](i)}$ is $1\times 1$ and $a^{[2](i)}$ is $1\times 1$.
Compute $dz^{[1](i)}$, $dw^{[1](i)}$, $db^{[1](i)}$, $dz^{[2](i)}$, $dw^{[2](i)}$, $db^{[2](i)}$
$$dz^{[2](i)}=\frac{\partial L(a^{[2](i)},y^{(i)})}{\partial a^{[2](i)}}\frac{\partial a^{[2](i)}}{\partial z^{[2](i)}}=a^{[2](i)}-y^{(i)}$$
For the derivation of this formula, see the note "Andrew Ng Deep Learning, Course 1, Week 2, Neural Network Basics assignment: forward and backward propagation derivation".
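For convenience, here is a short version of that derivation. It assumes the per-sample loss is the cross-entropy with natural logarithm and that $g^{[2]}$ is the sigmoid; superscripts are dropped, so $a=a^{[2](i)}$, $z=z^{[2](i)}$, $y=y^{(i)}$.

$$L(a,y)=-\left(y\log a+(1-y)\log(1-a)\right)$$
$$\frac{\partial L}{\partial a}=-\frac{y}{a}+\frac{1-y}{1-a},\qquad \frac{\partial a}{\partial z}=a(1-a)$$
$$dz=\left(-\frac{y}{a}+\frac{1-y}{1-a}\right)a(1-a)=-y(1-a)+(1-y)a=a-y$$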
$$dw^{[2](i)}=dz^{[2](i)}\frac{\partial z^{[2](i)}}{\partial w^{[2](i)}}=(a^{[2](i)}-y^{(i)})\,a^{[1](i)}$$
Since $w^{[2](i)}$ has dimension $1\times 4$, $(a^{[2](i)}-y^{(i)})$ has dimension $1\times 1$, and $a^{[1](i)}$ has dimension $4\times 1$, $a^{[1](i)}$ must be transposed so that $dw^{[2](i)}$ has the same shape as $w^{[2](i)}$:
$$dw^{[2](i)}=dz^{[2](i)}\frac{\partial z^{[2](i)}}{\partial w^{[2](i)}}=dz^{[2](i)}a^{[1](i)T}=(a^{[2](i)}-y^{(i)})\,a^{[1](i)T}$$
$$db^{[2](i)}=dz^{[2](i)}=a^{[2](i)}-y^{(i)}$$
$$dz^{[1](i)}=\frac{\partial L(a^{[2](i)},y^{(i)})}{\partial a^{[2](i)}}\frac{\partial a^{[2](i)}}{\partial z^{[2](i)}}\frac{\partial z^{[2](i)}}{\partial a^{[1](i)}}\frac{\partial a^{[1](i)}}{\partial z^{[1](i)}}=dz^{[2](i)}w^{[2](i)}*g^{[1]'}(z^{[1](i)})=w^{[2](i)T}dz^{[2](i)}*g^{[1]'}(z^{[1](i)})$$
Here $*$ is the element-wise product, and $w^{[2](i)}$ is transposed so that the result is $4\times 1$, matching $z^{[1](i)}$. For $g^{[1]}=\tanh$, $g^{[1]'}(z^{[1](i)})=1-(a^{[1](i)})^2$.
$$dw^{[1](i)}=dz^{[1](i)}\frac{\partial z^{[1](i)}}{\partial w^{[1](i)}}=dz^{[1](i)}x^{[0](i)T}$$
$$db^{[1](i)}=dz^{[1](i)}$$
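The per-sample gradients can be checked numerically. The sketch below implements the formulas above for one sample and compares one entry of $dw^{[1](i)}$ against a central finite difference of the loss. The function names, the random parameters, and the step size `h` are assumptions made for illustration; the hidden activation is taken to be tanh, so its derivative is $1-(a^{[1](i)})^2$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_single(x, y, w1, b1, w2, b2):
    """Cross-entropy loss (natural log) for one sample."""
    a1 = np.tanh(w1 @ x + b1)
    a2 = sigmoid(w2 @ a1 + b2)
    value = -(y * np.log(a2) + (1.0 - y) * np.log(1.0 - a2))
    return float(np.squeeze(value))

def backward_single(x, y, w1, b1, w2, b2):
    """Gradients for one sample, following the formulas in the text."""
    a1 = np.tanh(w1 @ x + b1)           # (4, 1)
    a2 = sigmoid(w2 @ a1 + b2)          # (1, 1)
    dz2 = a2 - y                        # (1, 1)
    dw2 = dz2 @ a1.T                    # (1, 4)
    db2 = dz2                           # (1, 1)
    dz1 = (w2.T @ dz2) * (1.0 - a1**2)  # (4, 1), tanh'(z1) = 1 - a1^2
    dw1 = dz1 @ x.T                     # (4, 2)
    db1 = dz1                           # (4, 1)
    return dw1, db1, dw2, db2

# Illustrative sample and parameters (assumed values).
rng = np.random.default_rng(1)
x = rng.standard_normal((2, 1))
y = 1.0
w1 = rng.standard_normal((4, 2))
b1 = rng.standard_normal((4, 1))
w2 = rng.standard_normal((1, 4))
b2 = rng.standard_normal((1, 1))

dw1, db1, dw2, db2 = backward_single(x, y, w1, b1, w2, b2)

# Central finite difference on the single entry w1[0, 0].
h = 1e-6
w1_plus = w1.copy()
w1_plus[0, 0] += h
w1_minus = w1.copy()
w1_minus[0, 0] -= h
numeric = (loss_single(x, y, w1_plus, b1, w2, b2)
           - loss_single(x, y, w1_minus, b1, w2, b2)) / (2.0 * h)
print(abs(numeric - dw1[0, 0]))  # should be very small
```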
Vectorization
Dimensions
Let $A^{[0]}=X^{[0]}$. The input $X^{[0]}$ has dimension $n_x\times m$, with $n_x$ features and $m$ samples; $W^{[1]}$ has dimension $n_h\times n_x=4\times n_x$; $b^{[1]}$ has dimension $n_h\times 1=4\times 1$; $W^{[2]}$ has dimension $n_y\times n_h=1\times 4$; $b^{[2]}$ has dimension $n_y\times 1=1\times 1$. $Z^{[1]}$ is $4\times m$ and $A^{[1]}$ is $4\times m$. $Z^{[2]}$ is $1\times m$ and $A^{[2]}$ is $1\times m$.
Compute $dZ^{[1]}$, $dW^{[1]}$, $db^{[1]}$, $dZ^{[2]}$, $dW^{[2]}$, $db^{[2]}$
The derivation is as follows:
$$dZ^{[2]}=A^{[2]}-Y$$
$$dW^{[2]}=\frac{1}{m}dZ^{[2]}A^{[1]}$$
Since $dZ^{[2]}$ has dimension $1\times m$ (the same as $Z^{[2]}$), $A^{[1]}$ has dimension $4\times m$, and $W^{[2]}$ has dimension $1\times 4$, $A^{[1]}$ must be transposed, which gives:
$$dW^{[2]}=\frac{1}{m}dZ^{[2]}A^{[1]T}=\frac{1}{m}(A^{[2]}-Y)A^{[1]T}$$
$$db^{[2]}=\frac{1}{m}\sum_{i=1}^{m}dz^{[2](i)}=\frac{1}{m}\,\mathrm{np.sum}(A^{[2]}-Y)$$
Since $A^{[2]}-Y$ has dimension $1\times m$ while $db^{[2]}$ has dimension $1\times 1$, the entries of $A^{[2]}-Y$ are summed over the $m$ samples.
$$dZ^{[1]}=dZ^{[2]}W^{[2]}*g^{[1]'}(Z^{[1]})$$
Since $W^{[2]}$ has dimension $1\times 4$, $dZ^{[2]}$ has dimension $1\times m$, and $dZ^{[1]}$ has dimension $4\times m$, $W^{[2]}$ must be transposed and placed on the left, which gives:
$$dZ^{[1]}=W^{[2]T}dZ^{[2]}*g^{[1]'}(Z^{[1]})=\mathrm{np.dot}(W^{[2]}\mathrm{.T},\,dZ^{[2]})*g^{[1]'}(Z^{[1]})$$
$$dW^{[1]}=\frac{1}{m}dZ^{[1]}X$$
Since $dW^{[1]}$ has dimension $4\times n_x$, $dZ^{[1]}$ has dimension $4\times m$, and $X$ has dimension $n_x\times m$, $X$ must be transposed, which gives:
$$dW^{[1]}=\frac{1}{m}dZ^{[1]}X^{T}$$
$$db^{[1]}=\frac{1}{m}dZ^{[1]}$$
Since $dZ^{[1]}$ has dimension $4\times m$ while $b^{[1]}$ has dimension $4\times 1$, each row is summed over the $m$ samples, which gives:
$$db^{[1]}=\frac{1}{m}\,\mathrm{np.sum}(dZ^{[1]},\ \mathrm{axis}=1,\ \mathrm{keepdims}=\mathrm{True})$$
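Putting the vectorized formulas together, a minimal NumPy sketch might look like the following. The function names, the dictionary of gradients, and the random data are assumptions for illustration. Note that `np.sum` is called with `axis=1, keepdims=True` so that each row is summed separately and $db^{[1]}$ keeps its $4\times 1$ shape; a bare `np.sum(dZ1)` would collapse everything to a single scalar. A finite-difference check on one bias entry is included as a sanity test.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    A1 = np.tanh(W1 @ X + b1)     # (4, m)
    A2 = sigmoid(W2 @ A1 + b2)    # (1, m)
    return A1, A2

def cost(X, Y, W1, b1, W2, b2):
    """Cross-entropy cost (natural log), averaged over m samples."""
    _, A2 = forward(X, W1, b1, W2, b2)
    return float(-np.mean(Y * np.log(A2) + (1.0 - Y) * np.log(1.0 - A2)))

def backward(X, Y, W1, b1, W2, b2):
    """Vectorized gradients of the cost, following the formulas in the text."""
    m = X.shape[1]
    A1, A2 = forward(X, W1, b1, W2, b2)
    dZ2 = A2 - Y                                     # (1, m)
    dW2 = dZ2 @ A1.T / m                             # (1, 4)
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m     # (1, 1)
    dZ1 = (W2.T @ dZ2) * (1.0 - A1**2)               # (4, m), tanh' = 1 - A1^2
    dW1 = dZ1 @ X.T / m                              # (4, n_x)
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m     # (4, 1)
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

# Illustrative data (assumed): 2 features, 6 samples, binary labels.
rng = np.random.default_rng(2)
n_x, n_h, n_y, m = 2, 4, 1, 6
X = rng.standard_normal((n_x, m))
Y = rng.integers(0, 2, size=(1, m)).astype(float)
W1 = rng.standard_normal((n_h, n_x))
b1 = rng.standard_normal((n_h, 1))
W2 = rng.standard_normal((n_y, n_h))
b2 = rng.standard_normal((n_y, 1))

grads = backward(X, Y, W1, b1, W2, b2)

# Central finite difference on the single entry b1[2, 0].
h = 1e-6
b1_plus = b1.copy()
b1_plus[2, 0] += h
b1_minus = b1.copy()
b1_minus[2, 0] -= h
numeric_db1 = (cost(X, Y, W1, b1_plus, W2, b2)
               - cost(X, Y, W1, b1_minus, W2, b2)) / (2.0 * h)
print(abs(numeric_db1 - grads["db1"][2, 0]))  # should be very small
```

Each gradient has the same shape as the parameter it updates, which is the property the transposes and row sums in the derivation are there to guarantee.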