Question (152): How do you verify that the code computing the gradient of an objective function is correct?
Topics tested: calculus, Taylor expansion
Approximation (calculus)
By the definition of the partial derivative as a limit, for a small step $h$ the central difference approximates it:
$$\frac{\partial L(\bm \theta)}{\partial \theta_i} \approx \frac{L(\theta_1, \cdots,\theta_i+h, \cdots,\theta_p) - L(\theta_1, \cdots,\theta_i-h, \cdots,\theta_p)}{2h}$$
*E.g. $h=10^{-7}$
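The check described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the functions `numerical_gradient`, `loss`, and `analytic_gradient`, as well as the test objective itself, are hypothetical names and choices made for this example, and `theta` is assumed to be a plain list of floats.

```python
import math

def numerical_gradient(loss, theta, h=1e-7):
    """Central-difference estimate of the gradient of `loss` at `theta` (a list of floats)."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += h   # theta + h * e_i
        minus[i] -= h  # theta - h * e_i
        grad.append((loss(plus) - loss(minus)) / (2.0 * h))
    return grad

# Hypothetical test objective: L(theta) = sum_j theta_j^2 + sin(theta_0)
def loss(theta):
    return sum(t * t for t in theta) + math.sin(theta[0])

# Hand-derived gradient that we want to verify
def analytic_gradient(theta):
    grad = [2.0 * t for t in theta]
    grad[0] += math.cos(theta[0])
    return grad

theta = [0.3, -1.2, 2.0]
numeric = numerical_gradient(loss, theta)
analytic = analytic_gradient(theta)
max_abs_diff = max(abs(n - a) for n, a in zip(numeric, analytic))
print(max_abs_diff)  # should be small if analytic_gradient is implemented correctly
```

Each coordinate costs two evaluations of the loss, so this kind of check is typically run on a few test points rather than inside the training loop.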
Approximation error (Taylor expansion with Lagrange remainder)
Univariate Taylor expansion of the function $\tilde{L}(x) = L(\bm \theta + x \bm e_i)$ around $x = 0$, where $\bm e_i$ is the $i$-th standard basis vector and $L'$, $L''$, $L'''$ denote derivatives of $\tilde{L}$ with respect to $x$ (so $L'(\bm \theta) = \partial L(\bm \theta)/\partial \theta_i$):
$$L(\bm \theta+h\bm e_i) = L(\bm \theta) + (h-0)L'(\bm \theta) + \frac{h^2}{2}L''(\bm \theta) + \frac{h^3}{6}L'''(\bm \theta + p \bm e_i)$$
$$L(\bm \theta-h\bm e_i) = L(\bm \theta) - (h-0)L'(\bm \theta) + \frac{h^2}{2}L''(\bm \theta) - \frac{h^3}{6}L'''(\bm \theta + q \bm e_i)$$
Subtracting the second expansion from the first and dividing by $2h$:
$$\frac{L(\bm \theta+h\bm e_i) - L(\bm \theta-h\bm e_i)}{2h} = L'(\bm \theta) + \frac{h^2}{12}[L'''(\bm \theta + p \bm e_i)+L'''(\bm \theta + q \bm e_i)]$$
$$\left|L'(\bm \theta) - \frac{L(\bm \theta+h\bm e_i) - L(\bm \theta-h\bm e_i)}{2h}\right| = \frac{|L'''(\bm \theta + p \bm e_i)+L'''(\bm \theta + q \bm e_i)|}{12}h^2 = Mh^2,$$
where $p \in (0,h)$ and $q \in (-h,0)$. The last equation suggests that, for small $h$, the approximation error scales as $h^2$.
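The quadratic scaling can be observed numerically. The sketch below uses $\sin$ at $x = 1$ as a hypothetical test function (its derivative $\cos$ is known exactly) and deliberately uses moderate step sizes, because for very small $h$ floating-point round-off in the subtraction starts to dominate the truncation error analysed above.

```python
import math

def central_difference(f, x, h):
    """Central-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 1.0
errors = {}
for h in (1e-1, 1e-2, 1e-3):
    # Exact derivative of sin is cos, so the error can be measured directly
    errors[h] = abs(math.cos(x) - central_difference(math.sin, x, h))
    print(h, errors[h])

# Shrinking h by a factor of 10 should shrink the error by roughly 10^2
ratio_1 = errors[1e-1] / errors[1e-2]
ratio_2 = errors[1e-2] / errors[1e-3]
print(ratio_1, ratio_2)
```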
Reasons and diagnosis when the error is larger than expected:
Large value of $M$: reduce $h$ by a factor of 10 and check whether the error shrinks by a factor of about $10^{2}$.
Wrong calculation of the gradient: if the error does not shrink accordingly, the gradient code itself is likely incorrect.
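One possible way to automate this diagnosis is sketched below. The helper names, the default step `h`, and the thresholds `min_ratio` and `tol` are all assumptions made for illustration and would need tuning for a real objective; the test loss $e^{3\theta_0} + \theta_1^2$ is likewise hypothetical, chosen because its third derivative in $\theta_0$ is large.

```python
import math

def central_difference_i(loss, theta, i, h):
    """Central-difference estimate of dL/dtheta_i at theta (a list of floats)."""
    plus, minus = list(theta), list(theta)
    plus[i] += h
    minus[i] -= h
    return (loss(plus) - loss(minus)) / (2.0 * h)

def diagnose(loss, grad_i, theta, i, h=1e-2, min_ratio=50.0, tol=1e-8):
    """Classify the mismatch between an analytic value grad_i and the numerical estimate.

    min_ratio and tol are illustrative thresholds, not established constants.
    """
    err_h = abs(grad_i - central_difference_i(loss, theta, i, h))
    if err_h < tol:
        return "ok"
    err_small = abs(grad_i - central_difference_i(loss, theta, i, h / 10.0))
    # Error shrinking by about 100x when h shrinks 10x points to a large M
    if err_small == 0.0 or err_h / err_small >= min_ratio:
        return "large M"
    return "suspect gradient"

# Hypothetical objective: L(theta) = exp(3 * theta_0) + theta_1^2
def loss(theta):
    return math.exp(3.0 * theta[0]) + theta[1] ** 2

theta = [1.0, 0.5]
correct = diagnose(loss, 3.0 * math.exp(3.0 * theta[0]), theta, 0)  # right gradient, large L'''
buggy = diagnose(loss, math.exp(3.0 * theta[0]), theta, 0)          # chain-rule factor 3 forgotten
quadratic = diagnose(loss, 2.0 * theta[1], theta, 1)                # L''' = 0 in this coordinate
print(correct, buggy, quadratic)
```

The correct gradient in the first coordinate still shows a visible error at `h=1e-2`, but that error falls by about two orders of magnitude at `h=1e-3`, whereas the buggy gradient's error stays essentially unchanged.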
Appendix
Lagrange remainder:
$$f(x) = f(x_0) + (x-x_0) f'(x_0) + \cdots + \frac{(x-x_0)^n}{n!}f^{(n)}(x_0) + R_n$$
$$R_n = \int_{x_0}^x f^{(n+1)}(t) \frac{(x-t)^n}{n!}dt$$
Using the mean-value theorem,
$$R_n = \frac{f^{(n+1)}(x^\ast)}{(n+1)!}(x-x_0)^{n+1},$$
for some $x^\ast \in (x_0,x)$.
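As a small worked illustration (the function $f(x) = e^x$, the expansion point $x_0 = 0$, and the order $n = 2$ are chosen here only as an example), the remainder formula gives
$$e^{x} = 1 + x + \frac{x^2}{2} + R_2, \qquad R_2 = \frac{e^{x^\ast}}{6}x^{3} \text{ for some } x^\ast \in (0, x).$$
At $x = 0.1$ this bounds the remainder by
$$\frac{0.1^3}{6} \approx 1.667 \times 10^{-4} \le R_2 \le \frac{e^{0.1}}{6}\,0.1^3 \approx 1.842 \times 10^{-4},$$
which agrees with the actual value $e^{0.1} - 1.105 \approx 1.709 \times 10^{-4}$.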
References:
《百面機器學習》
Lagrange Remainder, http://mathworld.wolfram.com/LagrangeRemainder.html