Verifying the Correctness of a Gradient


大眼呆萌君

2020-07-04 18:42

Question (152): How do you verify that a routine computing the gradient of an objective function is correct?

Topics tested: calculus, Taylor expansion

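Approximation (calculus)
The partial derivative is defined as a limit, so for a small step $h$ the central difference approximates it:
$\frac{\partial L(\bm \theta)}{\partial \theta_i} \approx \frac{L(\theta_1, \cdots,\theta_i+h, \cdots,\theta_p) - L(\theta_1, \cdots,\theta_i-h, \cdots,\theta_p)}{2h}$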

E.g., $h=10^{-7}$.
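The check described above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the objective `loss`, its hand-written `analytic_gradient`, and the helper `numerical_gradient` are hypothetical names introduced here, and the comparison tolerance is left to the reader.

```python
import math

def numerical_gradient(loss, theta, h=1e-7):
    """Central-difference estimate of each partial derivative of `loss` at `theta`."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += h   # theta + h * e_i
        minus[i] -= h  # theta - h * e_i
        grad.append((loss(plus) - loss(minus)) / (2.0 * h))
    return grad

def loss(theta):
    # Hypothetical objective for illustration: sum_j theta_j^2 + sin(theta_0)
    return sum(t * t for t in theta) + math.sin(theta[0])

def analytic_gradient(theta):
    # Hand-derived gradient of the objective above; this is the code under test.
    grad = [2.0 * t for t in theta]
    grad[0] += math.cos(theta[0])
    return grad

theta = [0.3, -1.2, 2.0]
numeric = numerical_gradient(loss, theta)
analytic = analytic_gradient(theta)
max_abs_diff = max(abs(n - a) for n, a in zip(numeric, analytic))
print(max_abs_diff)
```

A small `max_abs_diff` suggests the two gradients agree. With a step as small as $10^{-7}$, floating-point round-off rather than the finite-difference approximation tends to dominate the remaining discrepancy.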

Approximation error (Taylor expansion with Lagrange remainder)
Apply the univariate Taylor expansion to the function $\tilde{L}(x) = L(\bm \theta + x \bm e_i)$, writing $L'$, $L''$, $L'''$ for the first three derivatives of $L$ along $\bm e_i$ (that is, with respect to $\theta_i$):

$L(\bm \theta+h\bm e_i) = L(\bm \theta) + hL'(\bm \theta) + \frac{h^2}{2}L''(\bm \theta) + \frac{h^3}{6}L'''(\bm \theta + p \bm e_i)$

$L(\bm \theta-h\bm e_i) = L(\bm \theta) - hL'(\bm \theta) + \frac{h^2}{2}L''(\bm \theta) - \frac{h^3}{6}L'''(\bm \theta - q \bm e_i)$

$\frac{L(\bm \theta+h\bm e_i) - L(\bm \theta-h\bm e_i)}{2h} = L'(\bm \theta) + \frac{h^2}{12}[L'''(\bm \theta + p \bm e_i)+L'''(\bm \theta - q \bm e_i)]$

$\left|L'(\bm \theta) - \frac{L(\bm \theta+h\bm e_i) - L(\bm \theta-h\bm e_i)}{2h}\right| = \frac{|L'''(\bm \theta + p \bm e_i)+L'''(\bm \theta - q \bm e_i)|}{12}h^2 = Mh^2,$
where $p,q \in (0,h)$. The last equation shows that the approximation error scales as $h^2$, provided $L'''$ is bounded near $\bm \theta$.
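As a quick sanity check of the $h^2$ scaling, consider a one-dimensional cubic chosen here purely for illustration, $L(\theta) = \theta^3$, whose third derivative is the constant $6$. The central difference can then be evaluated exactly:

$\frac{(\theta+h)^3 - (\theta-h)^3}{2h} = \frac{6\theta^2 h + 2h^3}{2h} = 3\theta^2 + h^2$

Since $L'(\theta) = 3\theta^2$, the error is exactly $h^2$, so $M = 1$ in this case.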

Reasons and diagnosis when the error is larger than expected:

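Large value of $M$: reduce $h$ by a factor of 10 and check whether the error shrinks by a factor of about 100. If it does, the gradient is correct and the large error came from $M$.
Wrong calculation of the gradient: if the error does not shrink in this way, the gradient code itself is likely wrong.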
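The diagnosis can be sketched as follows. The one-dimensional objective `f`, the deliberately broken `buggy_grad`, and the helper `error_ratio` are all hypothetical, made up for this illustration. The steps $10^{-2}$ and $10^{-3}$ are assumed large enough that the $h^2$ term, not round-off, dominates the error.

```python
import math

def central_difference(f, x, h):
    return (f(x + h) - f(x - h)) / (2.0 * h)

def error_ratio(f, grad, x, h=1e-2):
    """Error at step h divided by error at step h/10."""
    err_coarse = abs(grad(x) - central_difference(f, x, h))
    err_fine = abs(grad(x) - central_difference(f, x, h / 10.0))
    return err_coarse / err_fine

def f(x):
    # Hypothetical objective with a large third derivative (large M).
    return math.exp(3.0 * x)

def correct_grad(x):
    return 3.0 * math.exp(3.0 * x)

def buggy_grad(x):
    # Deliberate bug: the chain-rule factor 3 is missing.
    return math.exp(3.0 * x)

ratio_correct = error_ratio(f, correct_grad, 1.0)
ratio_buggy = error_ratio(f, buggy_grad, 1.0)
print(ratio_correct, ratio_buggy)
```

A ratio near 100 points to a correct gradient with a large $M$. A ratio near 1 means the error does not depend on $h$, which points to a bug in the gradient code.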

Appendix
Lagrange remainder:
$f(x) = f(x_0) + (x-x_0) f'(x_0) + \cdots + \frac{(x-x_0)^n}{n!}f^{(n)}(x_0) + R_n$

$R_n = \int_{x_0}^x f^{(n+1)}(t) \frac{(x-t)^n}{n!}dt$
By the mean-value theorem for integrals,
$R_n = \frac{f^{(n+1)}(x^\ast)}{(n+1)!}(x-x_0)^{n+1},$
for some $x^\ast$ between $x_0$ and $x$.

References:

《百面機器學習》
Lagrange Remainder, http://mathworld.wolfram.com/LagrangeRemainder.html
