Logistic Regression


  1. Derivation of gradient descent for logistic regression

  2. Proof that the logistic regression objective function is convex

Training data $D = \{ (\mathbf{x}_{1}, y_{1}), \cdots, (\mathbf{x}_{n}, y_{n}) \}$, where $(\mathbf{x}_{i}, y_{i})$ denotes one sample, $\mathbf{x}_{i} \in \mathbb{R}^{D}$ is the $D$-dimensional feature vector, and $y_{i} \in \{ 0, 1 \}$ is the sample label.

The parameters of the logistic regression model are $(\mathbf{w}, b)$. For convenience in the derivation, $b$ is usually absorbed into $\mathbf{w}$, in which case $\mathbf{w}$ and $\mathbf{x}_{i}$ are rewritten as

$$\mathbf{w} = [w_{0}, w_{1}, \cdots, w_{D}], \quad \mathbf{x}_{i} = [1, x_{i1}, \cdots, x_{iD}]$$
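
As a minimal sketch of this reparameterization (assuming NumPy and a design matrix `X` of shape `(n, D)`; the helper name is illustrative):

```python
import numpy as np

def add_bias_column(X):
    """Prepend a constant-1 column so that w[0] plays the role of the bias b."""
    return np.hstack([np.ones((X.shape[0], 1)), X])
```

With this augmentation, $\mathbf{w}^{\text{T}} \mathbf{x}_{i}$ already includes the bias term.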

1 Objective Function of Logistic Regression

The objective function (also called the loss function) is denoted $\mathcal{L} (\mathbf{w})$.

Binary classification model

$$p(y | \mathbf{x}; \mathbf{w}) = p(y = 1 | \mathbf{x}; \mathbf{w})^{y} \left[ 1 - p(y = 1 | \mathbf{x}; \mathbf{w}) \right]^{1 - y} \tag{1}$$

Maximum likelihood estimation (MLE), assuming the $n$ samples are drawn i.i.d.:

$$\begin{aligned} \mathbf{w}^{\ast} & = \arg \max_{\mathbf{w}} p(\mathbf{y} | \mathbf{x}; \mathbf{w}) \\ & = \arg \max_{\mathbf{w}} \prod_{i = 1}^{n} p(y_{i} | \mathbf{x}_{i}; \mathbf{w}) \\ & = \arg \max_{\mathbf{w}} \log \left[ \prod_{i = 1}^{n} p(y_{i} | \mathbf{x}_{i}; \mathbf{w}) \right] \\ & = \arg \max_{\mathbf{w}} \sum_{i = 1}^{n} \log p(y_{i} | \mathbf{x}_{i}; \mathbf{w}) \\ & = \arg \max_{\mathbf{w}} \sum_{i = 1}^{n} \log \left[ p(y_{i} = 1 | \mathbf{x}_{i}; \mathbf{w})^{y_{i}} [1 - p(y_{i} = 1 | \mathbf{x}_{i}; \mathbf{w})]^{1 - y_{i}} \right] \\ & = \arg \max_{\mathbf{w}} \sum_{i = 1}^{n} \left[ y_{i} \log p(y_{i} = 1 | \mathbf{x}_{i}; \mathbf{w}) + (1 - y_{i}) \log [1 - p(y_{i} = 1 | \mathbf{x}_{i}; \mathbf{w})] \right] \end{aligned} \tag{2}$$

Equation (2) is the maximum likelihood estimate of $p(\mathbf{y} | \mathbf{x}; \mathbf{w})$. By convention, the objective is written as a minimization over $\mathbf{w}$:

$$\mathbf{w}^{\ast} = \arg \min_{\mathbf{w}} \mathcal{L} (\mathbf{w})$$

so the objective function (the cross-entropy loss) is the negated log-likelihood:

$$\mathcal{L} (\mathbf{w}) = - \sum_{i = 1}^{n} \left[ y_{i} \log p(y_{i} = 1 | \mathbf{x}_{i}; \mathbf{w}) + (1 - y_{i}) \log [1 - p(y_{i} = 1 | \mathbf{x}_{i}; \mathbf{w})] \right] \tag{3}$$

Logistic sigmoid function

$$\sigma (x) = \frac{1}{1 + e^{-x}}, \quad \sigma^{\prime} (x) = \sigma (x) \left( 1 - \sigma (x) \right)$$
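
Computing $\sigma(x)$ naively as `1 / (1 + np.exp(-x))` overflows for large negative $x$; a common stabilization (a sketch, not part of the original text) branches on the sign:

```python
import numpy as np

def sigmoid(x):
    """Numerically stable logistic sigmoid, applied elementwise."""
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    # For x < 0, use the equivalent form e^x / (1 + e^x) to avoid overflow.
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out
```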

Consider the binary classification problem: given $\mathbf{x}$, the conditional probability that the event occurs ($y = 1$) is $p(y = 1 | \mathbf{x}; \mathbf{w})$. Logistic regression models the log-odds as a linear function of $\mathbf{x}$, so the odds of the event are:

$$\text{odds} = \frac{p(y = 1 | \mathbf{x}; \mathbf{w})}{1 - p(y = 1 | \mathbf{x}; \mathbf{w})} = \exp(\mathbf{w}^{\text{T}} \mathbf{x})$$

Solving for the probability gives:

$$p(y = 1 | \mathbf{x}; \mathbf{w}) = \sigma (\mathbf{w}^{\text{T}} \mathbf{x}) = \frac{1}{1 + e^{- \mathbf{w}^{\text{T}} \mathbf{x}}} \tag{4}$$

Substituting equation (4) into equation (3) yields the objective function of logistic regression:

$$\mathcal{L} (\mathbf{w}) = - \sum_{i = 1}^{n} \left[ y_{i} \log \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i}) + (1 - y_{i}) \log [1 - \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i})] \right] \tag{5}$$
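
Equation (5) translates directly into code. A sketch reusing the `sigmoid` helper above (the `eps` clipping is an added numerical safeguard, not part of equation (5)):

```python
def loss(w, X, y, eps=1e-12):
    """Cross-entropy loss of equation (5), summed over all n samples."""
    p = sigmoid(X @ w)              # p[i] = sigma(w^T x_i)
    p = np.clip(p, eps, 1.0 - eps)  # keep log() finite at p = 0 or 1
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```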

2 Gradient of $\mathcal{L} (\mathbf{w})$ with Respect to $\mathbf{w}$

Vector derivative identity

$$\frac{\partial \mathbf{a}^{\text{T}} \mathbf{x}}{\partial \mathbf{x}} = \frac{\partial \mathbf{x}^{\text{T}} \mathbf{a}}{\partial \mathbf{x}} = \mathbf{a}$$
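
This identity can be sanity-checked with central finite differences (a quick numerical sketch, not part of the derivation):

```python
import numpy as np

a = np.array([1.0, -2.0, 3.0])
x = np.array([0.5, 0.1, -0.7])
eps = 1e-6

# Central-difference estimate of d(a^T x)/dx, one coordinate at a time.
grad_fd = np.array([(a @ (x + eps * e) - a @ (x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
assert np.allclose(grad_fd, a)  # the gradient is exactly a
```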

Gradient of $\mathcal{L} (\mathbf{w})$ with respect to $\mathbf{w}$

$$\begin{aligned} \frac{\partial \mathcal{L} (\mathbf{w})}{\partial \mathbf{w}} & = - \frac{\partial}{\partial \mathbf{w}} \sum_{i = 1}^{n} \left[ y_{i} \log \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i}) + (1 - y_{i}) \log [1 - \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i})] \right] \\ & = - \sum_{i = 1}^{n} \left[ y_{i} \frac{\partial \log \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i})}{\partial \mathbf{w}} + (1 - y_{i}) \frac{\partial \log [1 - \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i})]}{\partial \mathbf{w}} \right] \\ & = - \sum_{i = 1}^{n} \left[ y_{i} \left( 1 - \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i}) \right) \mathbf{x}_{i} - (1 - y_{i}) \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i}) \mathbf{x}_{i} \right] \\ & = \sum_{i = 1}^{n} \left[ \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i}) - y_{i} \right] \mathbf{x}_{i} \end{aligned} \tag{6}$$

Gradient descent update

$$\mathbf{w} \leftarrow \mathbf{w} - \eta \frac{\partial \mathcal{L} (\mathbf{w})}{\partial \mathbf{w}} = \mathbf{w} - \eta \sum_{i = 1}^{n} \left[ \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i}) - y_{i} \right] \mathbf{x}_{i}$$
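
Combining equations (5) and (6), a minimal batch gradient-descent loop might look as follows (reusing the helpers above; the learning rate `eta` and iteration count are illustrative, not values from the text):

```python
def gradient(w, X, y):
    """Gradient of equation (6): sum_i [sigma(w^T x_i) - y_i] x_i."""
    return X.T @ (sigmoid(X @ w) - y)

def fit(X, y, eta=0.01, n_iters=1000):
    """Batch gradient descent: w <- w - eta * dL/dw."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w -= eta * gradient(w, X, y)
    return w
```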

3 Second Derivative of $\mathcal{L} (\mathbf{w})$ with Respect to $\mathbf{w}$

Hessian layout (derivative of a vector with respect to a vector)

$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = \begin{bmatrix} \dfrac{\partial y_{1}}{\partial \mathbf{x}} & \cdots & \dfrac{\partial y_{n}}{\partial \mathbf{x}} \end{bmatrix}$$

Second derivative of $\mathcal{L} (\mathbf{w})$ with respect to $\mathbf{w}$

$$\begin{aligned} \frac{\partial^{2} \mathcal{L}(\mathbf{w})}{\partial \mathbf{w}^{2}} & = \frac{\partial}{\partial \mathbf{w}} \sum_{i = 1}^{n} \left[ \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i}) - y_{i} \right] \mathbf{x}_{i} \\ & = \sum_{i = 1}^{n} \frac{\partial \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i})}{\partial \mathbf{w}} \mathbf{x}_{i}^{\text{T}} \\ & = \sum_{i = 1}^{n} \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i}) \left( 1 - \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i}) \right) \mathbf{x}_{i} \mathbf{x}_{i}^{\text{T}} \end{aligned} \tag{7}$$
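
In matrix form, equation (7) is $X^{\text{T}} S X$ with $S = \operatorname{diag}\left( \sigma_{i} (1 - \sigma_{i}) \right)$. A sketch reusing the `sigmoid` helper:

```python
def hessian(w, X):
    """Hessian of equation (7): sum_i sigma_i (1 - sigma_i) x_i x_i^T = X^T S X."""
    s = sigmoid(X @ w)
    d = s * (1.0 - s)              # diagonal of S: sigma_i * (1 - sigma_i)
    return X.T @ (d[:, None] * X)  # equivalent to X.T @ np.diag(d) @ X
```

This is the matrix that Newton-type solvers for logistic regression (such as IRLS) invert at each iteration.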

4 The Logistic Regression Objective Is Convex

If a function is convex, any local optimum is also a global optimum. Therefore, once (stochastic) gradient descent or a similar method finds an optimum, that solution is guaranteed to be globally optimal.

One way to prove convexity is to show that the second derivative is nonnegative. For example, for $f(x) = x^{2} - 3x + 3$, the second derivative is $f''(x) = 2 > 0$, so $f(x)$ is convex. The same idea extends to multivariate functions: it suffices to show that the matrix of second derivatives (the Hessian) is positive semidefinite. To prove that a matrix $\mathbf{H}$ is positive semidefinite, one shows that $\mathbf{v}^{\text{T}} \mathbf{H} \mathbf{v} \geq 0$ for every vector $\mathbf{v}$.

Proof that $\frac{\partial^{2} \mathcal{L}(\mathbf{w})}{\partial \mathbf{w}^{2}}$ is positive semidefinite

$$\mathbf{v}^{\text{T}} \frac{\partial^{2} \mathcal{L}(\mathbf{w})}{\partial \mathbf{w}^{2}} \mathbf{v} = \sum_{i = 1}^{n} \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i}) \left( 1 - \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i}) \right) \mathbf{v}^{\text{T}} \mathbf{x}_{i} \mathbf{x}_{i}^{\text{T}} \mathbf{v} \tag{8}$$

Since $0 < \sigma (\mathbf{w}^{\text{T}} \mathbf{x}_{i}) < 1$, it suffices to show that $\mathbf{v}^{\text{T}} \mathbf{x}_{i} \mathbf{x}_{i}^{\text{T}} \mathbf{v} \geq 0$:

$$\mathbf{v}^{\text{T}} \mathbf{x}_{i} \mathbf{x}_{i}^{\text{T}} \mathbf{v} = \left( \mathbf{v}^{\text{T}} \mathbf{x}_{i} \right)^{2} \geq 0 \Rightarrow \frac{\partial^{2} \mathcal{L}(\mathbf{w})}{\partial \mathbf{w}^{2}} \succeq 0$$

Therefore, $\frac{\partial^{2} \mathcal{L}(\mathbf{w})}{\partial \mathbf{w}^{2}}$ is positive semidefinite, which proves that the logistic regression objective is convex.
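
The semidefiniteness argument can also be spot-checked numerically: every eigenvalue of the Hessian should be nonnegative up to floating-point error. A sketch using the helpers defined above with randomly generated data:

```python
rng = np.random.default_rng(0)
X = add_bias_column(rng.normal(size=(100, 3)))
w = rng.normal(size=X.shape[1])

H = hessian(w, X)
eig = np.linalg.eigvalsh(H)   # H is symmetric, so eigvalsh applies
assert np.all(eig >= -1e-10)  # PSD up to numerical tolerance
```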
