Logistic Regression: Derivation and a Python Example Analysis

Original

Diamond-Mine

2020-06-16 15:38

Mathematical Formulas

(1) Logarithm rules
$log(M \cdot N)=log M+log N$
$log(M^N)=N\,log M$
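A quick numerical check of both rules using Python's `math.log` (the natural logarithm). The values of `M` and `N` are arbitrary choices for illustration:

```python
import math

# Arbitrary positive values chosen for illustration
M, N = 8.0, 3.0

# Product rule: log(M*N) == log M + log N
product_rule = (math.log(M * N), math.log(M) + math.log(N))
# Power rule: log(M**N) == N * log M
power_rule = (math.log(M ** N), N * math.log(M))

print(product_rule)
print(power_rule)
```

The product rule is what later turns the likelihood (a product over samples) into a sum, and the power rule pulls the exponents $y^i$ and $1-y^i$ down as coefficients.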

Logistic Regression

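Logistic Regression is a generalized linear model; the hyperplane that separates the classes can be expressed as a linear function:
$Wx+b=y$
where W is the weight and b is the bias term. In the multidimensional case, W and x are vectors, while b remains a scalar.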

By learning from the training samples we obtain this hyperplane, and a threshold function then maps each sample to a class (0 or 1).

A commonly used function for this step is the sigmoid function:
$\sigma(x)=\frac{1}{1+e^{-x}}$

Its range is (0, 1), and it changes most steeply near 0, where $\sigma(0)=0.5$.
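A minimal sketch of the sigmoid that prints a few values to show the (0, 1) range and the steep region around 0. The branch on the sign of `x` is an addition here, not from the source: it avoids overflow in `math.exp` for large negative inputs while computing the same function.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + e^(-x)), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)        # safe: x < 0, so e^x <= 1
    return z / (1.0 + z)   # algebraically equal to 1 / (1 + e^(-x))

for x in (-10, -2, -0.5, 0, 0.5, 2, 10):
    print(f"{x:>5}: {sigmoid(x):.4f}")
```

Between -0.5 and 0.5 the output moves by roughly 0.25, while between 9 and 10 it barely changes. This saturation is why the curve behaves like a soft threshold.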

The derivative of the sigmoid is:
$\sigma'(x) = \left(\frac{1}{1+e^{-x}}\right)' = \frac{-(1+e^{-x})'}{(1+e^{-x})^2} = \frac{-1'-(e^{-x})'}{(1+e^{-x})^2}$
$= \frac{0-(-x)'(e^{-x})}{(1+e^{-x})^2} = \frac{e^{-x}}{(1+e^{-x})^2}$
$= \left(\frac{1}{1+e^{-x}}\right)\left(\frac{e^{-x}}{1+e^{-x}}\right)$
$= \sigma(x)\left(\frac{1 + e^{-x}}{1+e^{-x}} - \frac{1}{1+e^{-x}}\right)$
$= \sigma(x)(1 - \sigma(x))$
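The closed form $\sigma(x)(1-\sigma(x))$ can be checked against a central finite difference. This is a minimal sketch; the step size `eps` and the grid of test points are arbitrary choices:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    """Closed form derived above: sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)

def numeric_grad(f, x, eps=1e-6):
    """Central finite difference (f(x + eps) - f(x - eps)) / (2 * eps)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# Largest disagreement over x = -5.0, -4.9, ..., 5.0
max_diff = max(abs(sigmoid_grad(x) - numeric_grad(sigmoid, x))
               for x in [i / 10 for i in range(-50, 51)])
print(max_diff)
print(sigmoid_grad(0.0))  # 0.25, the largest value the derivative takes
```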

Loss Function

For an input vector x, the probability of belonging to the positive class is:
$P(y=1)=\sigma(wx+b)=\frac{1}{1+e^{-(wx+b)}}$
and the probability of belonging to the negative class is:
$P(y=0)=1-\sigma(wx+b)$
Combining the two with the Bernoulli probability mass function, the probability of belonging to class y is:
$P(y)=\sigma(wx+b)^y (1-\sigma(wx+b))^{1-y}, y=0,1$
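A sketch showing that the single Bernoulli expression reduces to the two cases above. Here `z` stands for $wx+b$, and its value is arbitrary:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bernoulli_prob(y, z):
    """P(y) = sigma(z)^y * (1 - sigma(z))^(1 - y) for y in {0, 1}, with z = wx + b."""
    p = sigmoid(z)
    return p ** y * (1.0 - p) ** (1 - y)

z = 1.5  # arbitrary value of wx + b
print(bernoulli_prob(1, z), sigmoid(z))      # y = 1 selects sigma(z)
print(bernoulli_prob(0, z), 1 - sigmoid(z))  # y = 0 selects 1 - sigma(z)
```

One exponent is always 0, so the corresponding factor becomes 1. This lets a single formula cover both classes, which keeps the likelihood compact.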
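With the class probability of each training sample known, multiply these probabilities over all training samples and estimate the parameters by maximum likelihood. The likelihood function is: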
$L_\theta=\prod_{i=1}^mP_i(y)$
$=\prod_{i=1}^m[h_{\theta}(x^i)^{y^i}(1-h_{\theta}(x^i))^{1-y^i}]$
where $h_{\theta}(x^i)=\sigma(wx^i+b)$.

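To maximize the likelihood, take its logarithm, which turns the product into a sum. Using the negative log-likelihood (NLL) as the loss function, the problem becomes minimizing the NLL. The loss function is: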
$-log(L_\theta)=-log\left(\prod_{i=1}^m\left[h_{\theta}(x^i)^{y^i}(1-h_{\theta}(x^i))^{1-y^i}\right]\right)$
$=-\sum_{i=1}^m log\left(h_{\theta}(x^i)^{y^i}(1-h_{\theta}(x^i))^{1-y^i}\right)$
$=-\sum_{i=1}^m\left[log\left(h_{\theta}(x^i)^{y^i}\right)+log\left((1-h_{\theta}(x^i))^{1-y^i}\right)\right]$
$=-\sum_{i=1}^m\left[y^i log(h_{\theta}(x^i))+(1-y^i)log(1-h_{\theta}(x^i))\right]$
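A small illustration of why taking the logarithm matters in practice. With many samples, the raw product of probabilities underflows to zero in double precision, while the NLL stays finite. The 2000 samples, each predicted with probability 0.5, are hypothetical:

```python
import math

def nll(probs, labels):
    """Negative log-likelihood: -sum[y*log(h) + (1-y)*log(1-h)]."""
    return -sum(y * math.log(h) + (1 - y) * math.log(1 - h)
                for h, y in zip(probs, labels))

# Hypothetical data: 2000 samples, each predicted with probability 0.5
m = 2000
probs = [0.5] * m
labels = [i % 2 for i in range(m)]

# Direct likelihood: product of the per-sample Bernoulli probabilities
likelihood = math.prod(h if y == 1 else 1 - h for h, y in zip(probs, labels))
print(likelihood)          # underflows to 0.0
print(nll(probs, labels))  # about 2000 * ln 2 = 1386.29
```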
Gradient descent is used to find the minimum of this loss function.

Gradient Descent

The loss function (the NLL averaged over the m training samples) is:
$J(\theta) = -\frac{1}{m} \sum_{i=1}^m \left[ y^i log(h_\theta(x^i)) + (1 - y^i) log(1 - h_\theta(x^i)) \right]$
The gradient descent update rule is:
$\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j}J(\theta)$
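Substituting the loss function and differentiating gives the partial derivative and the resulting update rule:
$\dfrac{\partial}{\partial \theta_j}J(\theta) = \frac{1}{m}\sum_{i=1}^m \left(h_\theta(x^i) - y^i\right)x_j^i$
$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^m \left(h_\theta(x^i) - y^i\right)x_j^i$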

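Here $\theta$ collects the bias and the weights ($\theta_0=b$, with $x_0^{(i)}=1$). The derivation needs the partial derivative of $\theta^Tx^{(i)}$ with respect to $\theta_j$: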
$\theta^Tx^{(i)} = [\theta_0,\theta_1,...,\theta_j,...]*x^{(i)}$
$= (\theta_0x_0^{(i)}+\theta_1x_1^{(i)}+...+\theta_jx_j^{(i)}+...)$
The result is $x_j^{(i)}$.
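Putting these pieces together gives the full chain-rule derivation of the gradient, written in the notation above. The first line differentiates the logarithms with $(ln x)' = 1/x$ and keeps the constant factor $-\frac{1}{m}$. The second applies the sigmoid derivative together with $\partial\,\theta^Tx^{(i)}/\partial\theta_j = x_j^{(i)}$. The last step uses $y^i(1-h)-(1-y^i)h = y^i - h$:

```latex
\begin{aligned}
\frac{\partial}{\partial\theta_j}J(\theta)
 &= -\frac{1}{m}\sum_{i=1}^m\left[\frac{y^i}{h_\theta(x^i)}-\frac{1-y^i}{1-h_\theta(x^i)}\right]\frac{\partial h_\theta(x^i)}{\partial\theta_j}\\
 &= -\frac{1}{m}\sum_{i=1}^m\left[\frac{y^i}{h_\theta(x^i)}-\frac{1-y^i}{1-h_\theta(x^i)}\right]h_\theta(x^i)\left(1-h_\theta(x^i)\right)x_j^{(i)}\\
 &= -\frac{1}{m}\sum_{i=1}^m\left[y^i\left(1-h_\theta(x^i)\right)-(1-y^i)\,h_\theta(x^i)\right]x_j^{(i)}\\
 &= \frac{1}{m}\sum_{i=1}^m\left(h_\theta(x^i)-y^i\right)x_j^{(i)}
\end{aligned}
```

Substituting into the update rule gives $\theta_j := \theta_j + \alpha\frac{1}{m}\sum_{i=1}^m(y^i - h_\theta(x^i))x_j^{(i)}$. The Python code below uses the same direction, in matrix form and with the $\frac{1}{m}$ folded into the learning rate.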

Key Points of the Derivation

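Differentiation passes through constant factors, e.g. $(3x)' = 3(x)'$.
The logarithm with base e is the natural logarithm, written ln, and $(\ln x)' = 1/x$; the log in the loss function above is this natural logarithm.
Chain rule for the sigmoid: for an inner expression u, $(\sigma(u))' = \sigma(u)(1 - \sigma(u))u'$.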
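The derived gradient $\frac{1}{m}X^T(h-y)$ can be checked numerically against central finite differences of $J(\theta)$. This is a sketch with NumPy on small random data; the data, the seed and `eps` are arbitrary choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y):
    """J(theta) = -(1/m) * sum[y*log(h) + (1-y)*log(1-h)]."""
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    """Analytic gradient from the derivation: (1/m) * X^T (h - y)."""
    h = sigmoid(X @ theta)
    return X.T @ (h - y) / len(y)

def numeric_grad(theta, X, y, eps=1e-6):
    """Central differences, perturbing one coordinate theta_j at a time."""
    g = np.zeros_like(theta)
    for j in range(len(theta)):
        step = np.zeros_like(theta)
        step[j] = eps
        g[j] = (loss(theta + step, X, y) - loss(theta - step, X, y)) / (2 * eps)
    return g

# Hypothetical random data: a bias column of ones plus two features
rng = np.random.default_rng(0)
X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])
y = (rng.random(20) < 0.5).astype(float)
theta = rng.normal(size=3)

diff = np.max(np.abs(grad(theta, X, y) - numeric_grad(theta, X, y)))
print(diff)  # should be close to zero
```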

Python Implementation

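# Code from the book 《Python機器學習算法》 (Python Machine Learning Algorithms), ported to Python 3
import numpy as np

def sig(x):
    '''Sigmoid function'''
    return 1.0 / (1 + np.exp(-x))

def error_rate(h, label):
    '''Average loss J(theta) of predictions h against labels,
    the same computation as the loss snippet in the code analysis below'''
    m = np.shape(h)[0]
    h = np.clip(h, 1e-12, 1 - 1e-12)  # keep log() finite when h saturates at 0 or 1
    return -float(np.sum(np.multiply(label, np.log(h)) +
                         np.multiply(1 - label, np.log(1 - h)))) / m

def lr_train_bgd(feature, label, maxCycle, alpha):
    '''Train an LR model with batch gradient descent
    input:  feature(mat) features
            label(mat) labels
            maxCycle(int) maximum number of iterations
            alpha(float) learning rate
    output: w(mat) weights
    '''
    n = np.shape(feature)[1]  # number of features
    w = np.asmatrix(np.ones((n, 1)))  # initialize the weights to 1
    for i in range(1, maxCycle + 1):  # i is the current iteration number
        h = sig(feature * w)  # sigmoid of the linear score for every sample
        err = label - h
        if i % 100 == 0:
            print("\t---------iter=" + str(i) +
                  " , train error rate= " + str(error_rate(h, label)))
        w = w + alpha * feature.T * err  # weight update: w += alpha * X^T (y - h)
    return w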
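For comparison, here is a sketch of the same batch gradient descent using plain NumPy ndarrays, since `np.matrix` is discouraged in current NumPy. The function name `lr_train_gd`, the 1/m scaling, alpha = 0.1 and the two-blob toy data are assumptions for illustration and do not come from the book:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lr_train_gd(X, y, max_cycle=1000, alpha=0.1):
    """Batch gradient descent on J(theta) with ndarrays (hypothetical variant).

    X: (m, n) array whose first column is all ones (bias term)
    y: (m,) array of 0/1 labels
    """
    m, n = X.shape
    w = np.ones(n)                      # same initialization as lr_train_bgd
    for _ in range(max_cycle):
        h = sigmoid(X @ w)              # predictions for all samples
        w -= alpha * X.T @ (h - y) / m  # theta_j := theta_j - alpha * dJ/dtheta_j
    return w

# Hypothetical toy data: two Gaussian blobs around (-2, -2) and (2, 2)
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(-2, 1, size=(100, 2)), rng.normal(2, 1, size=(100, 2))])
X = np.hstack([np.ones((200, 1)), pts])
y = np.concatenate([np.zeros(100), np.ones(100)])

w = lr_train_gd(X, y)
accuracy = np.mean((sigmoid(X @ w) >= 0.5) == y)
print(w, accuracy)
```

Dividing the gradient by m makes the step size independent of the number of samples. Without it, as in lr_train_bgd, the effective step on the averaged gradient is alpha times m.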

Code analysis:

(1) feature holds the training data; the feature value of the bias term is set to 1 (the first column). The data look like this:

(Pdb) feature[:10]
matrix([[1.   , 4.459, 8.225],
        [1.   , 0.043, 6.307],
        [1.   , 6.997, 9.313],
        [1.   , 4.755, 9.26 ],
        [1.   , 8.662, 9.768],
        [1.   , 7.174, 8.695],
        [1.   , 0.134, 1.969],
        [1.   , 2.959, 5.805],
        [1.   , 0.162, 2.596],
        [1.   , 3.996, 8.833]])

label holds the labels, with values 0 or 1:

(Pdb) label[:10]
matrix([[0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.]])

maxCycle, the number of iterations, is set to 1000; alpha, the learning rate, is set to 0.01.

There are 3 features (the first being the constant bias feature), and the weights are initialized to 1:

(Pdb) p w
matrix([[1.],
        [1.],
        [1.]])

(2) h = sig(feature * w) is the prediction, corresponding to the expression
$h_{\theta}(x^i)=\sigma(wx^i+b)=\frac{1}{1+e^{-(wx^i+b)}}$

(Pdb) p h[:10]
matrix([[0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.],
        [0.]])

(3) err = label - h corresponds to the expression $y^i-h_{\theta}(x^i)$.

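feature.T * err corresponds to $\sum_{i=1}^m x^i (y^i-h_{\theta}(x^i))$. This is the negative gradient of the NLL summed over all samples, with the $\frac{1}{m}$ factor of $J(\theta)$ absorbed into alpha, so w = w + alpha * feature.T * err is the gradient descent step.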
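To confirm that the matrix product really is the sum over samples, here is a small check on the first three training rows printed above, with w initialized to ones as in lr_train_bgd. It uses NumPy ndarrays, so `@` plays the role that `*` plays for `np.matrix`:

```python
import numpy as np

def sig(x):
    return 1.0 / (1 + np.exp(-x))

# First three rows of feature and label printed above; w initialized to ones
feature = np.array([[1.0, 4.459, 8.225],
                    [1.0, 0.043, 6.307],
                    [1.0, 6.997, 9.313]])
label = np.array([[0.0], [0.0], [0.0]])
w = np.ones((3, 1))

h = sig(feature @ w)
err = label - h

# Matrix form used in lr_train_bgd
vectorized = feature.T @ err
# Explicit sum over samples: sum_i x^i * (y^i - h(x^i))
looped = sum(feature[i].reshape(3, 1) * err[i, 0] for i in range(3))

print(vectorized.ravel())
print(looped.ravel())
```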

After w is updated, the loss can be computed. The loss function is:
$J(\theta) = -\frac{1}{m} \sum_{i=1}^m \left[ y^i log(h_\theta(x^i)) + (1 - y^i) log(1 - h_\theta(x^i)) \right]$
In code:

m = np.shape(h)[0]  # number of samples
sum_err = 0.0
for i in range(m):
    y_i = label[i,0]
    sum_err -= (y_i * np.log(h[i,0]) + (1-y_i) * np.log(1-h[i,0]))
sum_err /= m
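The same loss can also be computed without a Python loop. The sketch below does this; the clipping bound `eps` is an assumption added here and is not in the original code. It is needed because `np.log(0)` returns `-inf` when a prediction saturates at exactly 0 or 1, which turns the loop's result into `inf` or `nan`:

```python
import numpy as np

def loss_loop(h, label):
    """Loop version, as in the snippet above."""
    m = np.shape(h)[0]
    sum_err = 0.0
    for i in range(m):
        y_i = label[i, 0]
        sum_err -= (y_i * np.log(h[i, 0]) + (1 - y_i) * np.log(1 - h[i, 0]))
    return sum_err / m

def loss_vectorized(h, label, eps=1e-12):
    """Same J(theta) without a loop; eps (an assumed bound) keeps log() finite."""
    h = np.clip(h, eps, 1 - eps)
    return -np.mean(label * np.log(h) + (1 - label) * np.log(1 - h))

# Hypothetical predictions and labels, shaped (m, 1) as in lr_train_bgd
h = np.array([[0.9], [0.2], [0.7], [0.4]])
label = np.array([[1.0], [0.0], [1.0], [0.0]])
print(loss_loop(h, label), loss_vectorized(h, label))
```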

(4) After training, the final weights are:

(Pdb) p w
matrix([[ 1.394],
        [ 4.527],
        [-4.794]])

Prediction

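To predict, substitute the test data into the prediction function h = sig(feature * w). A value < 0.5 is predicted as the negative class, and a value >= 0.5 as the positive class. The first ten predictions on the test data: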
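As a sketch of the thresholding rule, the final trained weights printed above can be applied to the first ten training rows printed in the code analysis. Note that these are training rows, not the test data shown below. The `predict` helper is hypothetical:

```python
import numpy as np

def sig(x):
    return 1.0 / (1 + np.exp(-x))

def predict(feature, w, threshold=0.5):
    """Hypothetical helper: class 1 where sig(feature @ w) >= threshold, else 0."""
    h = sig(feature @ w)
    return (h >= threshold).astype(int), h

# Final trained weights and the first ten training rows printed above
w = np.array([[1.394], [4.527], [-4.794]])
feature = np.array([[1.0, 4.459, 8.225], [1.0, 0.043, 6.307], [1.0, 6.997, 9.313],
                    [1.0, 4.755, 9.26], [1.0, 8.662, 9.768], [1.0, 7.174, 8.695],
                    [1.0, 0.134, 1.969], [1.0, 2.959, 5.805], [1.0, 0.162, 2.596],
                    [1.0, 3.996, 8.833]])

pred, h = predict(feature, w)
print(pred.ravel())  # all ten rows are predicted 0, matching their labels
```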

(Pdb) p h[:10]
matrix([[0.   ],
        [0.   ],
        [0.002],
        [0.   ],
        [0.001],
        [0.   ],
        [0.001],
        [0.001],
        [0.   ],
        [0.   ]])
