Notes on Logistic Regression

1. The Sigmoid Function

In this note the sigmoid function is written as $S(x)$:

$$S(x)=\dfrac{1}{1+e^{-x}}$$

The sigmoid function has a useful property: $\left[S(x)\right]'=S(x)\left[1-S(x)\right]$.
[Figure: the sigmoid curve, with a step function shown as a dashed line]

The sigmoid curve rises quickly near its centre $(x=0,\ y=0.5)$ and flattens out toward both ends; the dashed line is the step function.
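As a quick numerical sanity check of this derivative identity (a minimal sketch, assuming NumPy is available), a central finite difference of $S(x)$ can be compared with $S(x)[1-S(x)]$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Central-difference estimate of S'(x) compared with S(x) * (1 - S(x))
x = np.linspace(-5.0, 5.0, 11)
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
analytic = sigmoid(x) * (1.0 - sigmoid(x))
print(np.max(np.abs(numeric - analytic)))   # tiny value: the two expressions agree
```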

2. The Logistic Regression Model

If the sigmoid function $S(x)$ is used to transform the output of the linear model $f(\boldsymbol x)=\boldsymbol{w}^T \boldsymbol x + b$, then:

$$y(\boldsymbol{x})=S\left[ f(\boldsymbol x) \right]=\dfrac{1}{1+e^{-(\boldsymbol{w}^{T}\boldsymbol{x}+b)}}$$

For a sample $\boldsymbol{x}^{\ast}$ with output value $y=y(\boldsymbol x^{\ast})$, we obtain:

$$\ln\left( \dfrac{y}{1-y}\right)=\boldsymbol{w}^{T}\boldsymbol{x}^{\ast}+b$$

If $y$ is interpreted as the probability that the sample $\boldsymbol{x}^{\ast}$ is a positive example and $1-y$ as the probability that it is a negative example, then the log of their ratio, $\ln\left( \dfrac{y}{1-y}\right)$, describes how the sample $\boldsymbol{x}^{\ast}$ is classified by the linear model (see the figure below):

1) If $y=0.5$, then $1-y=0.5$ and $\ln\left( \dfrac{y}{1-y}\right)=0$, so $\boldsymbol{w}^{T}\boldsymbol{x}^{\ast}+b=0$.

From the point of view of the linear model, the sample $\boldsymbol{x}^{\ast}$ lies exactly on the decision boundary (the red line) and is equally likely to be a positive or a negative example.

2) If $y>0.5$, then $1-y<0.5$ and $\ln\left( \dfrac{y}{1-y}\right)>0$, so $\boldsymbol{w}^{T}\boldsymbol{x}^{\ast}+b>0$.

This means the sample $\boldsymbol{x}^{\ast}$ lies on the upper side of the decision boundary.

3) If $y<0.5$, then $1-y>0.5$ and $\ln\left( \dfrac{y}{1-y}\right)<0$, so $\boldsymbol{w}^{T}\boldsymbol{x}^{\ast}+b<0$.

This means the sample $\boldsymbol{x}^{\ast}$ lies on the lower side of the decision boundary.

[Figure: a linear decision boundary separating the regions $\mathcal R_1$ and $\mathcal R_2$]

The sigmoid function maps the output of the linear model $\boldsymbol{w}^{T}\boldsymbol{x}+b$ into the interval $(0,1)$.

If an event occurs with probability $p$, the odds of the event are defined as $\dfrac{p}{1-p}$, and its log odds as $\ln\left(\dfrac{p}{1-p}\right)$.
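A minimal sketch (assuming NumPy; the parameter values and helper names are illustrative) showing that taking the log odds of the sigmoid output recovers the linear score $\boldsymbol{w}^{T}\boldsymbol{x}+b$:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_odds(p):
    # logit, the inverse of the sigmoid
    return np.log(p / (1.0 - p))

w, b = np.array([2.0, -1.0]), 0.5    # illustrative parameters
x = np.array([0.3, 1.2])             # an illustrative sample
z = w @ x + b                        # linear score w^T x + b
p = sigmoid(z)                       # probability of the positive class
print(z, log_odds(p))                # both print (approximately) the same value
```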

If the variable $c=1$ is used to denote the region $\mathcal R_1$ in the figure above and $c=0$ to denote the region $\mathcal R_2$, then the value of $y(\boldsymbol x)$ can be viewed as the class posterior probability, i.e.:

$$p(c=1|\boldsymbol{x})=y(\boldsymbol{x})=\dfrac{1}{1+e^{-(\boldsymbol{w}^{T}\boldsymbol{x}+b)}}$$

The log odds of $p(c=1|\boldsymbol{x})$ is then exactly the linear model:

$$\ln \dfrac{p(c=1|\boldsymbol{x})}{p(c=0|\boldsymbol{x})}=\boldsymbol{w}^T\boldsymbol{x}+b$$

• The probability that $\boldsymbol{x}$ is a positive example ($c=1$):

$$p(c=1|\boldsymbol{x})=\dfrac{1}{1+e^{-(\boldsymbol{w}^{T}\boldsymbol{x}+b)}}$$

As the value of the linear function $\boldsymbol{w}^{T}\boldsymbol{x}+b$ approaches $+\infty$, the probability approaches $1$; as it approaches $-\infty$, the probability approaches $0$.

• The probability that $\boldsymbol{x}$ is a negative example ($c=0$):

$$\begin{aligned} p(c=0|\boldsymbol{x})&=1-p(c=1|\boldsymbol{x})\\ &=\dfrac{e^{-(\boldsymbol{w}^{T}\boldsymbol{x}+b)}}{1+e^{-(\boldsymbol{w}^{T}\boldsymbol{x}+b)}}\\ &=\dfrac{1}{1+e^{\boldsymbol{w}^{T}\boldsymbol{x}+b}}\end{aligned}$$

As the value of the linear function $\boldsymbol{w}^{T}\boldsymbol{x}+b$ approaches $+\infty$, the probability approaches $0$; as it approaches $-\infty$, the probability approaches $1$.

Clearly, for a new input sample $\boldsymbol{x}^{\ast}$, the maximum a posteriori rule is: if $p(c=1|\boldsymbol{x}^{\ast})>p(c=0|\boldsymbol{x}^{\ast})$, assign $\boldsymbol{x}^{\ast}$ to $\mathcal R_{1}$; if $p(c=1|\boldsymbol{x}^{\ast})<p(c=0|\boldsymbol{x}^{\ast})$, assign $\boldsymbol{x}^{\ast}$ to $\mathcal R_{2}$.
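This rule is equivalent to thresholding the posterior $p(c=1|\boldsymbol{x})$ at $0.5$, or, equivalently, to checking the sign of $\boldsymbol{w}^{T}\boldsymbol{x}+b$. A minimal sketch (assuming NumPy; the function name is illustrative):

```python
import numpy as np

def map_decision(X, w, b):
    '''Assign each row of X to R1 (label 1) or R2 (label 0) by the MAP rule.'''
    p_pos = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # p(c=1 | x)
    # p(c=1|x) > p(c=0|x)  <=>  p(c=1|x) > 0.5  <=>  w^T x + b > 0
    return (p_pos > 0.5).astype(int)
```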

3. Parameter Estimation

Suppose the training set is $\{ ( \boldsymbol{x}_{i},c_{i}) \} _{i=1}^N$ with $\boldsymbol{x}_{i}\in \mathbb{R}^{n}$ and $c_{i}\in \{0,1\}$. Maximum likelihood estimation is used to find the model parameters $(\boldsymbol{w},b)$.

1) Writing $y(\boldsymbol{x})=p(c=1|\boldsymbol{x})$, the likelihood of the training set can be expressed as:

$$L(\boldsymbol{w},b)=\prod_{i=1}^N y(\boldsymbol{x}_{i})^{c_{i}}\left[ 1-y(\boldsymbol{x}_{i})\right] ^{1-c_{i}}$$

2) The log-likelihood can be expressed as:

$$\begin{aligned} \ln L(\boldsymbol{w},b)&=\sum_{i=1}^N \left\{ c_{i}\ln\left[ y\left( \boldsymbol{x}_{i}\right) \right] +(1-c_{i})\ln\left[ 1-y\left( \boldsymbol{x}_{i}\right) \right] \right\}\\ &=\sum_{i=1}^N\left\{ c_{i}\ln\dfrac{ y\left( \boldsymbol{x}_{i}\right)}{1-y\left( \boldsymbol{x}_{i}\right)} +\ln\left[ 1-y\left( \boldsymbol{x}_{i}\right) \right] \right\} \\ &=\sum_{i=1}^N\left\{ c_{i}\left(\boldsymbol{w}^{T}\boldsymbol{x}_{i}+b\right)-\ln\left[ 1+e^{\boldsymbol{w}^{T}\boldsymbol{x}_{i}+b} \right] \right\} \end{aligned}$$

where the last step uses $\ln\dfrac{y(\boldsymbol{x}_{i})}{1-y(\boldsymbol{x}_{i})}=\boldsymbol{w}^{T}\boldsymbol{x}_{i}+b$ and $1-y(\boldsymbol{x}_{i})=\dfrac{1}{1+e^{\boldsymbol{w}^{T}\boldsymbol{x}_{i}+b}}$.

3) Letting $\boldsymbol{\beta}=[\boldsymbol{w}^T,b]^{T}$ and $\hat{\boldsymbol{x}}_{i}=[\boldsymbol{x}_{i}^T,1]^{T}$, the linear model becomes $\boldsymbol{w}^{T}\boldsymbol{x}_{i}+b=\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}$, and therefore:

$$\ln L(\boldsymbol{\beta})=\sum_{i=1}^N \left[ c_{i}\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}-\ln\left( 1+e^{\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}} \right) \right]$$

Maximizing this likelihood gives an estimate of the logistic regression parameters $\boldsymbol{\beta}=[\boldsymbol{w}^{T},b]^{T}$.
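A minimal sketch of this log-likelihood in the $(\boldsymbol{\beta},\hat{\boldsymbol{x}})$ parameterization (assuming NumPy; here `Xhat` stacks the $\hat{\boldsymbol{x}}_i^T$ as rows with a trailing column of ones and `c` is the column of labels, as in the implementation in Section 6):

```python
import numpy as np

def log_likelihood(beta, Xhat, c):
    '''ln L(beta) = sum_i [ c_i * beta^T xhat_i - ln(1 + exp(beta^T xhat_i)) ]'''
    z = Xhat @ beta                              # beta^T xhat_i for every sample
    return np.sum(c * z - np.logaddexp(0.0, z))  # logaddexp(0, z) = ln(1 + e^z), numerically stable
```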

4. Optimization Algorithms for Model Learning

The loss function is usually taken to be the negative log-likelihood, $l(\boldsymbol{\beta})=-\ln L(\boldsymbol{w},b)=-\ln L(\boldsymbol{\beta})$; maximizing the likelihood is then equivalent to minimizing the loss $l(\boldsymbol{\beta})$:

$$l(\boldsymbol{\beta})=-\ln L(\boldsymbol{\beta})=-\sum_{i=1}^N \left[ c_{i}\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}-\ln\left( 1+e^{\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}} \right) \right]$$

Since $l(\boldsymbol{\beta})$ is a convex function of $\boldsymbol{\beta}$ with continuous derivatives of all orders, numerical optimization methods can be used to solve for $\boldsymbol{\beta}$.

4.1 Gradient Descent

When solving with gradient descent, the negative gradient is used as the descent direction:

$$\begin{aligned} \dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}&=-\sum_{i=1}^N \left( c_{i}\hat{\boldsymbol{x}}_{i}-\dfrac{1}{1+e^{\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}}}e^{\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}}\hat{\boldsymbol{x}}_{i} \right)\\ &=-\sum_{i=1}^N \left( c_{i}-\dfrac{e^{\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}}}{1+e^{\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}}} \right)\hat{\boldsymbol{x}}_{i} \\ &=-\sum_{i=1}^N \left[ c_{i}-y(\boldsymbol{x}_{i}) \right]\hat{\boldsymbol{x}}_{i} \qquad\qquad (1) \end{aligned}$$

Since $\boldsymbol{\beta}=[\boldsymbol{w}^T,b]^{T}=[w_1,\cdots,w_n,b]^T$ and $\hat{\boldsymbol{x}}_{i}=[\boldsymbol{x}_{i}^T,1]^T=[x_i^{(1)},\cdots,x_i^{(n)},1]^T$, equation (1) amounts to:

$$\begin{cases} \dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol w}=-\displaystyle\sum_{i=1}^N \left[ c_{i}-y\left( \boldsymbol{x}_{i}\right) \right]\boldsymbol{x}_{i} & (2) \\[3ex] \dfrac{\partial l(\boldsymbol{\beta})}{\partial b}=-\displaystyle\sum_{i=1}^N\left[ c_{i}-y\left( \boldsymbol{x}_{i}\right) \right] & (3) \end{cases}$$

Considering each component $x_{i}^{(j)}$ of $\boldsymbol{x}_{i}=[x_i^{(1)},\cdots,x_i^{(n)}]^T$, equation (2) can also be written as:

$$\dfrac{\partial l(\boldsymbol{\beta})}{\partial w_{j}}=-\sum_{i=1}^N \left[ c_{i}-y\left( \boldsymbol{x}_{i}\right) \right]x_{i}^{(j)}$$

The gradient of the loss function can therefore be written as:

$$\dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}=\left[\dfrac{\partial l(\boldsymbol{\beta})}{\partial w_1},\cdots,\dfrac{\partial l(\boldsymbol{\beta})}{\partial w_n},\dfrac{\partial l(\boldsymbol{\beta})}{\partial b} \right]^{T}$$

The gradient-descent weight update is therefore ($\alpha$ is the step size):

$$\boldsymbol{\beta}^{t+1}=\boldsymbol{\beta}^{t}-\alpha\dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$$
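In vectorized form, with `Xhat` stacking the $\hat{\boldsymbol{x}}_i^T$ as rows, one gradient-descent step looks roughly as follows (a sketch assuming NumPy; the full training loop used in this note appears in Section 6):

```python
import numpy as np

def gd_step(beta, Xhat, c, alpha):
    '''One gradient-descent step on the loss l(beta).'''
    y = 1.0 / (1.0 + np.exp(-(Xhat @ beta)))   # y(x_i) for every sample
    grad = -Xhat.T @ (c - y)                   # equation (1)
    return beta - alpha * grad
```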

4.2 Newton's Method

When Newton's method is used to solve the optimization problem, the derivative of the second-order Taylor approximation at the current search point is set to $0$. Besides the gradient, this also requires the inverse of the Hessian matrix.

Since we have already obtained $\dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}=-\displaystyle\sum_{i=1}^N \left[ c_{i}-y(\boldsymbol{x}_{i}) \right]\hat{\boldsymbol{x}}_{i}$,

the Hessian matrix is:

$$\begin{aligned} \dfrac{\partial}{\partial \boldsymbol\beta^T}\left(\dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol\beta}\right) &=\dfrac{\partial}{\partial \boldsymbol\beta^T}\left(-\sum_{i=1}^N \left[ c_{i}-y(\boldsymbol{x}_{i}) \right]\hat{\boldsymbol{x}}_{i}\right)\\ &=\dfrac{\partial}{\partial \boldsymbol\beta^T}\left(\sum_{i=1}^N y(\boldsymbol{x}_{i})\,\hat{\boldsymbol{x}}_{i}\right)\\ &=\sum_{i=1}^N y(\boldsymbol{x}_{i})\left[1-y(\boldsymbol{x}_{i})\right]\hat{\boldsymbol{x}}_{i}\,\dfrac{\partial}{\partial \boldsymbol\beta^T}\left(\boldsymbol{\beta}^{T}\hat{\boldsymbol{x}}_{i}\right)\\ &=\sum_{i=1}^N y(\boldsymbol{x}_{i})\left[1-y(\boldsymbol{x}_{i})\right]\hat{\boldsymbol{x}}_{i}\hat{\boldsymbol{x}}_{i}^T \end{aligned}$$

The Newton weight update is therefore:

$$\boldsymbol{\beta}^{t+1}=\boldsymbol{\beta}^{t}-\left(\dfrac{\partial^2 l(\boldsymbol\beta)}{\partial \boldsymbol\beta\,\partial \boldsymbol\beta^T}\right)^{-1}\dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$$
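Newton's method is not used in the implementation of Section 6, but a minimal sketch of one Newton update (assuming NumPy; the small `ridge` term is an assumption added only to keep the Hessian numerically invertible) might look like this:

```python
import numpy as np

def newton_step(beta, Xhat, c, ridge=1e-8):
    '''One Newton step: beta <- beta - H^{-1} * gradient.'''
    y = 1.0 / (1.0 + np.exp(-(Xhat @ beta)))    # y(x_i), shape (N, 1)
    grad = -Xhat.T @ (c - y)                    # gradient of the loss
    w = (y * (1.0 - y)).ravel()                 # weights y(x_i)[1 - y(x_i)]
    H = (Xhat * w[:, None]).T @ Xhat            # sum_i w_i * xhat_i xhat_i^T
    H += ridge * np.eye(H.shape[0])             # numerical safeguard (assumption)
    return beta - np.linalg.solve(H, grad)
```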


5. Training Procedure

If gradient descent is used to solve for the model parameters, the training steps are as follows:

1) Randomly choose an initial value $\boldsymbol{\beta}^{0}$ for $\boldsymbol{\beta}=[\boldsymbol{w}^T,b]^{T}$.

2) Choose a step size $\alpha$ and iterate the following formulas until a stopping criterion is met:

$$\begin{aligned} \dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}} &=-\sum_{i=1}^N \left[ c_{i}-y(\boldsymbol{x}_{i}) \right]\hat{\boldsymbol{x}}_{i}\\ &=-\sum_{i=1}^N \left[ c_{i}-y(\boldsymbol{x}_{i}) \right]\begin{bmatrix}\boldsymbol{x}_{i}\\ 1\end{bmatrix} \end{aligned}$$

$$\boldsymbol{\beta}^{t+1}=\boldsymbol{\beta}^{t}-\alpha\dfrac{\partial l(\boldsymbol{\beta})}{\partial \boldsymbol{\beta}}$$

6. Implementation (Binary Classification)

1) Define the sigmoid function

import numpy as np
import matplotlib.pyplot as plt

def sigmoid(x):
    '''Sigmoid function'''
    return 1.0/(1 + np.exp(-x))

2) Functions for generating and loading the training/testing data

Suppose the dataset, generated in the two-dimensional plane $\mathbb{R}^{2}$, is stored in the format $(\boldsymbol{x}_{i},y_{i})=(x_{i}^{(1)},x_{i}^{(2)},y_{i})$ with $y_{i}\in\{0,1\}$:

3.562302,25.329208,1.000000
-24.268267,1.272092,1.000000
25.405790,8.463017,1.000000
-6.908775,23.298889,1.000000
40.621010,-25.134052,0.000000
-9.305521,14.983097,1.000000
20.041330,-25.381725,0.000000
37.298540,-26.767307,0.000000
35.856177,-31.080316,0.000000
-17.976889,4.244106,1.000000
......

Generate a two-dimensional Gaussian dataset with two cluster centres:

def gen_gausssian(mean1, mean2, cov1, cov2, num):
    '''Generate a 2-d Gaussian dataset with 2 clusters.'''
    # positive data
    data1 = np.random.multivariate_normal(mean1,cov1,num)
    label1 = np.ones((1,num)).T
    data_pos = np.append(data1,label1,axis=1)
    # negative data
    data2 = np.random.multivariate_normal(mean2,cov2,num)
    label2 = np.zeros((1,num)).T
    data_neg = np.append(data2,label2,axis=1)
    # all data
    data = np.append(data_pos,data_neg,axis=0)
    # shuffled data
    shuffle_data = np.random.permutation(data)
	
    # scatter plot
    x1,y1 = data1.T
    x2,y2 = data2.T
    plt.scatter(x1,y1,c='r',s=3)
    plt.plot(mean1[0],mean1[1],'ko')    
    plt.scatter(x2,y2,c='b',s=3)
    plt.plot(mean2[0],mean2[1],'ko')
    plt.axis()
    plt.title("2-d gaussian dataset with 2 clusters")
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show()           
    
    np.savetxt('gaussdata.txt', shuffle_data, fmt='%f',delimiter=',')
    return shuffle_data, data_pos, data_neg

Shown as a scatter plot:

[Figure: scatter plot of the two Gaussian clusters, with the cluster means marked]

Load the training data saved in the $(\boldsymbol{x}_{i},y_{i})=(x_{i}^{(1)},x_{i}^{(2)},y_{i})$, $y_{i}\in\{0,1\}$ format and return it as a NumPy array:

def load_data(filename):
    '''Load data of training or testing set
    '''
    tdata = []
    with open(filename) as f:
        while True:
            line = f.readline()
            if not line:
                break
            line = line.split(',')         
            tdata.append([float(item) for item in line])
            
    return np.array(tdata)

3) Iterate the gradient-descent update formula and print the training loss after each iteration

def lr_train(xhat,c,alpha,num):
    '''Gradient-descent training: num iterations with step size alpha.'''
    beta = np.random.rand(3,1)                           # random initial beta = [w1, w2, b]^T
    for i in range(num):
        yx = sigmoid(np.dot(xhat, beta))                 # y(x_i) for every sample
        beta = beta + alpha * np.dot(xhat.T, (c - yx))   # beta <- beta - alpha * gradient
        print('#'+str(i)+',training loss:'+str(train_loss(c, yx)))
    return beta

From the formula $-\ln L(\boldsymbol{w},b)=-\displaystyle\sum_{i=1}^N \left\{ c_{i}\ln\left[ y\left( \boldsymbol{x}_{i}\right) \right] +(1-c_{i})\ln\left[ 1-y\left( \boldsymbol{x}_{i}\right) \right] \right\}$,

compute the loss (error) value:

def train_loss(c, yx):
    '''Negative log-likelihood of the predictions yx against the labels c.'''
    err = 0.0
    for i in range(len(yx)):
        # skip terms that would produce log(0)
        if yx[i,0] > 0 and (1 - yx[i,0]) > 0:
            err -= c[i,0] * np.log(yx[i,0]) + (1-c[i,0])*np.log(1-yx[i,0])
    return err

Main program:

    # Generate 2200 samples: the first 2000 are used for training, the last 200 for testing
    mean1 = [3,-1]
    cov1 = [[5,0],[0,10]]        
    mean2 = [-5,7]
    cov2 = [[10,0],[0,5]]
    data,data_pos,data_neg = gen_gausssian(mean1,mean2,cov1,cov2,1100)
    # Use the first 2000 samples for training
    training_data = data
    tmp1 = training_data[0:2000,0:2]
    tmp2 = np.ones((2000,1))
    xhat = np.concatenate((tmp1,tmp2),axis=1)
    target = training_data[0:2000,2:]
    # Train for 100 iterations with step size 0.01
    beta = lr_train(xhat,target,0.01,100)
    print('beta:\n', beta)
    # Test on the remaining 200 held-out samples
    tmp1 = training_data[2000:2200,0:2]
    tmp2 = np.ones((200,1))
    testing_data = np.concatenate((tmp1,tmp2),axis=1)
    target = training_data[2000:2200,2:]
    y1 = classification(testing_data, beta)    
    print(np.abs(y1-target).T)

Training on the 2000-sample training set produces the following output:
#0,training loss:2767.7605301149197
#1,training loss:28706.32704095256
#2,training loss:24304.21071966826
#3,training loss:20729.807928831706
#4,training loss:18031.980567667095
#5,training loss:15793.907613945637
#6,training loss:13876.408972848896
#7,training loss:12260.25776604957
#8,training loss:10857.914022333194
#9,training loss:9702.173760769088
#10,training loss:8739.995403737194
#11,training loss:7909.116254144592
#12,training loss:7237.015743718265
#13,training loss:6581.515845960798
#14,training loss:6155.195323818418
#15,training loss:5782.624246205244
#16,training loss:5451.120323877512
#17,training loss:5159.985309063984
#18,training loss:4921.653117909279
#19,training loss:4728.055485820308
#20,training loss:4546.101559000789
#21,training loss:4368.003415240011
#22,training loss:4196.188712568878
#23,training loss:4032.1962049440162
#24,training loss:3876.771728177838
#25,training loss:3694.5715060625985
#26,training loss:3554.8126869561006
#27,training loss:3418.8100373192524
#28,training loss:3321.3029188728215
#29,training loss:3189.8131265721095
#30,training loss:3060.4306284382133
#31,training loss:2932.529506577584
#32,training loss:2807.0843716420854
#33,training loss:2684.4698423911955
#34,training loss:2564.789169175422
#35,training loss:2447.909093126709
#36,training loss:2333.712055985516
#37,training loss:2222.305120198585
#38,training loss:2114.0629245811747
#39,training loss:2009.9696271327145
#40,training loss:1911.4641438101942
#41,training loss:1818.4131000336629
#42,training loss:1731.1576524394175
#43,training loss:1648.321160807572
#44,training loss:1568.548376402433
#45,training loss:1491.2975705058457
#46,training loss:1416.3652001741157
#47,training loss:1343.7359069149327
#48,training loss:1273.1915049002964
#49,training loss:1204.2529637870934
#50,training loss:1136.9266223350025
#51,training loss:1071.3457943359633
#52,training loss:1007.323134162851
#53,training loss:944.9219846916478
#54,training loss:885.1816608689702
#55,training loss:899.9599868299116
#56,training loss:845.0057775193546
#57,training loss:793.5441317445959
#58,training loss:745.7136807432933
#59,training loss:701.6553680843865
#60,training loss:696.1322808438866
#61,training loss:689.6549290071879
#62,training loss:648.194522863791
#63,training loss:608.4750616604265
#64,training loss:570.7085584125251
#65,training loss:535.7643996771293
#66,training loss:504.1081114497143
#67,training loss:475.508191984496
#68,training loss:450.66041076529723
#69,training loss:429.67911785665814
#70,training loss:411.2956082830516
#71,training loss:394.87591024838343
#72,training loss:380.24926997738123
#73,training loss:403.36316867372153
#74,training loss:391.38976162245194
#75,training loss:381.3801916299097
#76,training loss:372.99093649597427
#77,training loss:366.0959147066188
#78,training loss:360.56881602093216
#79,training loss:355.8694556375197
#80,training loss:351.9153236373713
#81,training loss:348.4531107835549
#82,training loss:345.4202325927137
#83,training loss:342.7408477041635
#84,training loss:340.38193613578653
#85,training loss:338.2928279212097
#86,training loss:336.440189142936
#87,training loss:334.7845603199353
#88,training loss:333.29353615870866
#89,training loss:331.9350276518051
#90,training loss:330.68392791191314
#91,training loss:329.51840299766616
#92,training loss:328.4199642611809
#93,training loss:327.3721997842567
#94,training loss:326.36511808367675
#95,training loss:325.3868163444934
#96,training loss:324.43080556455044
#97,training loss:323.49135041237633
#98,training loss:322.5630405032738
#99,training loss:321.64316225672036

beta:
[[ 5.96987205]
[-6.41668657]
[30.84845393]]

This beta value is the $\boldsymbol{\beta}=[\boldsymbol{w}^{T},b]^{T}$ obtained by gradient descent.
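For a two-dimensional dataset like this one, the learned beta defines the decision boundary $w_1 x^{(1)} + w_2 x^{(2)} + b = 0$. A small sketch (assuming matplotlib; the helper name is illustrative) that plots this line over the two classes returned by gen_gausssian:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_boundary(data_pos, data_neg, beta):
    '''Plot both classes and the line w1*x1 + w2*x2 + b = 0 defined by beta.'''
    w1, w2, b = beta.ravel()
    xs = np.concatenate((data_pos[:, 0], data_neg[:, 0]))
    x1 = np.linspace(xs.min(), xs.max(), 100)
    x2 = -(w1 * x1 + b) / w2                   # solve the boundary equation for x2
    plt.scatter(data_pos[:, 0], data_pos[:, 1], c='r', s=3)
    plt.scatter(data_neg[:, 0], data_neg[:, 1], c='b', s=3)
    plt.plot(x1, x2, 'k-')
    plt.show()
```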

4) Classify the test samples by thresholding the sigmoid output at 0.5:

def classification(testing_data, beta):
    '''Predict a 0/1 label for each row of testing_data using the learned beta.'''
    y = sigmoid(np.dot(testing_data, beta))
    for i in range(len(y)):
        if y[i,0] < 0.5:
            y[i,0] = 0.0
        else:
            y[i,0] = 1.0
    return y

Classification result: among the 200 test samples, 2 are misclassified (the entries equal to 1 mark the errors):
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0.]]
