Andrew Ng's Machine Learning Course - Assignment 2 - Logistic Regression (Python implementation)

Machine Learning(Andrew) ex2-Logistic Regression

椰汁筆記

Logistic Regression

1.1 Visualizing the data

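The first step in visualizing the data is reading it in. As before, the data comes as a txt file.

The first column is the score on the first exam and the second column is the score on the second exam. The file is again read line by line, and each line is split and converted. Because this kind of data loading keeps coming up, we wrap it in a function that reads data files of this type.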

def read_dataset(filename, type_tuple, separator=','):
    """
    Read data from a file in which every record sits on its own line
    and the fields are separated by a delimiter.
    Returns the data as a list of rows.
    eg:
        1.1,2.1,3.1
        1.2,2.2,3.2

    parameters:
    ----------
    filename : str
            File name (including the path)
    type_tuple : tuple
            Type of each field in a row; a 1-tuple is applied to every field
    separator : str
            Delimiter, ',' by default
    """
    with open(filename, 'r') as f:
        # Strip the trailing newline and skip blank lines
        lines = [line.strip() for line in f if line.strip()]

    n_cols = len(lines[0].split(sep=separator))
    data = []
    if len(type_tuple) == n_cols:
        # One converter per column
        for line in lines:
            fields = line.split(sep=separator)
            data.append([type_tuple[i](fields[i]) for i in range(len(fields))])
    elif len(type_tuple) == 1:
        # A single converter shared by all columns
        for line in lines:
            data.append([type_tuple[0](col) for col in line.split(sep=separator)])
    else:
        data = None
    return data
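For purely numeric files like this one, numpy can also do the parsing. The sketch below is an alternative to the function above, not the author's method; the two rows are made-up values in the same `score1,score2,label` layout, and `io.StringIO` stands in for the real file.

```python
import io
import numpy as np

# Two made-up rows in the same layout as the data file: score1,score2,label
sample = io.StringIO("34.6,78.0,0\n60.2,86.3,1\n")

# np.loadtxt parses every field as float and returns a 2-D ndarray
data = np.loadtxt(sample, delimiter=",")
print(data.shape)   # (2, 3)
print(data[:, -1])  # the label column
```

The result is already an ndarray, so no later conversion with `np.array` is needed.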

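To draw a scatter plot of the two classes, we need to split the data by its y value into an admitted group and a not admitted group. Here too I implemented a general-purpose method.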

import numpy as np


def separate_dataset(data, col, boundary):
    """
    Split the data into two groups according to one column

    parameters:
    ----------
    data : array-like
            One record per row
    col : int
            Index of the column the criterion is applied to
    boundary : float
            Classification boundary

    Returns (data0, data1): the rows whose value in col is below
    boundary, and the remaining rows.
    """
    data = np.asarray(data)  # also accepts the list returned by read_dataset
    below = data[:, col] < boundary  # boolean mask, one entry per row
    return data[below], data[~below]

Draw a scatter plot of the two groups. matplotlib.pyplot.xlim() sets the range of the x axis, and matplotlib.pyplot.legend() displays the legend.

    import matplotlib.pyplot as plt

    data = read_dataset("ex2data1.txt", (float, float, float), separator=',')
    data0, data1 = separate_dataset(data, -1, 0.5)
    plt.title("raw data scatter")
    plt.xlabel("exam1 score")
    plt.ylabel("exam2 score")
    plt.xlim((20, 110))
    plt.ylim((20, 110))
    na = plt.scatter(data0[..., 0], data0[..., 1], marker='x', c='b', label='not admitted')
    a = plt.scatter(data1[..., 0], data1[..., 1], marker='x', c='y', label='admitted')
    plt.legend(handles=[na, a], loc="upper right")

1.2.1 Warmup exercise: sigmoid function

$g(z)=\frac{1}{1+e^{-z}}$
The sigmoid function maps a result from R into the interval (0, 1), so it can be read as a probability. With numpy, implementing it is simple.

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
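A quick check of the function on a few hand-picked inputs (the input values are arbitrary): the output at 0 is 0.5, and values at z and -z add up to 1.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-5.0, 0.0, 5.0])  # arbitrary sample inputs
p = sigmoid(z)
print(p)            # every value lies strictly between 0 and 1
print(p[0] + p[2])  # sigmoid(-z) + sigmoid(z) equals 1
```

Because `np.exp` works element-wise, the same function accepts scalars, vectors and matrices.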

1.2.2 Cost function and gradient

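The squared-error cost can no longer be used here: with the sigmoid inside the hypothesis it becomes non-convex (it has multiple local optima), which makes gradient descent work poorly. Logistic regression therefore uses the following cost function instead
$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}[-y^{(i)}\log(h_{\theta}(x^{(i)}))-(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))]$
$h_{\theta}(x)=g(\theta^Tx)$
The implementation is as follows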

def cost(theta, X, y):
    return np.mean((-y) * np.log(sigmoid(X.dot(theta))) - (1 - y) * np.log(1 - sigmoid(X.dot(theta))))

Test the cost function with theta initialized to all zeros

    data = np.array(data)
    X = np.insert(data[..., :2], 0, 1, axis=1)  # remember to add x0
    y = data[..., -1]
    theta = np.zeros((3,))
    print(cost(theta, X, y))  # 0.6931471805599453
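The printed value can be checked by hand. With theta set to all zeros, every hypothesis value is 0.5 regardless of the input, so the cost no longer depends on the data:
$h_{\theta}(x^{(i)})=g(0)=\frac{1}{1+e^{0}}=0.5$
$J(0)=\frac{1}{m}\sum_{i=1}^{m}[-y^{(i)}\log(0.5)-(1-y^{(i)})\log(0.5)]=-\log(0.5)=\log 2\approx 0.6931$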

The gradient of the cost function is
$\frac{\partial J(\theta)}{\partial \theta_{j}}=\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_j$
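This computation can be vectorized, which allows it to run in parallel and speeds it up. Keep the dimensions clear when reading it: X is m×(n+1) and $h_{\theta}(X)-y$ is m×1, so the result is (n+1)×1.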
$\frac{\partial J(\theta)}{\partial \theta}=\frac{X^{T}(h_{\theta}(X)-y)}{m}$

def gradient(theta, X, y):
    return X.T.dot(sigmoid(X.dot(theta)) - y) / X.shape[0]
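One way to gain confidence in the vectorized gradient is to compare it against a finite-difference approximation of the cost. The sketch below is a sanity check under assumptions: the data is random toy data, and `numerical_gradient` is a hypothetical helper that is not part of the original code.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X.dot(theta))
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    return X.T.dot(sigmoid(X.dot(theta)) - y) / X.shape[0]

def numerical_gradient(theta, X, y, eps=1e-6):
    # Central difference: (J(theta + eps*e_j) - J(theta - eps*e_j)) / (2*eps)
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (cost(theta + step, X, y) - cost(theta - step, X, y)) / (2 * eps)
    return grad

# Random toy data: 20 samples, 2 features plus the x0 column
rng = np.random.default_rng(0)
X = np.insert(rng.normal(size=(20, 2)), 0, 1, axis=1)
y = (rng.random(20) > 0.5).astype(float)
theta = np.array([0.1, -0.2, 0.3])

diff = np.max(np.abs(gradient(theta, X, y) - numerical_gradient(theta, X, y)))
print(diff)  # should be tiny if the analytic gradient is right
```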

1.2.3 Learning parameters using fminunc

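The assignment asks us to minimize the objective with an advanced optimization algorithm rather than gradient descent. We can still use the gradient descent from before, though.
Define the gradient descent method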

def gradient_descent(theta, X, y, alpha, iterations):
    for i in range(iterations):
        theta -= alpha * gradient(theta, X, y)
    return theta

Then we run gradient descent

    alpha = 0.2
    iterations = 10000
    theta = np.zeros((3,))
    print(gradient_descent(theta, X, y, alpha, iterations))

Running this produces an error message. The cause is that the feature values are too large, so the computation overflows.
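The following sketch shows, on hand-picked numbers rather than the real data, how large magnitudes break these formulas in float64. The `np.logaddexp` rewrite at the end is an alternative formulation that is not used in this post; it relies on the identity that the per-sample cost equals $\log(1+e^{z})-yz$ with $z=\theta^Tx$.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Silence the warnings so the resulting values can be inspected
with np.errstate(over="ignore", divide="ignore"):
    big = np.exp(np.array([710.0]))        # beyond the float64 range -> inf
    saturated = sigmoid(np.array([40.0]))  # rounds to exactly 1.0
    log_term = np.log(1 - saturated)       # log(0) -> -inf

print(big, saturated, log_term)

# Alternative (assumption, not the author's code): stable per-sample cost
z, y = np.array([40.0]), np.array([0.0])
stable = np.logaddexp(0, z) - y * z        # log(1 + e^z) - y*z
print(stable)                              # finite
```

Normalizing the features, as done next, keeps z small enough that the plain formulas stay in range.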

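So the features need to be normalized. Here we use zero-mean normalization (z-score standardization). Note that x0 was added earlier: do not normalize x0, because its standard deviation is 0 and the division would produce invalid values.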

    mean = np.mean(X[..., 1:], axis=0)
    std = np.std(X[..., 1:], axis=0, ddof=1)
    X[..., 1:] = (X[..., 1:] - mean) / std
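To see what these three lines do, here is the same normalization applied to a tiny made-up matrix (the numbers are chosen so the result is easy to verify by hand):

```python
import numpy as np

# Made-up design matrix: x0 column of ones plus two features
X = np.array([[1.0, 30.0, 80.0],
              [1.0, 60.0, 40.0],
              [1.0, 90.0, 60.0]])

mean = np.mean(X[..., 1:], axis=0)        # [60., 60.]
std = np.std(X[..., 1:], axis=0, ddof=1)  # [30., 20.]
X[..., 1:] = (X[..., 1:] - mean) / std

print(X[..., 1:].mean(axis=0))         # each feature now has mean 0
print(X[..., 1:].std(axis=0, ddof=1))  # and sample standard deviation 1
print(X[..., 0])                       # x0 is untouched
```

`mean` and `std` have to be kept, since any new sample must be transformed with the same values before prediction.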

Running gradient descent again gives the following theta

[1.71844349 4.01288964 3.74389058]

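Other, more advanced optimization methods can be used here as well, through scipy.optimize.minimize from the scipy library (imported here with import scipy.optimize as opt). fun is the cost function, x0 is the initial theta, args holds the remaining arguments of the cost function, method selects the optimization method, and jac is the function that computes the gradient.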

    res = opt.minimize(fun=cost, x0=theta, args=(X, y), method='TNC', jac=gradient)

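The fitted theta is stored in res.x, and it matches what we obtained with gradient descent. The differences are that the optimization algorithm is faster, and that with it the features do not have to be normalized (in this example).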

print(res.x)#[1.71844349 4.01288964 3.74389058]
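The same call can be tried on a tiny problem to see what the result object contains. This is a minimal sketch on made-up, non-separable toy data (one feature plus x0), with `cost` and `gradient` restated so the block runs on its own; the printed values depend on the optimizer and are not asserted here.

```python
import numpy as np
import scipy.optimize as opt

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X.dot(theta))
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    return X.T.dot(sigmoid(X.dot(theta)) - y) / X.shape[0]

# Made-up toy data: labels mostly increase with x, but not perfectly
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
X = np.column_stack((np.ones_like(x), x))
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

theta0 = np.zeros(2)
res = opt.minimize(fun=cost, x0=theta0, args=(X, y), method='TNC', jac=gradient)

print(res.x)        # fitted parameters
print(res.fun)      # cost at the returned point
print(res.success)  # whether the optimizer reports success
```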

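Now we plot the decision boundary, which here is the line
$\theta^Tx=0$
$\theta_0+\theta_1x_1+\theta_2x_2=0$
One point needs attention: X was normalized before training, so in terms of the original scores the line is
$\theta_0+\theta_1\frac{x_1-mean_1}{std_1}+\theta_2\frac{x_2-mean_2}{std_2}=0$
To draw it, first generate a dense range of x1 values and then compute x2. Rearranging the formula above gives
$x_2=mean_2-std_2\frac{\theta_0+\theta_1\frac{x_1-mean_1}{std_1}}{\theta_2}$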
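The rearranged formula can be checked numerically by substituting the computed points back into the normalized boundary equation. The `theta`, `mean` and `std` values below are illustrative placeholders, not the fitted values from this post.

```python
import numpy as np

theta = np.array([1.7, 4.0, 3.7])  # illustrative parameters (assumption)
mean = np.array([65.0, 66.0])      # assumed feature means
std = np.array([19.0, 18.0])       # assumed feature standard deviations

x1 = np.linspace(20, 110, 5)
x2 = mean[1] - std[1] * (theta[0] + theta[1] * (x1 - mean[0]) / std[0]) / theta[2]

# Plug the points back into theta^T x, using normalized features
z = theta[0] + theta[1] * (x1 - mean[0]) / std[0] + theta[2] * (x2 - mean[1]) / std[1]
print(np.max(np.abs(z)))  # zero up to rounding error
```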

    # Plot the decision boundary
    plt.subplot(2, 2, 2)
    plt.scatter(data0[..., 0], data0[..., 1], marker='x', c='b', label="not admitted")
    plt.scatter(data1[..., 0], data1[..., 1], marker='x', c='y', label="admitted")
    x1 = np.arange(20, 110, 0.1)
    # The features were scaled, so the scaling has to be undone when computing x2
    x2 = mean[1] - std[1] * (theta[0] + theta[1] * (x1 - mean[0]) / std[0]) / theta[2]
    db = plt.plot(x1, x2, c='r', label="decision boundary")
    plt.xlim((20, 110))
    plt.ylim((20, 110))
    plt.title("decision boundary")
    plt.xlabel("exam1 score")
    plt.ylabel("exam2 score")
    plt.legend(handles=db, loc="upper right")

1.2.4 Evaluating logistic regression

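To check the decision boundary, use the sample given in the assignment. Just like the training data, the test data must first be normalized with the training mean and std. This step is essential.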

    # Test the optimization result
    test_x = np.array([45, 85])
    test_x = (test_x - mean) / std
    test_x = np.insert(test_x, 0, 1)
    print(sigmoid(test_x.dot(theta)))  #0.7763928918272246

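Next we evaluate the classifier. First write the prediction function: pick a threshold (0.6 in the code below) and use it to decide between 0 and 1.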

def predict(theta, X):
    return [1 if i > 0.6 else 0 for i in sigmoid(X.dot(theta))]
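The choice of threshold changes the predictions and therefore the metrics. The sketch below uses made-up probabilities and labels, and `evaluate` is a hypothetical helper that computes accuracy, precision and recall by hand for a given threshold.

```python
import numpy as np

# Made-up labels and predicted probabilities
y_true = np.array([0, 0, 1, 1, 1])
prob = np.array([0.2, 0.7, 0.55, 0.9, 0.65])

def evaluate(threshold):
    pred = (prob > threshold).astype(int)
    tp = np.sum((pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((pred == 0) & (y_true == 1))  # false negatives
    accuracy = np.mean(pred == y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

print(evaluate(0.5))  # accuracy 0.8, precision 0.75, recall 1.0
print(evaluate(0.6))  # accuracy 0.6, precision and recall both 2/3
```

Raising the threshold from 0.5 to 0.6 turns the sample with probability 0.55 into a missed positive, which lowers recall.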

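Below we evaluate with classification_report from the sklearn library (from sklearn.metrics import classification_report). Install the library yourself.

    # Evaluate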
    print(classification_report(y, predict(res.x, X)))

The precision is acceptable.

Regularized logistic regression

Visualizing the data
The data here has the same format as above, so the reading function from before can be used directly, and so can the splitting function.

    data = read_dataset("ex2data2.txt", (float, float, float), separator=',')
    data0, data1 = separate_dataset(data, -1, 0.5)
    plt.subplot(1, 1, 1)
    test1 = plt.scatter(data0[..., 0], data0[..., 1], marker='x', c='b', label='reject')
    test2 = plt.scatter(data1[..., 0], data1[..., 1], marker='x', c='y', label='accepted')
    plt.legend(loc='upper right')
    plt.xlim((-1, 1.2))
    plt.ylim((-1, 1.2))
    plt.xlabel('microchips test 1')
    plt.ylabel('microchips test 2')
    plt.title('Plot of training data')
    plt.show()

Feature mapping

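There are only two features, and the plot makes it clear that the decision boundary cannot be linear. We therefore build higher-degree polynomial terms to obtain a non-linear decision boundary.
What has to be implemented: given x1 and x2, construct all polynomial terms up to degree n (including the constant term, which plays the role of x0) as the new features.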

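def features_mapping(x1, x2, power):
    # Build every monomial x1^a * x2^b with total degree a + b from 0 to power.
    # The degree-0 term is the all-ones column that serves as x0.
    m = len(x1)
    features = np.zeros((m, 1))  # placeholder column, removed at the end
    for sum_power in range(power + 1):  # + 1 so that degree `power` is included
        for x1_power in range(sum_power + 1):
            x2_power = sum_power - x1_power
            features = np.concatenate(
                (features, (np.power(x1, x1_power) * np.power(x2, x2_power)).reshape(m, 1)),
                axis=1)
    return np.delete(features, 0, axis=1)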
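It helps to know how many columns to expect. If the mapping includes every term of total degree 0 through n, there are (n+1)(n+2)/2 of them, which is 28 for degree 6 as in the assignment. The sketch below only enumerates the exponent pairs, in the same order as the loops above; `monomial_exponents` is a hypothetical helper.

```python
def monomial_exponents(degree):
    """List the (x1_power, x2_power) pairs with total degree 0..degree."""
    return [(p1, total - p1)
            for total in range(degree + 1)
            for p1 in range(total + 1)]

terms = monomial_exponents(6)
print(len(terms))  # 28
print(terms[:3])   # [(0, 0), (0, 1), (1, 0)]
```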

Cost function and gradient

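Adding features can lead to overfitting, so regularization is used to penalize the parameters and reduce it.
$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}[-y^{(i)}\log(h_{\theta}(x^{(i)}))-(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))]+\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$
Be careful here: by convention theta0 is not penalized, which is why the sum starts at j=1.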

def cost(theta, X, y, l):
    m = X.shape[0]
    part1 = np.mean(-y * np.log(sigmoid(X.dot(theta))) - (1 - y) * np.log(1 - sigmoid(X.dot(theta))))
    part2 = (l / (2 * m)) * np.sum(np.delete((theta * theta), 0, axis=0))
    return part1 + part2

Test the cost function

    data = np.array(data)  # read_dataset returns a list; convert before slicing
    features = features_mapping(data[..., 0], data[..., 1], 6)
    y = data[..., -1]
    theta = np.zeros(features.shape[-1])
    print(cost(theta, features, y, 1))#0.6931471805599454

The gradient changes accordingly
$\frac{\partial J(\theta)}{\partial \theta_{0}}=\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_0\textrm{ (for j = 0)}$
$\frac{\partial J(\theta)}{\partial \theta_{j}}=\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}_j+\frac{\lambda}{m}\theta_j\textrm{ (for j ≥ 1)}$

Note that theta0 is handled as a special case

def gradient(theta, X, y, l):
    m = X.shape[0]
    part1 = X.T.dot((sigmoid(X.dot(theta)) - y)) / m
    part2 = (l / m) * theta
    part2[0] = 0
    return part1 + part2

Everything is ready, so we can train the parameters. We continue to use scipy.optimize.minimize()

    res = opt.minimize(fun=cost, x0=theta, args=(features, y, 1), method='TNC', jac=gradient)

Here we evaluate the result on the training data. First define how a prediction is decided

def predict(theta, X):
    return [1 if i > 0.5 else 0 for i in sigmoid(X.dot(theta))]

print(classification_report(y, predict(res.x, features)))

The prediction results are mediocre.

Plotting the decision boundary
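Plot the decision boundary. Note that the boundary here is non-linear, so we draw it as a contour line with matplotlib.pyplot.contour(). Before that we need the x and y coordinates on a grid, which numpy.meshgrid() provides. ~~It is worth reading up on this; I have not fully understood it myself.~~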

    x = np.linspace(-1, 1.2, 100)
    x1, x2 = np.meshgrid(x, x)
    z = features_mapping(x1.ravel(), x2.ravel(), 6)
    z = z.dot(res.x).reshape(x1.shape)
    db = plt.contour(x1, x2, z, levels=[0], colors=['r'])  # draw only the z = 0 level
    plt.legend(loc='upper right')
    plt.show()
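The grid handling is the part that tends to confuse, so here is a small sketch of just the shapes involved. It uses a 4-point grid and a made-up score function (a circle) in place of the trained model.

```python
import numpy as np

x = np.linspace(-1, 1.2, 4)
x1, x2 = np.meshgrid(x, x)  # two (4, 4) arrays: x1 varies along columns, x2 along rows
print(x1.shape)             # (4, 4)

# Flatten to a list of 16 points, evaluate, then restore the grid shape
flat = np.column_stack((x1.ravel(), x2.ravel()))
z = (flat[:, 0] ** 2 + flat[:, 1] ** 2 - 0.5).reshape(x1.shape)  # stand-in for theta^T x
print(z.shape)              # (4, 4), matching x1 and x2 as contour() expects
```

`contour` then draws the curve where z crosses the requested level, which for level 0 is the decision boundary.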

Optional (ungraded) exercises
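Without regularization (l=0), overfitting occurs. The result here differs from the one shown in the assignment because a different optimization method is used.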

When the penalty is too large (l=100), underfitting occurs.
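The mechanism behind both cases is that a larger l shrinks the penalized parameters towards zero. The sketch below illustrates this on made-up one-feature toy data with plain gradient descent; `fit`, the learning rate and the iteration count are assumptions chosen for this toy problem, not the setup used above.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gradient(theta, X, y, l):
    m = X.shape[0]
    part1 = X.T.dot(sigmoid(X.dot(theta)) - y) / m
    part2 = (l / m) * theta
    part2[0] = 0  # theta0 is not penalized
    return part1 + part2

def fit(X, y, l, alpha=0.05, iterations=5000):
    # Plain gradient descent; hypothetical helper for this illustration
    theta = np.zeros(X.shape[1])
    for _ in range(iterations):
        theta = theta - alpha * gradient(theta, X, y, l)
    return theta

# Made-up, non-separable toy data: one feature plus x0
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
X = np.column_stack((np.ones_like(x), x))
y = np.array([0.0, 0.0, 1.0, 0.0, 1.0, 1.0])

norms = [np.linalg.norm(fit(X, y, l)[1:]) for l in (0, 1, 100)]
print(norms)  # the size of the penalized part shrinks as l grows
```

With l=100 the weight is pushed so close to zero that the model can barely react to the feature, which is the underfitting described above.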

The complete code will be kept in sync on my GitHub.

Corrections are welcome.
