Common Machine Learning Algorithms (2): Logistic Regression

Unlike linear regression, logistic regression has no closed-form solution. However, since the loss function is convex, we can train the model with gradient descent.
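Concretely, the model passes a linear combination of the input features through the sigmoid function, so its output can be read as a probability (this is the sigmoid method in the code below):

$$\hat{y} = \sigma(w^\top x + b), \qquad \sigma(z) = \frac{1}{1 + e^{-z}}$$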

We want the probabilities the model produces to fall between 0 and 1. During training, we therefore adjust the parameters so that large model outputs correspond to positive examples (true label 1) and small outputs correspond to negative examples (true label 0). This shows up in the loss function as follows:
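$$J(w, b) = -\frac{1}{n} \sum_{i=1}^{n} \left[\, y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]$$

This is the binary cross-entropy, computed as cost in the training loop below.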

Compute the gradients of the loss with respect to the weight vector and the bias:
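$$\frac{\partial J}{\partial w} = \frac{1}{n} X^\top \left(\hat{y} - y\right), \qquad \frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}^{(i)} - y^{(i)}\right)$$

Here $X$ is the matrix of training inputs and $n$ the number of samples; these two quantities are dw and db in the code.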

Then update the weights and the bias:
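$$w \leftarrow w - \eta \, \frac{\partial J}{\partial w}, \qquad b \leftarrow b - \eta \, \frac{\partial J}{\partial b}$$

where $\eta$ is the learning rate (l_r in the code).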

import numpy as np
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

np.random.seed(123)
x, y_true = make_blobs(n_samples=1000, centers=2)  # n_samples: number of samples to generate; centers: number of cluster centers; n_features: features per sample (default 2)
# print(x.shape)       # (1000, 2)
# print(y_true.shape)  # 1-D: (1000,)

# Plot the dataset
fig = plt.figure(figsize=(8, 6))
# plt.scatter draws a scatter plot; the first two arguments are arrays of shape (n,) giving the data points, and c sets the point colors
plt.scatter(x[:, 0], x[:, 1], c=y_true)  # x[:, 0] is the first feature column, x[:, 1] the second
plt.title('Dataset')
plt.xlabel('First feature')
plt.ylabel('Second feature')
plt.show()

# Reshape targets to get column vector with shape (n_samples, 1)
y_true = y_true[:, np.newaxis]
# print(y_true.shape)  # 2-D: (1000, 1)
x_train, x_test, y_train, y_test = train_test_split(x, y_true)
print('Shape of x_train: ', x_train.shape)
print('Shape of y_train: ', y_train.shape)
print('Shape of x_test: ', x_test.shape)
print('Shape of y_test: ', y_test.shape)

class LogisticRegression:
    def __init__(self):
        pass

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def train(self, x, y_true, n_iters, l_r):
        # Fit the model with batch gradient descent: n_iters iterations at learning rate l_r
        n_samples, n_features = x.shape
        self.weight = np.zeros((n_features, 1))  # initialize all weights to zero
        self.bias = 0
        costs = []

        for i in range(n_iters):
            # Forward pass: predicted probabilities for all samples
            y_predict = self.sigmoid(np.dot(x, self.weight) + self.bias)
            # Binary cross-entropy loss
            cost = (-1 / n_samples) * np.sum(y_true * np.log(y_predict) +
                                             (1 - y_true) * np.log(1 - y_predict))
            # Gradients of the loss with respect to the weights and the bias
            dw = (1 / n_samples) * np.dot(x.T, (y_predict - y_true))
            db = (1 / n_samples) * np.sum(y_predict - y_true)

            self.weight = self.weight - l_r * dw
            self.bias = self.bias - l_r * db

            costs.append(cost)
            if i % 100 == 0:
                print('Cost after iteration {}: {}'.format(i, cost))
        return self.weight, self.bias, costs

    def predict(self, x):
        y_predict = self.sigmoid(np.dot(x, self.weight) + self.bias)
        # Threshold the probabilities at 0.5 to obtain hard 0/1 labels
        y_predict_labels = [1 if elem > 0.5 else 0 for elem in y_predict]
        return np.array(y_predict_labels)[:, np.newaxis]

regressor = LogisticRegression()
w_trained, b_trained, costs = regressor.train(x_train, y_train, n_iters=600, l_r=0.009)
fig = plt.figure(figsize=(8,6))
plt.plot(np.arange(len(costs)), costs)
plt.title('Development of cost over training')
plt.xlabel('Number of iterations')
plt.ylabel('Cost')
plt.show()

y_p_train = regressor.predict(x_train)
y_p_test = regressor.predict(x_test)
print('Train accuracy: ',
      100 * (1 - np.mean(np.abs(y_p_train - y_train))), '%')
print('Test accuracy: ',
      100 * (1 - np.mean(np.abs(y_p_test - y_test))), '%')
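As a quick sanity check, we can compare this from-scratch model against scikit-learn's built-in logistic regression on the same split; on two well-separated blobs, both should score close to 100%. A minimal sketch:

from sklearn.linear_model import LogisticRegression as SkLogisticRegression
from sklearn.metrics import accuracy_score

# ravel() flattens the (n_samples, 1) targets back to the 1-D shape sklearn expects
sk_model = SkLogisticRegression()
sk_model.fit(x_train, y_train.ravel())
print('sklearn test accuracy: ',
      100 * accuracy_score(y_test.ravel(), sk_model.predict(x_test)), '%')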

 
