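Principle
The earlier post "萬事開頭難" introduced a basic single-layer neural network, the perceptron. This post is a follow-up that builds on that most basic version.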
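The improvements are:
1. Bernard Widrow introduced a cost function.
2. The weights are updated based on a linear function (linear activation function) rather than the discrete function (unit step function) used in the earlier perceptron.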
Cost Function
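Sum of Squared Errors (SSE):
$$J(w) = \frac{1}{2} \sum_i \left(y^{(i)} - \phi(z^{(i)})\right)^2$$
where $y^{(i)}$ is the target of sample $i$ and $\phi(z^{(i)}) = w_0 + \sum_j w_j x_j^{(i)}$ is the linear activation of its net input.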
Ideally, the cost function is U-shaped (convex), so gradient descent can be used to find the minimum cost.
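As a rough illustration of the U shape, the sketch below evaluates the SSE cost for a single weight without a bias term over a grid of values. The data points and the helper name `sse_cost` are made up for this example and are not from the post.

```python
import numpy as np

def sse_cost(w, x, y):
    """SSE cost J(w) = 1/2 * sum((y - w*x)**2) for one weight and no bias."""
    errors = y - w * x
    return 0.5 * np.sum(errors ** 2)

# Hypothetical 1-D training data with labels -1 / +1
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([-1.0, -1.0, 1.0, 1.0])

# Evaluate the cost on a grid of candidate weights
w_grid = np.linspace(-1.0, 2.0, 31)
costs = np.array([sse_cost(w, x, y) for w in w_grid])

# The grid minimum should sit at the least-squares solution sum(x*y) / sum(x*x)
w_best = w_grid[np.argmin(costs)]
w_closed_form = np.dot(x, y) / np.dot(x, x)
print(w_best, w_closed_form)
```

The cost rises on both sides of the best weight, which is the bowl shape that gradient descent walks down.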
Gradient Descent
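The update rule can be derived from the SSE cost as follows. This is a sketch that assumes the linear activation $\phi(z) = z$ and the notation $\eta$ for the learning rate, matching `eta` in the implementation.

```latex
\begin{aligned}
J(w) &= \frac{1}{2} \sum_i \left(y^{(i)} - \phi(z^{(i)})\right)^2,
\qquad \phi(z^{(i)}) = z^{(i)} = w_0 + \sum_j w_j x_j^{(i)} \\
\frac{\partial J}{\partial w_j} &= -\sum_i \left(y^{(i)} - \phi(z^{(i)})\right) x_j^{(i)} \\
\Delta w_j &= -\eta \frac{\partial J}{\partial w_j}
            = \eta \sum_i \left(y^{(i)} - \phi(z^{(i)})\right) x_j^{(i)} \\
\Delta w_0 &= \eta \sum_i \left(y^{(i)} - \phi(z^{(i)})\right)
\end{aligned}
```

Each weight moves by a step proportional to the negative gradient, and the sum runs over the whole training set, so the weights are updated once per pass rather than once per sample.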
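Feature Scaling
Gradient descent tends to converge faster when the features are standardized, that is, when each feature has its mean subtracted and is divided by its standard deviation:
$$x'_j = \frac{x_j - \mu_j}{\sigma_j}$$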
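A minimal sketch of standardization with NumPy, applied column by column. The helper name `standardize` and the sample values are assumptions made for this example.

```python
import numpy as np

def standardize(X):
    """Return a copy of X with every column shifted to mean 0 and scaled to std 1."""
    X = np.asarray(X, dtype=float)
    # axis=0 computes one mean / standard deviation per feature column
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Hypothetical data: 4 samples, 2 features on different scales
X_demo = np.array([[5.1, 1.4],
                   [4.9, 1.4],
                   [7.0, 4.7],
                   [6.4, 4.5]])
X_demo_std = standardize(X_demo)
print(X_demo_std.mean(axis=0), X_demo_std.std(axis=0))
```

Note that `np.std` uses the population standard deviation (`ddof=0`) by default.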
Implementation
import numpy as np


class AdalineGD(object):
    """ADAptive LInear NEuron classifier.

    Parameters
    -----------
    eta : float
        Learning rate (between 0.0 and 1.0)
    n_iter : int
        Passes over the training dataset.

    Attributes
    -----------
    w_ : 1d-array
        Weights after fitting.
    cost_ : list
        Sum-of-squares cost in every epoch.
    """

    def __init__(self, eta=0.01, n_iter=50):
        self.eta = eta
        self.n_iter = n_iter

    def fit(self, X, y):
        """ Fit training data.

        Parameters
        ----------
        X : {array-like}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        y : array-like, shape = [n_samples]
            Target values.

        Returns
        -------
        self : object
        """
        # w_[0] is the bias, w_[1:] are the feature weights
        self.w_ = np.zeros(1 + X.shape[1])
        self.cost_ = []
        for i in range(self.n_iter):
            output = self.net_input(X)
            errors = (y - output)
            # X.T.dot(errors) is a matrix-vector product; the result is a
            # vector with one entry per feature
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            cost = (errors**2).sum() / 2.0
            self.cost_.append(cost)
        return self

    def net_input(self, X):
        """Calculate net input"""
        # np.dot(X, w) gives one net input per sample
        # (a scalar when X is a single sample)
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def activation(self, X):
        """Compute linear activation"""
        return self.net_input(X)

    def predict(self, X):
        """Return class label after unit step"""
        return np.where(self.activation(X) >= 0.0, 1, -1)
The key part is the weight update:
self.w_[1:] += self.eta * X.T.dot(errors)
self.w_[0] += self.eta * errors.sum()
and the newly added activation function, which simply returns the net input (the unit step is applied only in predict):
def activation(self, X):
    """Compute linear activation"""
    return self.net_input(X)
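To make the vectorized weight update more concrete, the sketch below checks that `X.T.dot(errors)` gives the same result as summing `error_i * x_i` over the samples one at a time. The random data, seed, and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                # 5 hypothetical samples, 2 features
y = np.array([1.0, -1.0, 1.0, -1.0, 1.0])  # hypothetical targets
w = np.zeros(1 + X.shape[1])               # w[0] is the bias, w[1:] the weights
eta = 0.01

# Errors of the linear activation for the whole batch
errors = y - (X.dot(w[1:]) + w[0])

# Vectorized form, as used in fit()
delta_vec = eta * X.T.dot(errors)

# Equivalent explicit form: accumulate error_i * x_i sample by sample
delta_loop = np.zeros(X.shape[1])
for x_i, err_i in zip(X, errors):
    delta_loop += eta * err_i * x_i

print(np.allclose(delta_vec, delta_loop))
```

The matrix product is just a compact way of writing the sum over the training set, which is why the weights change only once per epoch.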
Testing
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots(nrows=1, ncols=2, figsize=(8, 4))
>>> ada1 = AdalineGD(eta=0.01, n_iter=50).fit(X, y)
>>> ax[0].plot(range(1, len(ada1.cost_) + 1),
... np.log10(ada1.cost_), marker='o')
>>> ax[0].set_xlabel('Epochs')
>>> ax[0].set_ylabel('log(Sum-squared-error)')
>>> ax[0].set_title('Adaline - Learning rate 0.01')
>>> ada2 = AdalineGD(eta=0.0001, n_iter=50).fit(X, y)
>>> ax[1].plot(range(1, len(ada2.cost_) + 1),
... ada2.cost_, marker='o')
>>> ax[1].set_xlabel('Epochs')
>>> ax[1].set_ylabel('Sum-squared-error')
>>> ax[1].set_title('Adaline - Learning rate 0.0001')
>>> plt.show()
In the left plot, the learning rate (step size) is too large, so gradient descent overshoots the minimum and the cost never comes down.
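Overshooting can be reproduced on a much simpler problem. The sketch below runs gradient descent on the toy cost J(w) = w**2, which is not the Adaline cost from this post; the function name `descend` and both step sizes are chosen only for illustration.

```python
def descend(eta, w0=1.0, n_steps=20):
    """Run gradient descent on the toy cost J(w) = w**2; return the cost history."""
    w = w0
    costs = []
    for _ in range(n_steps):
        w -= eta * 2.0 * w   # the gradient of w**2 is 2*w
        costs.append(w ** 2)
    return costs

small_step = descend(eta=0.1)   # each step shrinks w by a factor of 0.8
large_step = descend(eta=1.1)   # each step flips the sign and grows |w| by 1.2

print(small_step[-1], large_step[-1])
```

With the small step the cost shrinks toward zero, while with the large step every update jumps past the minimum and lands farther away than before.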
With feature scaling, which here means standardizing the features:
# subtract the mean, divide by the standard deviation
>>> X_std = np.copy(X)
>>> X_std[:,0] = (X[:,0] - X[:,0].mean()) / X[:,0].std()
>>> X_std[:,1] = (X[:,1] - X[:,1].mean()) / X[:,1].std()
Then pass X_std to the model's fit function instead:
ada.fit(X_std, y)
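The effect of standardization on convergence can be sketched end to end as follows. This is not the data or the class from the post: the two clusters are synthetic, and `train_adaline` is a hypothetical compact helper that repeats the same update rule as `AdalineGD.fit`.

```python
import numpy as np

def train_adaline(X, y, eta, n_iter):
    """Batch gradient descent for Adaline; returns the weights and cost history."""
    w = np.zeros(1 + X.shape[1])          # w[0] is the bias
    costs = []
    for _ in range(n_iter):
        errors = y - (X.dot(w[1:]) + w[0])
        w[1:] += eta * X.T.dot(errors)
        w[0] += eta * errors.sum()
        costs.append((errors ** 2).sum() / 2.0)
    return w, costs

# Synthetic two-class data (assumption): two well-separated clusters
rng = np.random.default_rng(0)
n_per_class = 20
class_a = rng.normal(loc=[5.0, 1.5], scale=0.3, size=(n_per_class, 2))
class_b = rng.normal(loc=[7.0, 4.5], scale=0.3, size=(n_per_class, 2))
X = np.vstack([class_a, class_b])
y = np.hstack([-np.ones(n_per_class), np.ones(n_per_class)])

# Standardize each feature column
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Same learning rate, raw versus standardized features
_, cost_raw = train_adaline(X, y, eta=0.01, n_iter=15)
w_std, cost_std = train_adaline(X_std, y, eta=0.01, n_iter=15)

# Unit step on the linear activation, as in predict()
predictions = np.where(X_std.dot(w_std[1:]) + w_std[0] >= 0.0, 1, -1)
accuracy = (predictions == y).mean()
print(cost_raw[-1], cost_std[-1], accuracy)
```

With the same learning rate, the cost on the raw features grows from epoch to epoch, while the cost on the standardized features decreases.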