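Statistical Learning Methods: The Perceptron

This is the first post in my series of notes on Statistical Learning Methods (《統計學習方法》), covering Chapter 2 of the book. It quotes the book's explanations heavily and adds my own understanding. The algorithms from the book are implemented in Python, and the training process is animated with Matplotlib, which should count as fairly hardcore. Putting together a full set of solid material like this is hard work; I hope I can keep it up.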

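Concepts

The perceptron is a binary classification model: its input is the feature vector of an instance, and its output is the class of that instance, +1 or -1.

Perceptron model

Definition

Suppose the input space is $\mathcal{X} \subseteq \mathbb{R}^n$ and the output space is $\mathcal{Y} = \{+1, -1\}$, with $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. Then the following function from the input space to the output space:

$$f(x) = \operatorname{sign}(w \cdot x + b)$$

is called the perceptron. Here w and b are the parameters of the perceptron model: $w \in \mathbb{R}^n$ is called the weight or weight vector, $b \in \mathbb{R}$ is called the bias, and w·x denotes the inner product of w and x. sign is the function:

$$\operatorname{sign}(x) = \begin{cases} +1, & x \ge 0 \\ -1, & x < 0 \end{cases}$$

The geometric interpretation of the perceptron is that the linear equation

$$w \cdot x + b = 0$$

divides the feature space into two parts, positive and negative.

This plane (which degenerates to a straight line in 2 dimensions) is called the separating hyperplane.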
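As a minimal sketch of the model just defined, the decision function sign(w·x + b) can be written in a few lines. The function name `perceptron_predict` and the parameter values `w_demo`, `b_demo` are assumptions made for illustration only; the convention sign(0) = +1 follows the book's definition of sign.

```python
import numpy as np


def perceptron_predict(w, b, x):
    """Return +1 if w.x + b >= 0, otherwise -1 (sign(0) is taken as +1)."""
    return 1 if np.dot(w, x) + b >= 0 else -1


# Hypothetical parameters, chosen only to illustrate the decision rule.
w_demo = np.array([1.0, 1.0])
b_demo = -3.0

print(perceptron_predict(w_demo, b_demo, [3, 3]))  # 3 + 3 - 3 = 3  -> 1
print(perceptron_predict(w_demo, b_demo, [1, 1]))  # 1 + 1 - 3 = -1 -> -1
```

Points with w·x + b = 0 lie exactly on the separating hyperplane and are assigned to the positive class under this convention.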

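Perceptron learning strategy

Linear separability of a dataset

Definition

Given a dataset

$$T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$$

where $x_i \in \mathcal{X} = \mathbb{R}^n$, $y_i \in \mathcal{Y} = \{+1, -1\}$, $i = 1, 2, \dots, N$, if there exists some hyperplane S

$$w \cdot x + b = 0$$

that separates all positive and negative instances completely correctly, then T is said to be linearly separable; otherwise T is linearly inseparable.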
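To make the definition concrete, here is a small sketch that tests whether one given hyperplane separates a dataset, using the condition y_i (w·x_i + b) > 0 for every point. The helper name `separates` and the two candidate hyperplanes are assumptions for illustration; the three points are the ones used in the article's `training_set`. Note that this only checks a candidate hyperplane; it does not decide separability in general.

```python
import numpy as np


def separates(w, b, X, y):
    """True if every point is strictly on its own side: y_i * (w . x_i + b) > 0."""
    margins = y * (X @ w + b)
    return bool(np.all(margins > 0))


# The three training points from the article: two positive, one negative.
X = np.array([[3, 3], [4, 3], [1, 1]], dtype=float)
y = np.array([1, 1, -1], dtype=float)

print(separates(np.array([1.0, 1.0]), -3.0, X, y))  # True
print(separates(np.array([1.0, 0.0]), -5.0, X, y))  # False
```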

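Perceptron learning strategy

Assuming the dataset is linearly separable, we want to find a reasonable loss function.

A naive idea is to use the total number of misclassified points. But such a loss is not a continuous, differentiable function of the parameters w and b. Without a derivative we cannot tell how the loss responds to a small change in the parameters, so it is hard to optimize (there is no signal for which direction to move in, or for how close the current parameters are to a good solution).

Another idea is to use the total distance from all misclassified points to the hyperplane S. To do so, first define the distance from a point x0 to the plane S:

$$\frac{1}{\|w\|} \, |w \cdot x_0 + b|$$

The denominator is the L2 norm of w. The L2 norm is the square root of the sum of the squares of the vector's elements (its length). The formula is easy to understand if you recall the point-to-plane distance from secondary school, for a point $(x_0, y_0, z_0)$ and the plane $Ax + By + Cz + D = 0$:

$$d = \frac{|A x_0 + B y_0 + C z_0 + D|}{\sqrt{A^2 + B^2 + C^2}}$$

Geometrically, the distance from a point to the hyperplane S is the generalization of this distance to higher-dimensional space.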
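The distance formula can be checked numerically. The sketch below assumes NumPy and uses a hypothetical helper `distance_to_hyperplane`; the numbers are picked so that the arithmetic is easy to verify by hand.

```python
import numpy as np


def distance_to_hyperplane(w, b, x0):
    """|w . x0 + b| / ||w||, where ||w|| is the L2 norm of w."""
    return float(abs(np.dot(w, x0) + b) / np.linalg.norm(w))


# 2D case: w = (3, 4), b = -5, x0 = (3, 4) gives |9 + 16 - 5| / 5 = 4.
d2 = distance_to_hyperplane(np.array([3.0, 4.0]), -5.0, np.array([3.0, 4.0]))
print(d2)  # 4.0

# 3D case, matching the school formula with (A, B, C) = (1, 2, 2), D = -3
# and the point (1, 1, 1): |1 + 2 + 2 - 3| / 3 = 2/3.
d3 = distance_to_hyperplane(np.array([1.0, 2.0, 2.0]), -3.0, np.array([1.0, 1.0, 1.0]))
```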

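Moreover, if point i is misclassified, then

$$-y_i (w \cdot x_i + b) > 0$$

must hold, so we can drop the absolute value and obtain the distance from a misclassified point to the hyperplane S:

$$-\frac{1}{\|w\|} \, y_i (w \cdot x_i + b)$$

Let M be the set of all misclassified points. The total distance from all misclassified points to the hyperplane S is then

$$-\frac{1}{\|w\|} \sum_{x_i \in M} y_i (w \cdot x_i + b)$$

The factor $1/\|w\|$ is always positive and does not change which points are misclassified, so we leave it out and obtain the loss function of perceptron learning:

$$L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$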
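As a rough sketch, the loss L(w, b) = -Σ y_i (w·x_i + b) over the misclassified set M can be computed as follows. The function name `perceptron_loss` is an assumption; a point counts as misclassified when y_i (w·x_i + b) <= 0, the same test the article's code uses. The data are the three points from the article's `training_set`.

```python
import numpy as np


def perceptron_loss(w, b, X, y):
    """Sum of -y_i * (w . x_i + b) over points with y_i * (w . x_i + b) <= 0."""
    margins = y * (X @ w + b)
    misclassified = margins <= 0
    return float(np.sum(-margins[misclassified]))


X = np.array([[3, 3], [4, 3], [1, 1]], dtype=float)
y = np.array([1, 1, -1], dtype=float)

# With w = (1, 1), b = -1 only the negative point (1, 1) is misclassified.
print(perceptron_loss(np.array([1.0, 1.0]), -1.0, X, y))  # 1.0
# With w = (1, 1), b = -3 every point is classified correctly.
print(perceptron_loss(np.array([1.0, 1.0]), -3.0, X, y))  # 0.0
```

The loss is zero exactly when M is empty, and it is continuous and piecewise linear in w and b.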

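Perceptron learning algorithm

Primal form

The perceptron learning algorithm is an algorithm for solving the following optimization problem:

$$\min_{w, b} L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$

The perceptron learning algorithm is driven by misclassification: it starts from an arbitrarily chosen hyperplane and then uses gradient descent (in practice stochastic gradient descent, one misclassified point at a time) to keep minimizing the loss function above. The gradient of the loss function is given by:

$$\nabla_w L(w, b) = -\sum_{x_i \in M} y_i x_i, \qquad \nabla_b L(w, b) = -\sum_{x_i \in M} y_i$$

The gradient is a vector that points in the direction in which a scalar field increases fastest; its length is the maximum rate of change. A scalar field is a field in which the property at every point in space can be described by a single scalar (my own reading is that this scalar is the output of the function).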
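One way to convince yourself of the gradient expressions ∇_w L = -Σ y_i x_i and ∇_b L = -Σ y_i is to compare them with central finite differences. The sketch below holds the misclassified set M fixed; the points in `M_X`, the labels in `M_y`, and the values of `w` and `b` are hypothetical and serve only as a numerical check.

```python
import numpy as np


def loss_on(M_X, M_y, w, b):
    """L(w, b) = -sum of y_i * (w . x_i + b) over a FIXED set M."""
    return float(np.sum(-M_y * (M_X @ w + b)))


def analytic_gradient(M_X, M_y):
    """grad_w = -sum(y_i * x_i), grad_b = -sum(y_i)."""
    grad_w = -(M_y[:, None] * M_X).sum(axis=0)
    grad_b = -M_y.sum()
    return grad_w, grad_b


def numeric_gradient(M_X, M_y, w, b, eps=1e-6):
    """Central finite differences of loss_on with respect to w and b."""
    grad_w = np.zeros_like(w)
    for j in range(len(w)):
        step = np.zeros_like(w)
        step[j] = eps
        grad_w[j] = (loss_on(M_X, M_y, w + step, b)
                     - loss_on(M_X, M_y, w - step, b)) / (2 * eps)
    grad_b = (loss_on(M_X, M_y, w, b + eps)
              - loss_on(M_X, M_y, w, b - eps)) / (2 * eps)
    return grad_w, grad_b


# Hypothetical fixed set M and parameters, used only for the check.
M_X = np.array([[3.0, 3.0], [4.0, 3.0]])
M_y = np.array([1.0, 1.0])
w = np.array([0.5, -0.2])
b = 0.1

ana_w, ana_b = analytic_gradient(M_X, M_y)   # grad_w = [-7, -6], grad_b = -2
num_w, num_b = numeric_gradient(M_X, M_y, w, b)
print(np.allclose(ana_w, num_w, atol=1e-6), abs(ana_b - num_b) < 1e-6)
```

Because L is linear in w and b once M is fixed, the finite-difference estimate agrees with the analytic gradient up to rounding error.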

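Pick a misclassified point $(x_i, y_i)$ at random and update the parameters w and b:

$$w \leftarrow w + \eta \, y_i x_i, \qquad b \leftarrow b + \eta \, y_i$$

Here $\eta$ ($0 < \eta \le 1$) is the learning rate. The gradient points uphill, so adding its negative to the parameters moves them downhill. Each update increases $y_i (w \cdot x_i + b)$ for the chosen point, and on a linearly separable dataset repeating the update drives the loss to 0 after finitely many steps. This gives the primal form of the perceptron learning algorithm:

Input: training set T and learning rate $\eta$. Output: w, b, and the model $f(x) = \operatorname{sign}(w \cdot x + b)$.
(1) Choose initial values $w_0$, $b_0$.
(2) Pick a data point $(x_i, y_i)$ from the training set.
(3) If $y_i (w \cdot x_i + b) \le 0$, update $w \leftarrow w + \eta \, y_i x_i$ and $b \leftarrow b + \eta \, y_i$.
(4) Go back to (2) until the training set contains no misclassified points.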
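A single update can be sketched as follows. The helper name `sgd_step` is an assumption; the starting point w = (0, 0), b = 0 and the sample (3, 3) with label +1 match the first step taken by the article's code with learning rate 1. The update raises y_i (w·x_i + b) for the chosen point by η(‖x_i‖² + 1).

```python
import numpy as np


def sgd_step(w, b, x_i, y_i, eta=1.0):
    """One perceptron update: w <- w + eta*y_i*x_i, b <- b + eta*y_i."""
    return w + eta * y_i * x_i, b + eta * y_i


w0, b0 = np.zeros(2), 0.0
x_i, y_i = np.array([3.0, 3.0]), 1.0

before = float(y_i * (np.dot(w0, x_i) + b0))   # 0.0, counted as misclassified
w1, b1 = sgd_step(w0, b0, x_i, y_i)
after = float(y_i * (np.dot(w1, x_i) + b1))    # 9 + 9 + 1 = 19.0

print(w1, b1)          # [3. 3.] 1.0
print(before, after)   # 0.0 19.0
```

The increase of 19 equals η(‖x_i‖² + 1) = 1 × (18 + 1), so the chosen point always moves toward the correct side, although other points may become misclassified in the process.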

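For this algorithm, the following example is used as the training data: the positive points are $x_1 = (3, 3)^T$ and $x_2 = (4, 3)^T$, and the negative point is $x_3 = (1, 1)^T$.

The Python implementation and the visualization code follow:

Perceptron algorithm code

At last, the most exciting moment. With the knowledge above, this simple algorithm can be visualized nicely: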

# -*- coding:utf-8 -*-
# Filename: train2.1.py
# Author: hankcs
# Date: 2015/1/30 16:29
import copy
from matplotlib import pyplot as plt
from matplotlib import animation

training_set = [[(3, 3), 1], [(4, 3), 1], [(1, 1), -1]]
w = [0, 0]
b = 0
history = []


def update(item):
    """
    update parameters using stochastic gradient descent (learning rate 1)
    :param item: an item which is classified into wrong class
    :return: nothing
    """
    global w, b, history
    w[0] += 1 * item[1] * item[0][0]
    w[1] += 1 * item[1] * item[0][1]
    b += 1 * item[1]
    print(w, b)  # trace of each stochastic gradient descent step
    history.append([copy.copy(w), b])


def cal(item):
    """
    calculate the functional distance between 'item' and the decision surface. output yi(w*xi+b).
    :param item:
    :return:
    """
    res = 0
    for i in range(len(item[0])):
        res += item[0][i] * w[i]
    res += b
    res *= item[1]
    return res


def check():
    """
    check if the hyperplane can classify the examples correctly,
    updating the parameters on every misclassified example
    :return: True if an update was made, False if all examples are classified correctly
    """
    flag = False
    for item in training_set:
        if cal(item) <= 0:
            flag = True
            update(item)
    if not flag:
        print("RESULT: w: " + str(w) + " b: " + str(b))
    return flag


if __name__ == "__main__":
    for i in range(1000):
        if not check(): break

    # draw a graph to show the process
    # first set up the figure, the axis, and the plot element we want to animate
    fig = plt.figure()
    ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
    line, = ax.plot([], [], 'g', lw=2)
    label = ax.text(0, 0, '')

    # initialization function: plot the background of each frame
    def init():
        line.set_data([], [])
        x, y, x_, y_ = [], [], [], []
        for p in training_set:
            if p[1] > 0:
                x.append(p[0][0])
                y.append(p[0][1])
            else:
                x_.append(p[0][0])
                y_.append(p[0][1])
        plt.plot(x, y, 'bo', x_, y_, 'rx')
        plt.axis([-6, 6, -6, 6])
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('y')
        plt.title('Perceptron Algorithm (www.hankcs.com)')
        return line, label

    # animation function. this is called sequentially
    def animate(i):
        global history, ax, line, label
        w = history[i][0]
        b = history[i][1]
        # a line with w[1] == 0 is vertical and cannot be drawn as y = f(x); skip the frame
        if w[1] == 0: return line, label
        x1 = -7
        y1 = -(b + w[0] * x1) / w[1]
        x2 = 7
        y2 = -(b + w[0] * x2) / w[1]
        line.set_data([x1, x2], [y1, y2])
        x1 = 0
        y1 = -(b + w[0] * x1) / w[1]
        label.set_text(str(history[i]))
        label.set_position([x1, y1])
        return line, label

    # call the animator. blit=True means only re-draw the parts that have changed.
    print(history)
    anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(history), interval=1000, repeat=True,
                                   blit=True)
    plt.show()
    # saving with this writer requires ImageMagick to be installed
    anim.save('perceptron.gif', fps=2, writer='imagemagick')

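Visualization

You can see that the hyperplane is attracted by the misclassified point and moves toward it, so that the distance between the two shrinks step by step until the point is classified correctly. Doesn't this animation give a more intuitive feel for the perceptron's gradient descent algorithm?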

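Convergence of the algorithm

Let $\hat{x} = (x^T, 1)^T$ be the input vector extended with the constant 1, and let $R = \max_{1 \le i \le N} \|\hat{x}_i\|$ be its maximum length. Let $\hat{w} = (w^T, b)^T$ be the perceptron's parameter vector. Suppose a hyperplane $\hat{w}_{opt} \cdot \hat{x} = 0$ with $\|\hat{w}_{opt}\| = 1$ classifies the dataset completely correctly, and define the minimum value gamma:

$$\gamma = \min_i \{ y_i (\hat{w}_{opt} \cdot \hat{x}_i) \}$$

Then the number of misclassifications k satisfies:

$$k \le \left( \frac{R}{\gamma} \right)^2$$

For the proof, see Statistical Learning Methods (《統計學習方法》), p. 31.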
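As an illustration of the bound k ≤ (R/γ)², the sketch below evaluates R and γ for the three training points from the article and compares the bound with the number of updates a plain perceptron loop actually makes. The separating hyperplane (w, b) = ((1, 1), -3) is just one valid choice used here as the reference separator; a different separator would give a different γ and therefore a different bound. The loop starts from zero parameters with learning rate 1 and visits points in list order, which is an assumption mirroring the article's code.

```python
import numpy as np

X = np.array([[3, 3], [4, 3], [1, 1]], dtype=float)
y = np.array([1, 1, -1], dtype=float)

# Extended inputs x_hat = (x, 1) and a unit-norm reference separator w_hat_opt.
X_hat = np.hstack([X, np.ones((len(X), 1))])
w_hat = np.array([1.0, 1.0, -3.0])            # one separating hyperplane (assumed)
w_hat_opt = w_hat / np.linalg.norm(w_hat)

R = float(np.max(np.linalg.norm(X_hat, axis=1)))   # sqrt(26)
gamma = float(np.min(y * (X_hat @ w_hat_opt)))     # 1 / sqrt(11)
bound = (R / gamma) ** 2                           # 26 * 11 = 286

# Count the updates of a plain perceptron run.
w, b, k = np.zeros(2), 0.0, 0
changed = True
while changed:
    changed = False
    for x_i, y_i in zip(X, y):
        if y_i * (np.dot(w, x_i) + b) <= 0:
            w, b, k = w + y_i * x_i, b + y_i, k + 1
            changed = True

print(k, round(bound, 6))  # 7 286.0
```

The bound is loose here (7 updates against a limit of 286), which is typical: the theorem guarantees termination, not a tight estimate.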

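Dual form of the perceptron learning algorithm

Duality here means expressing w and b as linear combinations of the training instances $(x_i, y_i)$ and obtaining w and b by solving for the coefficients. Specifically, starting from $w_0 = 0$, $b_0 = 0$, if w and b have been updated $n_i$ times on the misclassified point i, then the increments of w and b due to point i are $\alpha_i y_i x_i$ and $\alpha_i y_i$ respectively, where $\alpha_i = n_i \eta$. The final parameters can then be written as:

$$w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad b = \sum_{i=1}^{N} \alpha_i y_i$$

This gives Algorithm 2.2:

Input: training set T and learning rate $\eta$. Output: $\alpha$, b, and the model $f(x) = \operatorname{sign}\left( \sum_{j=1}^{N} \alpha_j y_j x_j \cdot x + b \right)$.
(1) Set $\alpha \leftarrow 0$, $b \leftarrow 0$.
(2) Pick a data point $(x_i, y_i)$ from the training set.
(3) If $y_i \left( \sum_{j=1}^{N} \alpha_j y_j x_j \cdot x_i + b \right) \le 0$, update $\alpha_i \leftarrow \alpha_i + \eta$ and $b \leftarrow b + \eta \, y_i$.
(4) Go back to (2) until there are no misclassified points.

The training instances appear only through inner products, so these can be computed once in advance and stored as the Gram matrix $G = [x_i \cdot x_j]_{N \times N}$.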
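To make the linear-combination idea concrete, the sketch below rebuilds w and b from a coefficient vector and checks that the dual decision values, computed only from inner products between training points, agree with the primal ones. The values alpha = (2, 0, 5) are what the update rule yields on the article's three points with learning rate 1 when points are visited in list order (x1 is updated twice, x3 five times); treat them as illustrative input rather than something this snippet derives.

```python
import numpy as np

X = np.array([[3, 3], [4, 3], [1, 1]], dtype=float)
y = np.array([1, 1, -1], dtype=float)

# alpha_i = (number of updates on point i) * eta, with eta = 1 (assumed input).
alpha = np.array([2.0, 0.0, 5.0])

# Recover the primal parameters from the dual coefficients.
w = (alpha * y) @ X              # 2*(3,3) - 5*(1,1) = (1, 1)
b = float(np.sum(alpha * y))     # 2 - 5 = -3

# Gram matrix of pairwise inner products x_i . x_j.
gram = X @ X.T

# Decision values computed both ways.
dual_scores = gram @ (alpha * y) + b
primal_scores = X @ w + b

print(w, b)          # [1. 1.] -3.0
print(dual_scores)   # [ 3.  4. -1.]
```

Both score vectors have the same signs as the labels, so this (alpha, b) classifies all three points correctly.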

Perceptron dual algorithm code

This involves quite a lot of matrix computation, so NumPy is used heavily:

# -*- coding:utf-8 -*-
# Filename: train2.2.py
# Author: hankcs
# Date: 2015/1/31 15:15
import numpy as np
from matplotlib import pyplot as plt
from matplotlib import animation

# An example in that book, the training set and parameters' sizes are fixed
# dtype=object is needed because each row mixes a point (list) with a label (int)
training_set = np.array([[[3, 3], 1], [[4, 3], 1], [[1, 1], -1]], dtype=object)
a = np.zeros(len(training_set), float)
b = 0.0
Gram = None
y = np.array(training_set[:, 1], dtype=float)
x = np.empty((len(training_set), 2), float)
for i in range(len(training_set)):
    x[i] = training_set[i][0]
history = []


def cal_gram():
    """
    calculate the Gram matrix
    :return: matrix g with g[i][j] = x_i . x_j
    """
    g = np.empty((len(training_set), len(training_set)), int)
    for i in range(len(training_set)):
        for j in range(len(training_set)):
            g[i][j] = np.dot(training_set[i][0], training_set[j][0])
    return g


def update(i):
    """
    update parameters using stochastic gradient descent (learning rate 1)
    :param i: index of the misclassified example
    :return:
    """
    global a, b
    a[i] += 1
    b = b + y[i]
    history.append([np.dot(a * y, x), b])
    # print(a, b)  # you can uncomment this line to check the process of stochastic gradient descent


# calculate the judge condition
def cal(i):
    res = np.dot(a * y, Gram[i])
    res = (res + b) * y[i]
    return res


# check if the hyperplane can classify the examples correctly
def check():
    flag = False
    for i in range(len(training_set)):
        if cal(i) <= 0:
            flag = True
            update(i)
    if not flag:
        w = np.dot(a * y, x)
        print("RESULT: w: " + str(w) + " b: " + str(b))
        return False
    return True


if __name__ == "__main__":
    Gram = cal_gram()  # initialize the Gram matrix
    for i in range(1000):
        if not check(): break

    # draw an animation to show how it works, the data comes from history
    # first set up the figure, the axis, and the plot element we want to animate
    fig = plt.figure()
    ax = plt.axes(xlim=(0, 2), ylim=(-2, 2))
    line, = ax.plot([], [], 'g', lw=2)
    label = ax.text(0, 0, '')

    # initialization function: plot the background of each frame
    def init():
        line.set_data([], [])
        x, y, x_, y_ = [], [], [], []
        for p in training_set:
            if p[1] > 0:
                x.append(p[0][0])
                y.append(p[0][1])
            else:
                x_.append(p[0][0])
                y_.append(p[0][1])
        plt.plot(x, y, 'bo', x_, y_, 'rx')
        plt.axis([-6, 6, -6, 6])
        plt.grid(True)
        plt.xlabel('x')
        plt.ylabel('y')
        plt.title('Perceptron Algorithm 2 (www.hankcs.com)')
        return line, label

    # animation function. this is called sequentially
    def animate(i):
        global history, ax, line, label
        w = history[i][0]
        b = history[i][1]
        # a line with w[1] == 0 is vertical and cannot be drawn as y = f(x); skip the frame
        if w[1] == 0: return line, label
        x1 = -7.0
        y1 = -(b + w[0] * x1) / w[1]
        x2 = 7.0
        y2 = -(b + w[0] * x2) / w[1]
        line.set_data([x1, x2], [y1, y2])
        x1 = 0.0
        y1 = -(b + w[0] * x1) / w[1]
        label.set_text(str(history[i][0]) + ' ' + str(b))
        label.set_position([x1, y1])
        return line, label

    # call the animator. blit=True means only re-draw the parts that have changed.
    anim = animation.FuncAnimation(fig, animate, init_func=init, frames=len(history), interval=1000, repeat=True,
                                   blit=True)
    plt.show()
    # anim.save('perceptron2.gif', fps=2, writer='imagemagick')

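Visualization

The result is the same as that of Algorithm 1. We can also change the dataset a little:

training_set = np.array([[[3, 3], 1], [[4, 3], 1], [[1, 1], -1], [[5, 2], -1]], dtype=object)

This produces a somewhat more complicated result.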

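Reflections

Working through the simplest model teaches the common concepts and the usual workflow of machine learning.

Also, this post is only a personal note, kept as a memo for myself; I make no promises about its quality or about follow-ups. As I always say, blogs are only a supplement: to get started properly, you still need to read the classic texts.

Reposted from: http://www.hankcs.com/ml/the-perceptron.html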
