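Algorithm Description
The input to the k-nearest neighbor (k-NN) algorithm is the feature vector of an instance, which corresponds to a point in feature space. The output is the class of that instance. k-NN assumes that the class of every instance in the given training dataset is already known. For a new instance, it predicts the class by majority vote among the classes of its k nearest training instances.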
3.1 The k-Nearest Neighbor Algorithm
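Algorithm 3.1
Input: a training dataset $T$ and the feature vector $x$ of an instance,
where the training dataset is $T = \{(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)\}$,
$x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(M)})^T \in \mathcal{X} \subseteq \mathbf{R}^M$ is the feature vector of an instance, $y_i \in \mathcal{Y} = \{c_1, c_2, \dots, c_K\}$ is its class, $i = 1, 2, \dots, N$; $x_i^{(l)}$ is the $l$-th component of the feature vector, and $M$ is the number of components;
Output: the class $y$ to which the instance $x$ belongs.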
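(1) Using the given distance metric, find the k points in $T$ nearest to $x$. The neighborhood of $x$ that covers these k points is denoted $N_k(x)$.
(2) Within $N_k(x)$, determine the class $y$ of $x$ according to the classification decision rule (majority voting):
$$y = \mathop{\arg\max}_{c_j} \sum_{x_i \in N_k(x)} I(y_i = c_j), \quad i = 1, 2, \dots, N;\; j = 1, 2, \dots, K$$
where $I$ is the indicator function: $I = 1$ when $y_i = c_j$, and $I = 0$ otherwise.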
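Algorithm 3.1 can be sketched as a brute-force search in Python. This is a minimal illustration, not a reference implementation: it assumes the Euclidean distance and plain majority voting, and the function name `knn_classify` and the toy data are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, x, k):
    """Brute-force k-NN: train is a list of (feature_vector, label) pairs."""
    # Step (1): sort all training points by Euclidean distance to x and
    # keep the k nearest; together they form the neighbourhood N_k(x).
    neighbours = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    # Step (2): majority vote, i.e. argmax over c_j of sum_i I(y_i = c_j).
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [((1.0, 1.0), 'A'), ((1.2, 0.8), 'A'), ((0.9, 1.1), 'A'),
         ((5.0, 5.0), 'B'), ((5.2, 4.9), 'B')]
print(knn_classify(train, (1.1, 1.0), k=3))  # A
print(knn_classify(train, (4.8, 5.1), k=1))  # B
```

Sorting all N points costs O(N log N) per query, which is the cost the kd-tree in Section 3.3 aims to reduce. When votes tie, `Counter.most_common` returns the label that appears first in the distance-sorted neighbour list.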
3.2 The k-NN Model
3.2.1 Model
3.2.2 Distance Metric
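The distance between two points in feature space reflects how similar the two instances are. The feature space of a k-NN model is usually the n-dimensional real vector space $\mathbf{R}^n$. The Euclidean distance is the usual choice, but other distances can be used, such as the more general $L_p$ distance, also called the Minkowski distance.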
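Let the feature space $\mathcal{X}$ be the n-dimensional real vector space $\mathbf{R}^n$, with $x_i, x_j \in \mathcal{X}$, $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(n)})^T$ and $x_j = (x_j^{(1)}, x_j^{(2)}, \dots, x_j^{(n)})^T$. The $L_p$ distance between $x_i$ and $x_j$ is defined as
$$L_p(x_i, x_j) = \left( \sum_{l=1}^{n} \left| x_i^{(l)} - x_j^{(l)} \right|^p \right)^{\frac{1}{p}}, \quad p \geq 1$$
When $p = 2$, it is the Euclidean distance:
$$L_2(x_i, x_j) = \left( \sum_{l=1}^{n} \left| x_i^{(l)} - x_j^{(l)} \right|^2 \right)^{\frac{1}{2}}$$
When $p = 1$, it is the Manhattan distance:
$$L_1(x_i, x_j) = \sum_{l=1}^{n} \left| x_i^{(l)} - x_j^{(l)} \right|$$
When $p = \infty$, it is the maximum of the absolute coordinate differences:
$$L_\infty(x_i, x_j) = \max_{l} \left| x_i^{(l)} - x_j^{(l)} \right|$$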
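The $L_p$ definitions translate directly into code. The sketch below is illustrative only (the function name `lp_distance` is hypothetical); it shows how the distance between the same pair of points shrinks as $p$ grows, approaching the largest coordinate difference.

```python
import math

def lp_distance(xi, xj, p):
    """L_p (Minkowski) distance between two equal-length vectors; p >= 1 or math.inf."""
    diffs = [abs(a - b) for a, b in zip(xi, xj)]
    if p == math.inf:
        return max(diffs)  # L_inf: the largest coordinate difference
    return sum(d ** p for d in diffs) ** (1.0 / p)

x1, x3 = (1, 1), (4, 4)
for p in (1, 2, 3, 4, math.inf):
    print(p, round(lp_distance(x1, x3, p), 2))
# 1 6.0
# 2 4.24
# 3 3.78
# 4 3.57
# inf 3
```

For two points that differ in only one coordinate, every $L_p$ distance gives the same value, so the choice of $p$ matters only when several coordinates differ.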
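3.2.3 Choosing k
A small k makes each prediction depend on very few neighbors, so the model is more complex and sensitive to noise (overfitting). A large k smooths the decision but lets distant, less similar instances influence it (underfitting). In practice, the optimal k is usually chosen by cross-validation.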
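Cross-validation for k can be sketched with leave-one-out, its simplest variant: each training point is classified using all the other points, and the candidate k with the lowest error rate wins. This is a hedged illustration, not a tuned procedure; the names `predict`, `loo_error` and `choose_k`, and the toy data with one deliberately mislabeled point, are hypothetical.

```python
import math
from collections import Counter

def predict(train, x, k):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    return Counter(y for _, y in nearest).most_common(1)[0][0]

def loo_error(data, k):
    """Leave-one-out error rate: classify each point using all the others."""
    errors = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        errors += predict(rest, x, k) != y
    return errors / len(data)

def choose_k(data, candidates):
    """Candidate k with the lowest leave-one-out error (smallest k on ties)."""
    return min(candidates, key=lambda k: (loo_error(data, k), k))

data = [((0, 0), 'A'), ((0, 1), 'A'), ((1, 0), 'A'), ((1, 1), 'A'),
        ((5, 5), 'B'), ((5, 6), 'B'), ((6, 5), 'B'), ((6, 6), 'B'),
        ((0.5, 0.5), 'B')]  # mislabeled "noise" point inside cluster A
for k in (1, 3, 5):
    print(k, round(loo_error(data, k), 3))
print(choose_k(data, [1, 3, 5]))  # 3
```

With k = 1 every point of cluster A is fooled by the noisy neighbour in its middle, while k = 3 out-votes it. Leave-one-out costs one full k-NN pass per point, so k-fold cross-validation is the usual choice on larger datasets.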
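3.2.4 Classification Decision Rule
The decision rule is usually majority voting, which is equivalent to minimizing the empirical risk (the misclassification rate on $N_k(x)$) under the 0-1 loss. The formal statement is very mathematical, so I'll skip it here……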
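The equivalence between majority voting and the 0-1 loss can be checked in a few lines. If $x$ is assigned class $c_j$, the misclassification rate over $N_k(x)$ is $\frac{1}{k}\sum_{x_i \in N_k(x)} I(y_i \neq c_j) = 1 - \frac{1}{k}\sum_{x_i \in N_k(x)} I(y_i = c_j)$, so minimizing it is the same as maximizing the vote count. The sketch below (function names and labels are hypothetical) picks the class by minimizing that rate and compares it with a plain vote.

```python
from collections import Counter

def misclassification_rate(neighbour_labels, c):
    """Empirical 0-1 loss on N_k(x) if x is assigned class c: (1/k) * sum I(y_i != c)."""
    return sum(y != c for y in neighbour_labels) / len(neighbour_labels)

def decide(neighbour_labels):
    """Choose the class with the smallest misclassification rate."""
    classes = sorted(set(neighbour_labels))
    return min(classes, key=lambda c: misclassification_rate(neighbour_labels, c))

labels = ['r', 'g', 'r', 'b', 'r']               # classes of the k = 5 nearest neighbours
print(decide(labels))                            # r
print(Counter(labels).most_common(1)[0][0])      # r, the majority vote agrees
```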
3.3 Implementing k-NN: the kd-Tree
3.3.1 Constructing a kd-Tree
Algorithm 3.2 (Constructing a balanced kd-tree)
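Input: a dataset in k-dimensional space $T = \{x_1, x_2, \dots, x_N\}$, where $x_i = (x_i^{(1)}, x_i^{(2)}, \dots, x_i^{(k)})^T$, $i = 1, 2, \dots, N$ (here $k$ is the dimension of the space, not the number of neighbors).
Output: a kd-tree.
(1) Start by constructing the root node, which corresponds to the hyperrectangular region of k-dimensional space that contains $T$.
Choose $x^{(1)}$ as the splitting axis and take the median of the $x^{(1)}$ coordinates of all instances in $T$ as the split point. The hyperplane through the split point perpendicular to the $x^{(1)}$ axis cuts the hyperrectangle into two sub-regions.
From the root, generate two child nodes at depth 1: the left child corresponds to the sub-region whose points have an $x^{(1)}$ coordinate less than that of the split point, and the right child to the sub-region whose points have an $x^{(1)}$ coordinate greater than or equal to it.
Instance points lying on the splitting hyperplane are stored at the root node.
(2) Repeat: for a node at depth $j$, choose $x^{(l)}$ as the splitting axis, where $l = j \bmod k + 1$, and split the node's region in the same way at the median of the $x^{(l)}$ coordinates of the instances in that region. Stop when neither sub-region contains any instances.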
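Algorithm 3.2 can be sketched in a few lines of Python. This is a hedged illustration under simplifying assumptions: axes are 0-based (so `depth % k` plays the role of $l = j \bmod k + 1$), the median is the element at index `len // 2` after sorting (the upper median for an even count), and a point whose coordinate equals the median may land in the left subtree. The names `Node` and `build_kdtree` are hypothetical. Unlike the plotting code in the Code section, which splits on the dimension of largest variance, this sketch cycles through the axes as the algorithm states.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]

@dataclass
class Node:
    point: Point                      # instance on this node's splitting hyperplane
    axis: int                         # splitting axis (0-based)
    left: Optional["Node"] = None     # sub-region with coordinate <= split point
    right: Optional["Node"] = None    # sub-region with coordinate >= split point

def build_kdtree(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Balanced kd-tree: cycle the axis with depth and split at the median."""
    if not points:
        return None                   # empty region: stop
    k = len(points[0])                # dimension of the space
    axis = depth % k
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(point=pts[mid], axis=axis,
                left=build_kdtree(pts[:mid], depth + 1),
                right=build_kdtree(pts[mid + 1:], depth + 1))

T = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
root = build_kdtree(T)
print(root.point, root.left.point, root.right.point)  # (7, 2) (5, 4) (9, 6)
```

For these six points the root splits at (7, 2) on $x^{(1)}$, and its children (5, 4) and (9, 6) split on $x^{(2)}$. Sorting at every level gives O(N log² N) construction; a linear-time selection of the median (as np.argpartition does in the Code section) brings it down to O(N log N).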
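3.3.2 Searching a kd-Tree
Algorithm 3.3 (Nearest-neighbor search with a kd-tree)
Input: a constructed kd-tree; a target point $x$;
Output: the nearest neighbor of $x$.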
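(1) Find the leaf node (region) that contains the target point $x$: starting from the root, recursively descend the tree. If the coordinate of $x$ along the current splitting axis is less than that of the split point, move to the left child; otherwise move to the right child. Stop when a leaf node is reached.
(2) Take the instance at this leaf node as the "current nearest point".
(3) Recursively back up toward the root, performing the following at each node:
(a) If the instance stored at this node is closer to the target point than the current nearest point, make it the current nearest point.
(b) The current nearest point must lie in the region of one of this node's children. Check whether the region of the other child of the same node contains a closer point. Specifically, check whether that region intersects the hypersphere centered at the target point whose radius is the distance between the target point and the current nearest point.
If they intersect, the other child's region may contain a point closer to the target, so move to that child and continue the nearest-neighbor search recursively from there;
if they do not intersect, keep backing up.
(4) The search ends when it has backed up to the root. The final "current nearest point" is the nearest neighbor of $x$.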
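A recursive sketch of Algorithm 3.3 in Python follows. It repeats a compact builder so the block runs on its own; nodes are plain tuples, and the names `build` and `nearest` are hypothetical. The backing-up of step (3) happens implicitly as recursive calls return, and step (4) is the return of the root call. For step (3b) it uses a common simplification: instead of intersecting the hypersphere with the other child's exact hyperrectangle, it checks whether the sphere crosses the node's splitting hyperplane. That test never misses a closer point, though it may visit a few extra nodes.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]
# A node is the tuple (point, axis, left, right); None is an empty subtree.
KDNode = Optional[tuple]

def build(points: Sequence[Point], depth: int = 0) -> KDNode:
    """Compact median-split kd-tree builder; the axis cycles with depth."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (pts[mid], axis,
            build(pts[:mid], depth + 1), build(pts[mid + 1:], depth + 1))

def nearest(node: KDNode, x: Point, best=None):
    """Nearest neighbour of x in the subtree; best and the result are (point, distance)."""
    if node is None:
        return best
    point, axis, left, right = node
    # Step (1): go first into the child whose region contains x.
    if x[axis] < point[axis]:
        near, far = left, right
    else:
        near, far = right, left
    best = nearest(near, x, best)
    # Steps (2) and (3a): this node's instance becomes the current nearest
    # point if there is none yet (a leaf) or if it is closer.
    d = math.dist(point, x)
    if best is None or d < best[1]:
        best = (point, d)
    # Step (3b): the sphere around x with radius best[1] can reach the
    # other child's region only if it crosses the splitting hyperplane.
    if abs(x[axis] - point[axis]) < best[1]:
        best = nearest(far, x, best)
    return best

T = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(T)
print(nearest(tree, (2, 4.5)))  # ((2, 3), 1.5)
```

For the target (2, 4.5) the search descends to the leaf (4, 7), improves to (5, 4) on the way up, crosses into the other leaf (2, 3), and prunes the root's right subtree because |2 - 7| = 5 exceeds the current radius 1.5.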
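Code
The following code has been tested with Python 3.
(1) Building the kd-tree
The code builds a kd-tree and plots how it partitions the plane, which is rather fun to look at.
The input data is a set of 2-D vectors. Tree construction also works for higher-dimensional data, but the plot (and the region bookkeeping in scope) only covers the first two coordinates.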
import numpy as np
import matplotlib.pyplot as plt
import copy
import math
"""
X, feature vectors
Y, class of X
D, dimension of each of vectors.
"""
# Construct initial data to be classified
D = 2
NUM = 50
C = [ 'g', 'r', 'b' ]
# Small fixed example; if you use it, also set KD_Node.x_max / y_max to 6.
#X = np.array([ (3,5), (2,4), (1,1), (5,2), (1,5), (4,1) ])
X = np.random.rand(NUM, D)                    # NUM random points in [0, 1)^D
# Random class for each point, stored as an array so it can be indexed like X.
Y = np.array([ C[i] for i in np.random.randint(0, len(C), NUM) ])


class KD_Node:
    cur_trav = None     # cursor for traversal (shared by all nodes).
    x_min = 0           # bounds of the plotting area; they must
    x_max = 1           # enclose every point of X.
    y_min = 0
    y_max = 1

    def __init__( self,
                  point=None, split=None, color=None,
                  L=None, R=None, father=None,
                  scope=None ):
        """
        initiate a kd tree node.
        point: datum of this node
        split: index of the dimension this node splits on
        color: class (plot color) of point
        L: left son
        R: right son
        father: father of this node, if root it's None
        scope: area in hyperspace covered by this node.
        """
        self.point = point
        self.split = split
        self.color = color
        self.left = L
        self.right = R
        self.father = father
        self.flag_trav = 0  # traversal flag.
                            # bit 0 is notation for itself
                            # bit 1 is for its left son
                            # bit 2 is for its right son
        # paint scope (None -> a fresh dict, avoiding a shared mutable default):
        #   x0: min of x
        #   x1: max of x
        #   y0: min of y
        #   y1: max of y
        self.scope = scope if scope is not None else {}

    def clear_trav(self):
        """Reset the traversal state so the tree can be iterated again."""
        KD_Node.cur_trav = None
        self.flag_trav = 0
        if self.left:
            self.left.clear_trav()
        if self.right:
            self.right.clear_trav()

    def __iter__(self):
        return self

    def __next__(self):
        # Non-recursive preorder traversal driven by the flag bits.
        if KD_Node.cur_trav is None:    # First time to use cur_trav, initiate.
            KD_Node.cur_trav = self
        cursor = KD_Node.cur_trav
        while True:
            if (cursor.flag_trav & 0x07) == 0x07:  # all three bits set (value 7):
                                                   # this subtree is fully
                                                   # traversed.
                if cursor.father is None:
                    raise StopIteration
                cursor = cursor.father
            elif (cursor.flag_trav & 0x01) == 0:   # if bit0 == 0,
                cursor.flag_trav |= 0x01           # set bit0 = 1
                break                              # BREAK! return current.
            elif (cursor.flag_trav & 0x02) == 0:   # if bit1 == 0,
                cursor.flag_trav |= 0x02           # set bit1 = 1
                if cursor.left is not None:
                    cursor = cursor.left           # move cursor to left son
            elif (cursor.flag_trav & 0x04) == 0:   # if bit2 == 0,
                cursor.flag_trav |= 0x04           # set bit2 = 1
                if cursor.right is not None:
                    cursor = cursor.right          # move cursor to right son
        KD_Node.cur_trav = cursor
        return cursor


def CreateKDT(data, labels, father=None, scope=None):
    """
    Recursively build a kd tree.
    INPUT:  data,   array of points, e.g. np.array([ (3,5), (2,4), (1,1) ])
            labels, array of classes (plot colors) aligned with data
            father, the father node; None for the root
            scope,  region covered by the new node; None for the root
    OUTPUT: root KD_Node of the (sub)tree, or None if data is empty
    """
    if len(data) == 0:
        return None
    # Split on the dimension with the largest variance (a common variant;
    # Algorithm 3.2 cycles through the axes by depth instead).
    var = np.var(data, axis=0)
    split = int(np.argmax(var))
    pos = len(data) // 2
    # After argpartition, pos_list[pos] indexes the median along `split`,
    # with smaller-or-equal values before it and larger-or-equal after it.
    pos_list = np.argpartition(data[:, split], pos)
    point = data[pos_list[pos]]       # point for this node
    color = labels[pos_list[pos]]     # its class gives the plot color
    if scope is None:                 # the root covers the whole plot area
        scope = { 'x0': KD_Node.x_min, 'x1': KD_Node.x_max,
                  'y0': KD_Node.y_min, 'y1': KD_Node.y_max }
    node = KD_Node( point=point, split=split, color=color, father=father,
                    scope=scope )
    # Each son covers the part of this node's scope on its side of the split.
    left_scope = copy.deepcopy(scope)
    right_scope = copy.deepcopy(scope)
    if split == 0:
        left_scope['x1'] = right_scope['x0'] = point[0]
    elif split == 1:
        left_scope['y1'] = right_scope['y0'] = point[1]
    left_idx, right_idx = pos_list[:pos], pos_list[(pos+1):]
    node.left = CreateKDT(data[left_idx], labels[left_idx], node, left_scope)
    node.right = CreateKDT(data[right_idx], labels[right_idx], node, right_scope)
    return node


def get_split_pos(data, split):
    """Return the position to split in data (unused helper)."""
    return len(data) // 2


def preorder(node, depth=0):
    """
    Print the points of a KD tree in preorder, indented by depth.
    """
    if node:
        print('    ' * depth + str(node.point))
        preorder(node.left, depth + 1)
        preorder(node.right, depth + 1)


def draw_KDT(kd):
    """
    Plot every data point and draw the splitting line of each node.
    """
    x_min = kd.x_min
    x_max = kd.x_max
    y_min = kd.y_min
    y_max = kd.y_max
    plt.figure(figsize=(6,6))
    plt.xlabel("$x^{(1)}$")
    plt.ylabel("$x^{(2)}$")
    plt.title("Machine Learning: KD Tree")
    plt.xlim(int(x_min), math.ceil(x_max))
    plt.ylim(int(y_min), math.ceil(y_max))
    ax = plt.gca()
    ax.set_aspect(1)
    # Outer border of the plotting area.
    plt.plot( [x_min, x_max, x_max, x_min, x_min],
              [y_min, y_min, y_max, y_max, y_min] )
    kd.clear_trav()   # reset traversal flags so the tree can be iterated
    for node in kd:
        line_from = line_to = None  # split line from and to
        if node.split == 0:         # vertical line, clipped to the node's scope
            line_from = [ node.point[0], node.scope['y0'] ]
            line_to = [ node.point[0], node.scope['y1'] ]
        elif node.split == 1:       # horizontal line, clipped to the node's scope
            line_from = [ node.scope['x0'], node.point[1] ]
            line_to = [ node.scope['x1'], node.point[1] ]
        if line_from is not None:   # only the first two axes can be drawn
            plt.plot( [ line_from[0], line_to[0] ],
                      [ line_from[1], line_to[1] ],
                      'k-', linewidth=1 )
        plt.scatter( node.point[0], node.point[1], color=node.color )
    plt.show()


def find_knn(root, x):
    """TODO: k-nearest-neighbour search on the tree (not implemented here)."""
    pass


def main():
    kd = CreateKDT(X, Y)
    #preorder(kd)   # print the tree as text instead of plotting it
    draw_KDT(kd)


if __name__ == "__main__":
    main()
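The find_knn stub above is left empty. One possible way to fill it, sketched here under assumptions rather than taken from the original, generalizes Algorithm 3.3 from one to k neighbours: keep the k best candidates in a max-heap and use the k-th best distance as the hypersphere radius. It assumes nodes expose `point`, `split`, `color`, `left` and `right` as KD_Node does, and works with any per-node choice of splitting axis. To run on its own, the block builds a small stand-in tree with `types.SimpleNamespace`; the names `build` and `find_knn(root, x, k)` (note the extra `k` parameter) are hypothetical.

```python
import heapq
import math
from collections import Counter
from types import SimpleNamespace

def build(points, labels, depth=0):
    """Stand-in kd-tree whose nodes expose point/split/color/left/right like KD_Node."""
    if not points:
        return None
    split = depth % len(points[0])                      # cycle through the axes
    order = sorted(range(len(points)), key=lambda i: points[i][split])
    mid = len(order) // 2                               # median along `split`

    def subset(idx):
        return [points[i] for i in idx], [labels[i] for i in idx]

    return SimpleNamespace(
        point=points[order[mid]], split=split, color=labels[order[mid]],
        left=build(*subset(order[:mid]), depth + 1),
        right=build(*subset(order[mid + 1:]), depth + 1))

def find_knn(root, x, k):
    """k nearest nodes to x as (distance, point, color) tuples, nearest first."""
    heap = []   # max-heap of the best k so far: (-distance, tie_breaker, node)
    tie = 0

    def visit(node):
        nonlocal tie
        if node is None:
            return
        axis = node.split
        if x[axis] < node.point[axis]:
            near, far = node.left, node.right
        else:
            near, far = node.right, node.left
        visit(near)                                     # descend toward x first
        d = math.dist(node.point, x)
        if len(heap) < k:
            heapq.heappush(heap, (-d, tie, node))
        elif d < -heap[0][0]:                           # closer than the current k-th
            heapq.heapreplace(heap, (-d, tie, node))
        tie += 1
        # The far side can only help if the heap is not full yet or the
        # splitting hyperplane is closer than the current k-th distance.
        if len(heap) < k or abs(x[axis] - node.point[axis]) < -heap[0][0]:
            visit(far)

    visit(root)
    return [(-nd, n.point, n.color) for nd, _, n in sorted(heap, reverse=True)]

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
cls = ['g', 'g', 'r', 'g', 'r', 'r']
root = build(pts, cls)
neighbours = find_knn(root, (6, 3.5), k=3)
print([p for _, p, _ in neighbours])                    # [(5, 4), (7, 2), (8, 1)]
print(Counter(c for _, _, c in neighbours).most_common(1)[0][0])  # r
```

The predicted class is then the majority vote over the returned colors, as in Algorithm 3.1. The integer tie-breaker keeps heap comparisons from ever reaching the node objects when two distances are equal.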