DBSCAN is a density-based clustering algorithm. Its main procedure is given by the following pseudocode:
DBSCAN(D, eps, MinPts)
    C = 0                                       // cluster label
    for each unvisited point P in dataset D     // iterate over the data set
        mark P as visited                       // mark P as visited
        NeighborPts = regionQuery(P, eps)       // compute P's eps-neighborhood
        if sizeof(NeighborPts) < MinPts         // P cannot be a core point
            mark P as NOISE                     // label it as noise
        else                                    // P is a core point: start a new cluster from it
            C = next cluster
            expandCluster(P, NeighborPts, C, eps, MinPts)   // grow the cluster from this core point
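The core-point test in the main loop (`sizeof(NeighborPts) < MinPts`) can be isolated in a short Python sketch. It assumes points are coordinate tuples, Euclidean distance, and a brute-force neighborhood scan; the function name `core_point_indices` is hypothetical and not part of the pseudocode.

```python
import math


def core_point_indices(points, eps, min_pts):
    """Indices of points whose eps-neighborhood (the point itself included)
    holds at least min_pts points, i.e. those passing sizeof(NeighborPts) >= MinPts."""
    core = []
    for i, p in enumerate(points):
        # Brute-force neighborhood size with Euclidean distance (math.dist, Python 3.8+).
        size = sum(1 for q in points if math.dist(p, q) <= eps)
        if size >= min_pts:
            core.append(i)
    return core


print(core_point_indices([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)], eps=1.5, min_pts=3))
# prints [0, 1, 2]
```

A point that fails this test is not necessarily final noise: if it lies in some core point's neighborhood, expandCluster can still add it to that cluster as a border point.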

expandCluster(P, NeighborPts, C, eps, MinPts)
    add P to cluster C                          // grow the cluster: the core point joins first
    for each point P' in NeighborPts            // then, for each point in the core point's neighborhood:
        if P' is not visited                    // if it has not been visited yet,
            mark P' as visited                  // visit it
            NeighborPts' = regionQuery(P', eps)
            if sizeof(NeighborPts') >= MinPts   // if P' is also a core point, extend the cluster
                NeighborPts = NeighborPts joined with NeighborPts'
        if P' is not yet member of any cluster  // if P' belongs to no cluster yet (e.g. it was labeled noise),
            add P' to cluster C                 // add it to this cluster

regionQuery(P, eps)                             // compute the neighborhood
    return all points within P's eps-neighborhood (including P itself)
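The three procedures above can be translated almost line for line into Python. This is a minimal sketch under stated assumptions: points are coordinate tuples, distance is Euclidean (`math.dist`, Python 3.8+), regionQuery is a brute-force scan over all points, and cluster ids start at 0 with a hypothetical `NOISE = -1` label. The function names and label encoding are illustrative choices, not part of the original pseudocode.

```python
import math

NOISE = -1  # label for points that belong to no cluster


def region_query(points, p_idx, eps):
    """Indices of all points within eps of points[p_idx], p_idx itself included."""
    p = points[p_idx]
    return [i for i, q in enumerate(points) if math.dist(p, q) <= eps]


def expand_cluster(points, p_idx, neighbor_pts, c, eps, min_pts, visited, labels):
    labels[p_idx] = c                         # add P to cluster C
    queued = set(neighbor_pts)                # avoids duplicates when joining neighborhoods
    i = 0
    # neighbor_pts grows while we walk it, so iterate by index instead of a for loop.
    while i < len(neighbor_pts):
        q_idx = neighbor_pts[i]
        if not visited[q_idx]:
            visited[q_idx] = True
            q_neighbors = region_query(points, q_idx, eps)
            if len(q_neighbors) >= min_pts:   # P' is also a core point
                for n in q_neighbors:         # NeighborPts joined with NeighborPts'
                    if n not in queued:
                        queued.add(n)
                        neighbor_pts.append(n)
        if labels[q_idx] is None or labels[q_idx] == NOISE:
            labels[q_idx] = c                 # P' was not yet in any cluster
        i += 1


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id 0, 1, ... or NOISE."""
    labels = [None] * len(points)             # None = not labeled yet
    visited = [False] * len(points)
    c = -1
    for p_idx in range(len(points)):
        if visited[p_idx]:
            continue
        visited[p_idx] = True
        neighbor_pts = region_query(points, p_idx, eps)
        if len(neighbor_pts) < min_pts:
            labels[p_idx] = NOISE
        else:
            c += 1                            # C = next cluster
            expand_cluster(points, p_idx, neighbor_pts, c, eps, min_pts, visited, labels)
    return labels


print(dbscan([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)], 1.5, 3))
# prints [0, 0, 0, 0, 1, 1, 1, -1]
print(dbscan([(2.4, 0), (0, 0), (0.5, 0), (1.0, 0), (1.5, 0)], 1.0, 3))
# prints [0, 0, 0, 0, 0]
```

In the second call, the point (2.4, 0) is visited first and labeled noise, but it lies within eps of the core point (1.5, 0), so expand_cluster later adds it to cluster 0 as a border point. Because every regionQuery scans all n points, this sketch takes O(n^2) time overall.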
Compare it with the pseudocode from Baidu Baike: