Python (sklearn) clustering performance metrics

I. sklearn clustering evaluation functions:

metrics.adjusted_mutual_info_score(…[, …])
metrics.adjusted_rand_score(labels_true, …)
metrics.calinski_harabasz_score(X, labels)
metrics.davies_bouldin_score(X, labels)
metrics.completeness_score(labels_true, …)
metrics.cluster.contingency_matrix(…[, …])
metrics.fowlkes_mallows_score(labels_true, …)
metrics.homogeneity_completeness_v_measure(…)
metrics.homogeneity_score(labels_true, …)
metrics.mutual_info_score(labels_true, …)
metrics.normalized_mutual_info_score(…[, …])
metrics.silhouette_score(X, labels[, …])
metrics.silhouette_samples(X, labels[, metric])
metrics.v_measure_score(labels_true, labels_pred)
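
The functions in this list fall into two groups: those that take `labels_true` compare a clustering against known ground-truth labels, while those that take `X, labels` only need the data and the predicted labels. Here is a minimal sketch of the first group. The toy label lists (`labels_true`, `labels_perm`, `labels_pred`) are made up for illustration and do not come from the original post.

```python
from sklearn import metrics

# Hypothetical toy labels, for illustration only.
labels_true = [0, 0, 0, 1, 1, 1]
# Same partition as labels_true, but with the cluster ids swapped.
labels_perm = [1, 1, 1, 0, 0, 0]
# A partition that puts one sample in the "wrong" cluster.
labels_pred = [0, 0, 1, 1, 1, 1]

# These scores depend on the partition, not on the cluster id values.
ari_perm = metrics.adjusted_rand_score(labels_true, labels_perm)
nmi_perm = metrics.normalized_mutual_info_score(labels_true, labels_perm)

# An imperfect clustering scores lower.
ari_pred = metrics.adjusted_rand_score(labels_true, labels_pred)
h, c, v = metrics.homogeneity_completeness_v_measure(labels_true, labels_pred)

# Rows are true classes, columns are predicted clusters.
cm = metrics.cluster.contingency_matrix(labels_true, labels_pred)

print("ARI (permuted ids):", ari_perm)
print("NMI (permuted ids):", nmi_perm)
print("ARI (one mistake):", ari_pred)
print("homogeneity, completeness, v-measure:", h, c, v)
print(cm)
```

The example further down has no ground-truth comparison, so it only uses the second group (`silhouette_score`, `calinski_harabasz_score`, `davies_bouldin_score`).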

II. Notes on the evaluation functions:

1. Silhouette Coefficient

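  1. Function:
    def silhouette_score(X, labels, metric='euclidean', sample_size=None,
    random_state=None, **kwds):

  2. Meaning of the return value:
    For each sample i, s_i = (b_i - a_i) / max(a_i, b_i), where a_i is the mean distance from sample i to the other samples in its own cluster and b_i is the mean distance from sample i to the samples of the nearest other cluster. The mean of s_i over all samples is called the silhouette coefficient of the clustering result, denoted S. It measures whether the clustering is reasonable and effective. S lies in [-1, 1]; the larger the value, the closer samples of the same cluster are to each other and the farther apart samples of different clusters are, so the better the clustering.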
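
To make the definition concrete, here is a minimal sketch that computes the mean silhouette value by hand and compares it with `metrics.silhouette_score`. The helper `silhouette_manual` and the toy data are hypothetical and written only for this illustration. The helper assumes Euclidean distance and that every cluster has at least two samples.

```python
import numpy as np
from sklearn import metrics
from sklearn.metrics import pairwise_distances


def silhouette_manual(X, labels):
    """Mean silhouette value (illustrative; assumes each cluster has >= 2 samples)."""
    labels = np.asarray(labels)
    D = pairwise_distances(X)  # Euclidean distance by default
    cluster_ids = np.unique(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False  # exclude the sample itself
        a = D[i, same].mean()  # mean distance inside its own cluster
        # mean distance to the nearest other cluster
        b = min(D[i, labels == k].mean() for k in cluster_ids if k != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()


# Hypothetical toy data: two well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

s_manual = silhouette_manual(X, labels)
s_sklearn = metrics.silhouette_score(X, labels)
print("manual :", s_manual)
print("sklearn:", s_sklearn)
```

The two values should agree up to floating-point error. Because the two groups are far apart, the score is close to 1.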


2. CH score (Calinski-Harabasz Score)

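  1. Function:
    def calinski_harabasz_score(X, labels):
  2. Meaning of the return value:
    The smaller the covariance of the data within each cluster and the larger the covariance between clusters, the higher the Calinski-Harabasz score. In short: the larger the CH index, the better.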
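
As a rough illustration, the CH index can be written as the ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom: (B / (k - 1)) / (W / (n - k)). The sketch below follows that formula. The helper `calinski_harabasz_manual` and the toy data are hypothetical, and the result is checked against `metrics.calinski_harabasz_score`.

```python
import numpy as np
from sklearn import metrics


def calinski_harabasz_manual(X, labels):
    """CH index from between/within sums of squares (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n_samples = X.shape[0]
    cluster_ids = np.unique(labels)
    k = len(cluster_ids)
    overall_mean = X.mean(axis=0)

    between, within = 0.0, 0.0
    for cid in cluster_ids:
        members = X[labels == cid]
        centroid = members.mean(axis=0)
        # Between-cluster dispersion: centroid vs. overall mean, weighted by size.
        between += len(members) * np.sum((centroid - overall_mean) ** 2)
        # Within-cluster dispersion: samples vs. their own centroid.
        within += np.sum((members - centroid) ** 2)

    return (between / (k - 1)) / (within / (n_samples - k))


# Hypothetical toy data: two well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]])
labels = np.array([0, 0, 0, 1, 1, 1])

ch_manual = calinski_harabasz_manual(X, labels)
ch_sklearn = metrics.calinski_harabasz_score(X, labels)
print("manual :", ch_manual)
print("sklearn:", ch_sklearn)
```

Unlike the silhouette coefficient, the CH index has no fixed upper bound, so it is mainly useful for comparing clusterings of the same data.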

3. Davies-Bouldin Index (DBI): davies_bouldin_score

  1. Function:
    def davies_bouldin_score(X, labels):
  2. Meaning of the return value:
    Note: the minimum value of the DBI is 0; the smaller the value, the better the clustering.
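
For intuition, the DBI compares each cluster with its most similar neighbor. The similarity of two clusters is the sum of their average point-to-centroid distances divided by the distance between their centroids, and the index is the mean of each cluster's worst (largest) similarity. The sketch below follows that description. The helper `davies_bouldin_manual` and the toy data are hypothetical, and the result is checked against `metrics.davies_bouldin_score`.

```python
import numpy as np
from sklearn import metrics


def davies_bouldin_manual(X, labels):
    """DBI with Euclidean distances (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    cluster_ids = np.unique(labels)
    k = len(cluster_ids)

    centroids = np.array([X[labels == cid].mean(axis=0) for cid in cluster_ids])
    # Average distance of each cluster's samples to its centroid.
    scatter = np.array([
        np.linalg.norm(X[labels == cid] - centroid, axis=1).mean()
        for cid, centroid in zip(cluster_ids, centroids)
    ])

    worst = []
    for i in range(k):
        ratios = [
            (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
            for j in range(k) if j != i
        ]
        worst.append(max(ratios))  # most similar (worst-separated) neighbor
    return float(np.mean(worst))


# Hypothetical toy data: three well-separated groups of three points.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
              [5.0, 5.0], [5.0, 6.0], [6.0, 5.0],
              [10.0, 0.0], [10.0, 1.0], [11.0, 0.0]])
labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])

dbi_manual = davies_bouldin_manual(X, labels)
dbi_sklearn = metrics.davies_bouldin_score(X, labels)
print("manual :", dbi_manual)
print("sklearn:", dbi_sklearn)
```

Compact, well-separated clusters give small scatter and large centroid distances, which is why lower values indicate a better clustering.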

Complete example:

#!/usr/bin/env python
# encoding: utf-8
'''
@Author  : pentiumCM
@Email   : [email protected]
@Software: PyCharm
@File    : iris_hierarchical_cluster.py
@Time    : 2020/4/15 23:55
@desc	 : Hierarchical clustering of the iris dataset
'''

from sklearn import datasets
import numpy as np
import matplotlib.pyplot as plt
from sklearn import preprocessing
from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import linkage, dendrogram
from sklearn.decomposition import PCA
from mpl_toolkits.mplot3d import Axes3D

from sklearn import metrics

# Define constants
cluster_num = 3

# 1. Load the dataset
iris = datasets.load_iris()
iris_data = iris.data

# 2. Preprocess the data (standardize each feature)
data = np.array(iris_data)
std_scaler = preprocessing.StandardScaler()
data_M = std_scaler.fit_transform(data)

# 3. Plot the dendrogram
plt.figure()
Z = linkage(data_M, method='ward', metric='euclidean')
p = dendrogram(Z, 0)
plt.show()

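# 4. Train the model
# Ward linkage only supports Euclidean distance, which is the default, so the
# `affinity` argument (deprecated in favor of `metric` in newer scikit-learn
# releases) is omitted here.
ac = AgglomerativeClustering(n_clusters=cluster_num, linkage='ward')

# Cluster: fit the model and get one label per sample
label_list = ac.fit_predict(data_M)
# Print the labels, starting a new row every 50 samples.
# Note: the label at each row start (indices 0, 50, 100) is not printed,
# so each row shows 49 labels.
for i in range(len(label_list)):
    if i % 50 == 0:
        print()
    else:
        print(label_list[i], end=" ")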

print()

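# Elements of each flat cluster
reslist = [[] for i in range(cluster_num)]
# Iterate over the samples and collect them by cluster
for i in range(len(label_list)):
    label = label_list[i]
    # Append the sample to the list of its cluster
    reslist[label].append(data_M[i, :])

data_M = np.array(data_M.reshape((-1, 4)))
# Visualize the clustering result (project onto 3 principal components)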
pca = PCA(n_components=3)
pca.fit(data_M)
pca_data = pca.transform(data_M)

# Define the 3D axes
fig = plt.figure()
ax1 = plt.axes(projection='3d')

# Draw the scatter plot
zd = pca_data[:, 0]
xd = pca_data[:, 1]
yd = pca_data[:, 2]

colors = []
for label in label_list:
    if label == 0:
        colors.append('r')
    elif label == 1:
        colors.append('y')
    elif label == 2:
        colors.append('g')
    elif label == 3:
        colors.append('violet')

for i in range(len(label_list), data_M.shape[0]):
    colors.append('black')

# `cmap` has no effect when `c` is a list of color names, so it is dropped.
ax1.scatter3D(xd, yd, zd, c=colors)
plt.show()

# Evaluate the clustering performance
# metrics.silhouette_score(X, labels[, …])
cluster_score_si = metrics.silhouette_score(data_M, label_list)

print("cluster_score_si", cluster_score_si)

cluster_score_ch = metrics.calinski_harabasz_score(data_M, label_list)
print("cluster_score_ch:", cluster_score_ch)

# The minimum DBI value is 0; the smaller the value, the better the clustering.
cluster_score_DBI = metrics.davies_bouldin_score(data_M, label_list)
print("cluster_score_DBI :", cluster_score_DBI)

Output:

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 
0 0 2 0 2 0 2 0 2 2 0 2 0 2 0 2 2 2 2 0 0 0 0 0 0 0 0 0 2 2 2 2 0 2 0 0 2 2 2 2 0 2 2 2 2 2 0 2 2 
0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 
cluster_score_si 0.4466890410285909
cluster_score_ch: 222.71916382215363
cluster_score_DBI : 0.8034665302876753
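
The same three scores can also be used to compare different numbers of clusters. The sketch below repeats the Ward clustering on the standardized iris data for several values of `k` and records each score. The range 2 to 6 and the `scores` dictionary are arbitrary choices for this illustration and are not part of the original script.

```python
from sklearn import datasets, metrics, preprocessing
from sklearn.cluster import AgglomerativeClustering

# Standardize the iris features, as in the example above.
X = preprocessing.StandardScaler().fit_transform(datasets.load_iris().data)

scores = {}
for k in range(2, 7):  # candidate cluster counts (arbitrary range)
    labels = AgglomerativeClustering(n_clusters=k, linkage='ward').fit_predict(X)
    scores[k] = (
        metrics.silhouette_score(X, labels),         # larger is better
        metrics.calinski_harabasz_score(X, labels),  # larger is better
        metrics.davies_bouldin_score(X, labels),     # smaller is better
    )

for k, (si, ch, dbi) in scores.items():
    print(f"k={k}: silhouette={si:.4f}, CH={ch:.2f}, DBI={dbi:.4f}")
```

The three metrics do not always prefer the same `k`, so it is usually worth looking at them together rather than relying on one.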

References

https://blog.csdn.net/qq_27825451/article/details/94436488
