Why Machine Learning (6): Principles of LDA (Linear Discriminant Analysis) for Dimensionality Reduction

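The procedure for LDA dimensionality reduction is therefore:
(1) Compute the mean vector of each class and the overall mean vector
(2) Compute the between-class scatter matrix $S_B$ and the within-class scatter matrix $S_w$
(3) Compute the matrix product $S = S_w^{-1}S_B$
(4) Perform an eigendecomposition of $S$ to obtain its eigenvalues and eigenvectors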
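(5) To reduce to $k$ dimensions, sort the eigenvalues in descending order and use the eigenvectors belonging to the $k$ largest eigenvalues as the rows of the projection matrix $W$; each sample $x$ (a row vector) is then projected as $x_{new} = xW^T$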
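For reference, the quantities in steps (1) to (4) can be written out as follows. These are the usual textbook definitions with samples treated as row vectors; the symbols $C_i$ (the set of samples in class $i$), $N_i$, $N$, $\mu_i$, $\mu$, $v$, and $\lambda$ are notation assumed here rather than taken from the text above.

```latex
\mu_i = \frac{1}{N_i}\sum_{x \in C_i} x, \qquad
\mu = \frac{1}{N}\sum_{i}\sum_{x \in C_i} x

S_w = \sum_{i}\sum_{x \in C_i} (x-\mu_i)^T (x-\mu_i), \qquad
S_B = \sum_{i} N_i\,(\mu_i-\mu)^T (\mu_i-\mu)

S\,v = S_w^{-1} S_B\, v = \lambda\, v
```

Each eigenvector $v$ is a column vector; the directions with the largest $\lambda$ are the ones along which the classes are best separated relative to their internal spread.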

Below is the code that applies LDA dimensionality reduction to the Iris dataset:

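import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load the data. Columns 1-4 are assumed to hold the four numeric features
# (with column 0 an ID column) and 'Species' the class label.
data = pd.read_csv('iris.csv')
# print(data)

# Compute the mean vector of each class and the overall mean vector
classes = data['Species'].value_counts()
meanVal = np.empty([len(classes), 4])
for i in range(len(classes)):
    x = data[data['Species'] == classes.index[i]].iloc[:, 1:5].to_numpy()
    meanVal[i] = np.mean(x, axis=0)
# print(meanVal)

# Overall mean over all samples (this equals the mean of the class means
# only when every class has the same number of samples)
meanValAll = data.iloc[:, 1:5].to_numpy().mean(axis=0)
# print(meanValAll)

# Compute the within-class and between-class scatter matrices
S_w = np.zeros([4, 4])
S_b = np.zeros([4, 4])
for i in range(len(classes)):
    x = data[data['Species'] == classes.index[i]].iloc[:, 1:5].to_numpy()
    d = x - meanVal[i]
    S_w += d.T @ d
    n = len(x)
    m = (meanVal[i] - meanValAll).reshape(-1, 1)
    S_b += n * (m @ m.T)
print(S_w)
print(S_b)

# Compute S_w^-1 * S_B (a matrix product, not an element-wise one)
S = np.linalg.inv(S_w) @ S_b
# Eigenvalues and eigenvectors (the eigenvectors are the columns of eigVects)
eigVals, eigVects = np.linalg.eig(S)

print(eigVals, "\n", eigVects)

# 4 -> 2 projection matrix: sort the eigenvalues in descending order and
# use the two leading eigenvectors as the rows of W
order = np.argsort(eigVals.real)[::-1]
W = eigVects[:, order[:2]].real.T

# Plot the projected samples, one colour per class
fig = plt.figure()
ax1 = fig.add_subplot()
ax1.set_xlabel('LDA1')
ax1.set_ylabel('LDA2')
colors = ['r', 'g', 'b']
for i in range(len(classes)):
    x = data[data['Species'] == classes.index[i]].iloc[:, 1:5].to_numpy()
    x_new = x @ W.T
    ax1.scatter(x_new[:, 0], x_new[:, 1], c=colors[i], label=classes.index[i])
ax1.legend()

plt.show()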

Result: [Figure: scatter plot of the three Iris species on the LDA1 and LDA2 axes]
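The same steps can also be wrapped in a reusable function. What follows is a minimal sketch rather than the author's code: the name `lda_project`, the use of `np.linalg.solve` in place of an explicit inverse, and the synthetic three-class data are all assumptions made for illustration.

```python
import numpy as np


def lda_project(X, y, k):
    """Project the rows of X onto the k leading LDA directions.

    X: (n_samples, n_features) array; y: (n_samples,) class labels.
    Returns (X_new, W), where W holds the chosen eigenvectors as rows.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n_features = X.shape[1]
    mean_all = X.mean(axis=0)

    S_w = np.zeros((n_features, n_features))
    S_b = np.zeros((n_features, n_features))
    for label in np.unique(y):
        Xc = X[y == label]
        mean_c = Xc.mean(axis=0)
        d = Xc - mean_c
        S_w += d.T @ d                       # within-class scatter
        m = (mean_c - mean_all).reshape(-1, 1)
        S_b += len(Xc) * (m @ m.T)           # between-class scatter

    # Eigendecomposition of S_w^{-1} S_B; solve() avoids forming the inverse
    eig_vals, eig_vecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eig_vals.real)[::-1][:k]
    W = eig_vecs[:, order].real.T
    return X @ W.T, W


# Synthetic data (an assumption for illustration): three Gaussian blobs in 4-D
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 0]], dtype=float)
X = np.vstack([rng.normal(c, 1.0, size=(40, 4)) for c in centers])
y = np.repeat([0, 1, 2], 40)

X_new, W = lda_project(X, y, k=2)
print(X_new.shape)  # (120, 2)
```

Passing the four Iris feature columns as `X` and the `Species` column as `y` should give the same kind of 2-D projection as the script, up to the sign and scale of each axis.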
