Python Clustering Algorithms Applied to Vector Quantization: A Detailed Case Study

CSDN blog: 程志偉的博客

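The KMeans algorithm partitions the feature matrix X of N samples into K disjoint clusters. Intuitively, a cluster is a group of data points gathered together, and the points in one cluster are regarded as belonging to the same class. The clusters are the result of the clustering.

The mean of all the data in a cluster is usually called the cluster's "centroid". In a two-dimensional plane, the x-coordinate of a cluster's centroid is the mean of the x-coordinates of its points, and the y-coordinate is the mean of their y-coordinates. The same idea extends to higher-dimensional spaces.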

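Procedure
1 Randomly draw K samples as the initial centroids
2 Start the loop:
2.1 Assign each sample to its nearest centroid, producing K clusters
2.2 For each cluster, compute the mean of all samples assigned to it and use it as the new centroid
3 When the centroids no longer move, the iteration stops and the clustering is complete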
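
The steps above can be sketched in plain Python. This is only an illustration of the loop, not scikit-learn's implementation; the function name `kmeans`, its parameters, and the toy data are assumptions made for this example.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Illustrative Lloyd iteration on a list of equal-length tuples."""
    rng = random.Random(seed)
    # Step 1: draw K distinct samples as the initial centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 2.1: assign every sample to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Step 2.2: recompute each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if not members:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
                continue
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        # Step 3: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Two well-separated toy groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
print(centroids)
```

The cluster numbers themselves are arbitrary: which group ends up labelled 0 depends on the random initial draw.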

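Within-cluster sum of squared errors (inertia):
For a given cluster, the smaller the sum of squared distances from its samples to the centroid, the more similar the samples in the cluster are and the smaller the within-cluster variation.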
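
As a small worked example, the quantity can be computed by hand. The sketch below assumes squared Euclidean distance; the helper name `inertia` and the three toy points are made up for illustration.

```python
def inertia(points, labels, centroids):
    """Sum of squared Euclidean distances from each sample to its assigned centroid."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


points = [(0.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
labels = [0, 0, 1]
centroids = [(0.0, 1.0), (5.0, 5.0)]

# The two points of cluster 0 each lie at distance 1 from their centroid,
# and the single point of cluster 1 sits exactly on its centroid: 1 + 1 + 0.
total = inertia(points, labels, centroids)
print(total)
```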

from sklearn.datasets import make_blobs
import matplotlib.pyplot as plt

# Generate a synthetic dataset
X, y = make_blobs(n_samples=500,n_features=2,centers=4,random_state=1)

fig, ax1 = plt.subplots(1)
ax1.scatter(X[:, 0], X[:, 1]
            ,marker='o' # marker shape
            ,s=8 # marker size
            )
plt.show()

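The points above are all drawn in one color, which shows their overall distribution. Next, color them by their true label:
color = ["red","pink","orange","gray"]
fig, ax1 = plt.subplots(1)
for i in range(4):
    ax1.scatter(X[y==i, 0], X[y==i, 1]
                ,marker='o' # marker shape
                ,s=8 # marker size
                ,c=color[i]
                )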

plt.show()

Based on this distribution, we cluster with KMeans, first trying 3 clusters
from sklearn.cluster import KMeans
n_clusters = 3
cluster = KMeans(n_clusters=n_clusters, random_state=0).fit(X)

# Inspect the cluster assignments

y_pred = cluster.labels_
y_pred
Out[4]:
array([0, 0, 2, 1, 2, 1, 2, 2, 2, 2, 0, 0, 2, 1, 2, 0, 2, 0, 1, 2, 2, 2,
2, 1, 2, 2, 1, 1, 2, 2, 0, 1, 2, 0, 2, 0, 2, 2, 0, 2, 2, 2, 1, 2,
......
1, 2, 2, 2, 1, 1, 1, 2, 2, 2, 0, 2, 1, 2, 0, 1, 0, 1, 0, 2, 1, 1,
0, 2, 2, 0, 2, 2, 2, 0, 2, 1, 2, 2, 0, 0, 0, 2])

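# KMeans also provides predict and fit_predict; fit_predict learns from X and predicts the cluster of each sample in X, giving exactly the same result as labels_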

pre = cluster.fit_predict(X)
pre == y_pred
Out[5]:
array([ True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
......
True, True, True, True, True, True, True, True, True,
True, True, True, True, True, True, True, True, True,
True, True, True, True, True])

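# When the dataset is very large, there is no need to use all of it to find the centroids: fit on a subset, then call predict on the full data to improve efficiency

# Because the dataset here is small, the subset result is not very accurate, but accuracy improves as the amount of data grows. Note also that an element-wise label comparison can report False merely because the two fits number their clusters differently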

cluster_smallsub = KMeans(n_clusters=n_clusters, random_state=0).fit(X[:200])
y_pred_ = cluster_smallsub.predict(X)
y_pred == y_pred_
Out[6]:
array([False, False, True, False, True, False, True, True, True,
True, False, False, True, False, True, False, True, False,
False, True, True, True, True, False, True, True, False,
......
True, False, True, True, True, False, True, False, True,
True, False, False, False, True])
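
An element-wise `==` between two label arrays is sensitive to how each fit happens to number its clusters, so some of the `False` entries above may reflect renumbering rather than a different grouping. A minimal sketch of a comparison that ignores cluster numbering is shown below; the helper `same_partition` and the toy label lists are assumptions for illustration, not part of scikit-learn.

```python
def same_partition(labels_a, labels_b):
    """True if two label lists describe the same grouping up to renaming the clusters."""
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # setdefault records the first pairing seen; later pairs must agree with it.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


first = [0, 0, 2, 1, 2]
second = [1, 1, 0, 2, 0]  # same groups, different cluster numbers

elementwise = [x == y for x, y in zip(first, second)]
print(elementwise)                    # every entry is False
print(same_partition(first, second))  # True
```

For a graded measure of agreement rather than a yes/no answer, a permutation-invariant score such as the adjusted Rand index is the usual choice.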

# Centroid locations

centroid = cluster.cluster_centers_
centroid
Out[7]:
array([[-7.09306648, -8.10994454],
[-1.54234022, 4.43517599],
[-8.0862351 , -3.5179868 ]])

centroid.shape
Out[8]: (3, 2)

# Important attribute inertia_: the total sum of squared distances

inertia = cluster.inertia_
inertia
Out[9]: 1903.4503741659223

# Plot the clustering result

color = ["red","pink","orange","gray"]
fig, ax1 = plt.subplots(1)
for i in range(n_clusters):
    ax1.scatter(X[y_pred==i, 0], X[y_pred==i, 1]
                ,marker='o'
                ,s=8
                ,c=color[i]
                )

# Mark the centroids with black crosses
ax1.scatter(centroid[:,0],centroid[:,1]
            ,marker="x"
            ,s=15
            ,c="black")
plt.show()

# Change the number of clusters to 4 and check the total sum of squared distances

n_clusters = 4
cluster_ = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
inertia_ = cluster_.inertia_
inertia_
Out[11]: 908.3855684760613

# Change the number of clusters to 5 and check the total sum of squared distances

n_clusters = 5
cluster_ = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
inertia_ = cluster_.inertia_
inertia_
Out[12]: 811.0841324482415

# Change the number of clusters to 6 and check the total sum of squared distances

n_clusters = 6
cluster_ = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
inertia_ = cluster_.inertia_
inertia_
Out[13]: 733.153835008308

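As the number of clusters grows, the total sum of squared distances shrinks. In the limit of 500 clusters (one per sample) it reaches 0, so inertia alone cannot be used to pick the best number of clusters.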

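# Evaluation metrics for clustering models
# When the true labels are unknown: the silhouette coefficient

In sklearn, the function silhouette_score in the metrics module computes the silhouette coefficient; it returns the mean silhouette coefficient over all samples in a dataset. The same module also provides silhouette_samples, which takes the same arguments but returns the silhouette coefficient of each individual sample.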
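
To make the two return values concrete, here is a plain-Python sketch of the per-sample silhouette, s = (b - a) / max(a, b), where a is the mean distance to the other members of the sample's own cluster and b is the smallest mean distance to any other cluster. It assumes Euclidean distance and at least two clusters; the function name and the toy data are made up for this example, and a single-member cluster is given a silhouette of 0 by convention.

```python
import math


def silhouette_values(points, labels):
    """Per-sample silhouette with Euclidean distance; needs at least two clusters."""
    clusters = set(labels)
    values = []
    for i, p in enumerate(points):
        # Mean distance from p to the members of each cluster, leaving p itself out.
        mean_dist = {}
        for c in clusters:
            dists = [math.dist(p, q)
                     for j, q in enumerate(points)
                     if labels[j] == c and j != i]
            if dists:
                mean_dist[c] = sum(dists) / len(dists)
        own = labels[i]
        if own not in mean_dist:
            # p is alone in its cluster: silhouette is 0 by convention.
            values.append(0.0)
            continue
        a = mean_dist[own]                                    # cohesion
        b = min(d for c, d in mean_dist.items() if c != own)  # separation
        values.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return values


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
per_sample = silhouette_values(points, labels)  # analogue of silhouette_samples
average = sum(per_sample) / len(per_sample)     # analogue of silhouette_score
print(per_sample)
print(average)
```

For the first point, a = 1 and b = (10 + 11) / 2 = 10.5, so its silhouette is 9.5 / 10.5.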

from sklearn.metrics import silhouette_score
from sklearn.metrics import silhouette_samples

X.shape
Out[15]: (500, 2)

# Silhouette coefficient with 3 clusters (y_pred comes from the 3-cluster model fitted earlier)

silhouette_score(X,y_pred)
Out[16]: 0.5882004012129721

# Silhouette coefficient with 4 clusters

n_clusters = 4
cluster_ = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
inertia_ = cluster_.inertia_
inertia_
silhouette_score(X,cluster_.labels_)
Out[17]: 0.6505186632729437

# Silhouette coefficient with 5 clusters

n_clusters = 5
cluster_ = KMeans(n_clusters=n_clusters, random_state=0).fit(X)
inertia_ = cluster_.inertia_
inertia_
silhouette_score(X,cluster_.labels_)
Out[18]: 0.5746932321727457

# Number of per-sample silhouette coefficients, and their mean

silhouette_samples(X,y_pred).shape
Out[19]: (500,)

silhouette_samples(X,y_pred).mean()
Out[20]: 0.5882004012129721

# When the true labels are unknown: the Calinski-Harabasz index
# (the older spelling calinski_harabaz_score was deprecated and scheduled for removal in scikit-learn 0.23)

from sklearn.metrics import calinski_harabasz_score
calinski_harabasz_score(X, y_pred)
Out[21]: 1809.991966958033

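# The higher the Calinski-Harabasz index, the better. Compared with the silhouette coefficient it has one big advantage: it is very fast to compute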
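
The speed comes from the definition: the index is the ratio of between-cluster dispersion to within-cluster dispersion, each scaled by its degrees of freedom, and needs only cluster means rather than all pairwise distances. A plain-Python sketch under that definition follows; the function name and toy data are assumptions for illustration.

```python
def calinski_harabasz(points, labels):
    """(between / (k - 1)) / (within / (n - k)) using squared Euclidean distances."""
    n, dim = len(points), len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    clusters = set(labels)
    between = 0.0
    within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        center = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Between-cluster part: cluster size times squared distance to the overall mean.
        between += len(members) * sum((m - o) ** 2 for m, o in zip(center, overall))
        # Within-cluster part: squared distances of members to their own center.
        within += sum(sum((x - m) ** 2 for x, m in zip(p, center)) for p in members)
    k = len(clusters)
    return (between / (k - 1)) / (within / (n - k))


points = [(0.0,), (2.0,), (10.0,), (12.0,)]
labels = [0, 0, 1, 1]
# Cluster means are 1 and 11, the overall mean is 6:
# between = 2*25 + 2*25 = 100, within = 1 + 1 + 1 + 1 = 4, so (100/1) / (4/2).
score = calinski_harabasz(points, labels)
print(score)
```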

from time import time
t0 = time()
calinski_harabasz_score(X, y_pred)
time() - t0
Out[23]: 0.0009794235229492188

t0 = time()
silhouette_score(X,y_pred)
time() - t0
Out[24]: 0.009987354278564453

import datetime
datetime.datetime.fromtimestamp(t0).strftime("%Y-%m-%d %H:%M:%S")
Out[25]: '2020-03-30 22:01:27'

Case study: choosing n_clusters based on the silhouette coefficient

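import numpy as np
import matplotlib.cm as cm

n_clusters = 4

# Create a figure with 2 subplots
fig, (ax1, ax2) = plt.subplots(1, 2)
fig.set_size_inches(18, 7)

# The first subplot is the silhouette plot: bars of per-sample silhouette coefficients grouped by cluster; the x-axis is the silhouette coefficient and the y-axis is the samples
# Silhouette coefficients lie in [-1, 1], but a good clustering has values above 0, so the x-axis starts just below 0
# Set the x-axis limits
ax1.set_xlim([-0.1, 1])

# The y-axis starts at 0; its upper limit is X.shape[0] plus room for the gaps
# The samples of each cluster are drawn together, with a gap between clusters
ax1.set_ylim([0, X.shape[0] + (n_clusters + 1) * 10])

# Fit the model
clusterer = KMeans(n_clusters=n_clusters, random_state=10).fit(X)
cluster_labels = clusterer.labels_

# silhouette_score returns the mean over all samples
# Its arguments are the matrix X and the labels produced by the clustering
silhouette_avg = silhouette_score(X, cluster_labels)
print("For n_clusters =", n_clusters,
      "The average silhouette_score is :", silhouette_avg)

# silhouette_samples returns the silhouette coefficient of each sample
sample_silhouette_values = silhouette_samples(X, cluster_labels)

# Starting position on the y-axis
y_lower = 10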

# Loop over the clusters
for i in range(n_clusters):
    # Extract the silhouette coefficients of the i-th cluster and sort them
    ith_cluster_silhouette_values = sample_silhouette_values[cluster_labels == i]
    # Note: .sort() sorts in place
    ith_cluster_silhouette_values.sort()
    # Number of samples in this cluster
    size_cluster_i = ith_cluster_silhouette_values.shape[0]
    # On the y-axis a cluster spans from the start value (y_lower) to the start value plus its number of samples (y_upper)
    y_upper = y_lower + size_cluster_i

    # Divide i (as a float) by n_clusters to get a different fraction for each i, so every cluster gets its own color
    color = cm.nipy_spectral(float(i)/n_clusters)
    # fill_betweenx fills a horizontal band with one color
    # Its range runs along the y-axis; arguments: (y values of the band, x values, fill color)
    ax1.fill_betweenx(np.arange(y_lower, y_upper)
                      ,ith_cluster_silhouette_values
                      ,facecolor=color
                      ,alpha=0.7)
    # Label each cluster's band with its number, placed at the vertical middle of the band
    # text arguments: (x position, y position, text to display)
    ax1.text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))
    # Compute the y start for the next cluster; adding 10 after each iteration leaves a gap between clusters
    y_lower = y_upper + 10

ax1.set_title("The silhouette plot for the various clusters.")
ax1.set_xlabel("The silhouette coefficient values")
ax1.set_ylabel("Cluster label")

# Draw the mean silhouette coefficient of the whole dataset as a dashed line
ax1.axvline(x=silhouette_avg, color="red", linestyle="--")

# Hide the y-axis ticks
ax1.set_yticks([])

# Show the x-axis ticks at the listed values
ax1.set_xticks([-0.1, 0, 0.2, 0.4, 0.6, 0.8, 1])

# Second subplot: get the colors first; with no loop here, generate all the fractions at once to obtain all the colors
# cluster_labels.astype(float) yields floats, which nipy_spectral maps to colors; the 500 values produce only 4 distinct colors
colors = cm.nipy_spectral(cluster_labels.astype(float) / n_clusters)
ax2.scatter(X[:, 0], X[:, 1],marker='o',s=8,c=colors)

# Plot the fitted centroids
centers = clusterer.cluster_centers_

ax2.scatter(centers[:, 0], centers[:, 1], marker='x',c="red", alpha=1, s=200)
ax2.set_title("The visualization of the clustered data.")
ax2.set_xlabel("Feature space for the 1st feature")
ax2.set_ylabel("Feature space for the 2nd feature")

# Set a title for the whole figure
plt.suptitle(("Silhouette analysis for KMeans clustering on sample data "
              "with n_clusters = %d" % n_clusters),
             fontsize=14, fontweight='bold')

plt.show()

# Wrap the procedure above in a loop (needs numpy as np and matplotlib.cm as cm)
for n_clusters in [2,3,4,5,6,7]:
    fig, (ax1, ax2) = plt.subplots(1, 2)
    fig.set_size_inches(18, 7)
    ax1.set_xlim([-0.1, 1])
    ax1.set_ylim([0, X.shape[0] + (n_clusters + 1) * 10])
    clusterer = KMeans(n_clusters=n_clusters, random_state=10).fit(X)
    cluster_labels = clusterer.labels_
    silhouette_avg = silhouette_score(X, cluster_labels)
    print("For n_clusters =", n_clusters,"The average silhouette_score is :", silhouette_avg)
    sample_silhouette_values = silhouette_samples(X, cluster_labels)
    y_lower = 10
    for i in range(n_clusters):
        ith_cluster_silhouette_values = sample_silhouette_values[cluster_labels == i]
        ith_cluster_silhouette_values.sort()
        size_cluster_i = ith_cluster_silhouette_values.shape[0]
        y_upper = y_lower + size_cluster_i
        color = cm.nipy_spectral(float(i)/n_clusters)
        ax1.fill_betweenx(np.arange(y_lower, y_upper)
                          ,ith_cluster_silhouette_values
                          ,facecolor=color
                          ,alpha=0.7)
        ax1.text(-0.05, y_lower + 0.5 * size_cluster_i, str(i))
        y_lower = y_upper + 10
    ax1.set_title("The silhouette plot for the various clusters.")
    ax1.set_xlabel("The silhouette coefficient values")
    ax1.set_ylabel("Cluster label")
    ax1.axvline(x=silhouette_avg, color="red", linestyle="--")
    ax1.set_yticks([])
    ax1.set_xticks([-0.1, 0, 0.2, 0.4, 0.6, 0.8, 1])

    colors = cm.nipy_spectral(cluster_labels.astype(float) / n_clusters)
    ax2.scatter(X[:, 0], X[:, 1],marker='o',s=8,c=colors)
    centers = clusterer.cluster_centers_
    # Mark the cluster centers with red crosses
    ax2.scatter(centers[:, 0], centers[:, 1], marker='x',c="red", alpha=1, s=200)
    ax2.set_title("The visualization of the clustered data.")
    ax2.set_xlabel("Feature space for the 1st feature")
    ax2.set_ylabel("Feature space for the 2nd feature")
    plt.suptitle(("Silhouette analysis for KMeans clustering "
                  "with n_clusters = %d" % n_clusters),fontsize=14, fontweight='bold')
    plt.show()

For n_clusters = 2 The average silhouette_score is : 0.7049787496083262

For n_clusters = 3 The average silhouette_score is : 0.5882004012129721

For n_clusters = 4 The average silhouette_score is : 0.6505186632729437

For n_clusters = 5 The average silhouette_score is : 0.56376469026194

For n_clusters = 6 The average silhouette_score is : 0.4504666294372765

For n_clusters = 7 The average silhouette_score is : 0.39092211029930857

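With 2 clusters, the silhouette values of one cluster are clearly above the average; for targeted marketing, one could focus only on the characteristics of that group.

With 4 clusters, the data are also separated into well-formed groups.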

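Important parameters init & random_state & n_init

init defaults to "k-means++", a smart way of choosing the initial cluster centers for K-means that speeds up convergence. If an n-dimensional array is passed instead, it should have shape (n_clusters, n_features) and gives the initial centroids.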
plus = KMeans(n_clusters = 10).fit(X)
plus.n_iter_
Out[28]: 17

random = KMeans(n_clusters = 10,init="random",random_state=420).fit(X)
random.n_iter_
Out[29]: 19
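
The idea behind the "k-means++" seeding mentioned above can be sketched in plain Python: the first center is drawn uniformly, and each further center is drawn with probability proportional to its squared distance from the nearest center chosen so far. This is a simplified illustration under those assumptions; scikit-learn's own implementation differs in details, and the function name `kmeans_pp_init` and the toy data are invented for this example.

```python
import random


def kmeans_pp_init(points, k, seed=0):
    """Simplified k-means++ seeding; assumes at least k distinct points."""
    rng = random.Random(seed)
    # The first center is a uniformly random sample.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each sample to its nearest already-chosen center.
        d2 = [min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
              for p in points]
        # Draw the next center with probability proportional to that squared distance,
        # so far-away samples are favored and already-chosen points have weight 0.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers = kmeans_pp_init(points, k=2)
print(centers)
```

Spreading the initial centers apart in this way is what tends to reduce the number of iterations compared with purely random initialization.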

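Important parameters max_iter & tol: stopping the iteration

max_iter: integer, default 300; the maximum number of iterations of the k-means algorithm in a single run
tol: float, default 1e-4; the tolerance on the decrease in inertia between two iterations. If the decrease between two consecutive iterations falls below tol, the iteration stops (newer scikit-learn releases state this tolerance in terms of the change in cluster centers)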

random = KMeans(n_clusters = 10,init="random",max_iter=10,random_state=420).fit(X)
y_pred_max10 = random.labels_
silhouette_score(X,y_pred_max10)
Out[30]: 0.3952586444034157

random = KMeans(n_clusters = 10,init="random",max_iter=20,random_state=420).fit(X)
y_pred_max20 = random.labels_
silhouette_score(X,y_pred_max20)
Out[31]: 0.3401504537571701

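# Clustering for dimensionality reduction: vector quantization with KMeans

Dimensionality reduction by vector quantization compresses the amount of information while keeping the same set of samples: it changes neither the number of features nor the number of samples, only the amount of information the samples carry in those features.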
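
A tiny plain-Python sketch illustrates this: every pixel is replaced by the nearest color in a small codebook, so the number of pixels and channels stays the same while the number of distinct colors drops. The helper `quantize` and the two-color codebook are hypothetical values chosen for illustration; in the case study the codebook comes from the K-Means centroids.

```python
def quantize(pixels, codebook):
    """Replace every pixel with its nearest codebook color (squared Euclidean distance)."""
    def nearest(p):
        return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(p, c)))
    return [nearest(p) for p in pixels]


pixels = [(250, 251, 255), (252, 253, 255), (6, 12, 0), (16, 24, 9), (174, 201, 231)]
codebook = [(240, 240, 245), (10, 15, 5)]  # hypothetical 2-color codebook

quantized = quantize(pixels, codebook)
print(len(pixels), len(set(pixels)))        # same number of samples, 5 distinct colors
print(len(quantized), len(set(quantized)))  # same number of samples, 2 distinct colors
```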

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Function that matches each point in one sequence to its nearest point in another
from sklearn.metrics import pairwise_distances_argmin

# Function used to load the sample image data
from sklearn.datasets import load_sample_image
from sklearn.utils import shuffle

china = load_sample_image("china.jpg")
china
Out[33]:
array([[[174, 201, 231],
[174, 201, 231],
[174, 201, 231],
...,
[250, 251, 255],
[250, 251, 255],
[250, 251, 255]],

[[172, 199, 229],
[173, 200, 230],
[173, 200, 230],
...,
[251, 252, 255],
[251, 252, 255],
[251, 252, 255]],

[[174, 201, 231],
[174, 201, 231],
[174, 201, 231],
...,
[252, 253, 255],
[252, 253, 255],
[252, 253, 255]],

...,

[[ 88, 80, 7],
[147, 138, 69],
[122, 116, 38],
...,
[ 39, 42, 33],
[ 8, 14, 2],
[ 6, 12, 0]],

[[122, 112, 41],
[129, 120, 53],
[118, 112, 36],
...,
[ 9, 12, 3],
[ 9, 15, 3],
[ 16, 24, 9]],

[[116, 103, 35],
[104, 93, 31],
[108, 102, 28],
...,
[ 43, 49, 39],
[ 13, 21, 6],
[ 15, 24, 7]]], dtype=uint8)

china.dtype
Out[34]: dtype('uint8')

china.shape
Out[35]: (427, 640, 3)

china[0][0]
Out[36]: array([174, 201, 231], dtype=uint8)

newimage = china.reshape((427 * 640,3))
newimage.shape
Out[37]: (273280, 3)

import pandas as pd
pd.DataFrame(newimage).drop_duplicates().shape
Out[38]: (96615, 3)

plt.figure(figsize=(15,15))
plt.imshow(china)
Out[39]: <matplotlib.image.AxesImage at 0x1d286ad0f28>

flower = load_sample_image("flower.jpg")
plt.figure(figsize=(15,15))
plt.imshow(flower)
Out[40]: <matplotlib.image.AxesImage at 0x1d286b43278>

3. Choose the hyperparameters and preprocess the data
china = np.array(china, dtype=np.float64) / china.max()
w, h, d = original_shape = tuple(china.shape)
w, h, d
Out[42]: (427, 640, 3)

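# Require d == 3; the assertion raises an error otherwise

assert d == 3

d_ = 5
assert d_ == 3, "The number of features per pixel is not 3"
Traceback (most recent call last):

File "<ipython-input-44-6bfdf4addbf4>", line 2, in <module>
assert d_ == 3, "The number of features per pixel is not 3"

AssertionError: The number of features per pixel is not 3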

# reshape changes the structure of the data; any shape is allowed as long as the total number of elements stays the same

a = np.random.random((2,4))
a
Out[45]:
array([[0.17545705, 0.61403347, 0.65398707, 0.32697789],
[0.84574926, 0.42294712, 0.33591104, 0.49572982]])

a.reshape((4,2))
Out[46]:
array([[0.17545705, 0.61403347],
[0.65398707, 0.32697789],
[0.84574926, 0.42294712],
[0.33591104, 0.49572982]])

np.reshape(a,(4,2))
Out[48]:
array([[0.17545705, 0.61403347],
[0.65398707, 0.32697789],
[0.84574926, 0.42294712],
[0.33591104, 0.49572982]])

np.reshape(a,(2,2,2))
Out[49]:
array([[[0.17545705, 0.61403347],
[0.65398707, 0.32697789]],

[[0.84574926, 0.42294712],
[0.33591104, 0.49572982]]])

np.reshape(a,(3,2))
Traceback (most recent call last):

File "<ipython-input-50-a620d2495b00>", line 1, in <module>
np.reshape(a,(3,2))

File "H:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py", line 292, in reshape
return _wrapfunc(a, 'reshape', newshape, order=order)

File "H:\Anaconda3\lib\site-packages\numpy\core\fromnumeric.py", line 56, in _wrapfunc
return getattr(obj, method)(*args, **kwds)

ValueError: cannot reshape array of size 8 into shape (3,2)

image_array = np.reshape(china, (w * h, d))
image_array
image_array.shape
Out[51]: (273280, 3)

n_clusters = 64
china = np.array(china, dtype=np.float64) / china.max()
w, h, d = original_shape = tuple(china.shape)
assert d == 3
image_array = np.reshape(china, (w * h, d))

4. Vector quantization of the data with K-Means
# First, use 1000 samples to find the centroids

image_array_sample = shuffle(image_array, random_state=0)[:1000]
kmeans = KMeans(n_clusters=n_clusters, random_state=0).fit(image_array_sample)
kmeans.cluster_centers_
Out[53]:
array([[0.62570806, 0.60261438, 0.53028322],
[0.15546218, 0.1557423 , 0.12829132],
......
[0.73524384, 0.82021116, 0.91925591],
[0.20627451, 0.07816993, 0.07660131]])

kmeans.cluster_centers_.shape
Out[54]: (64, 3)

# With the centroids found, assign every pixel to a cluster

labels = kmeans.predict(image_array)
labels.shape
Out[55]: (273280,)

# Replace every sample with its centroid

image_kmeans = image_array.copy()

for i in range(w*h):
    image_kmeans[i] = kmeans.cluster_centers_[labels[i]]
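
The per-pixel loop above can usually be replaced by indexing the centroid array with the whole label array at once, which NumPy evaluates without a Python-level loop. The sketch below uses small stand-in arrays (`centers` and `labels` are made-up values, not the fitted model) to show that both forms give the same result.

```python
import numpy as np

centers = np.array([[0.1, 0.2, 0.3],
                    [0.7, 0.8, 0.9]])  # stand-in for kmeans.cluster_centers_
labels = np.array([1, 0, 0, 1])        # stand-in for kmeans.predict(image_array)

# Loop version: copy one centroid row per sample.
looped = np.empty((labels.shape[0], centers.shape[1]))
for i in range(labels.shape[0]):
    looped[i] = centers[labels[i]]

# Indexing with the label array selects one centroid row per sample in a single step.
vectorized = centers[labels]

print(np.array_equal(looped, vectorized))
```

Applied to the case study, this would correspond to `kmeans.cluster_centers_[labels]`, which already has shape (w * h, d).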

# Inspect the new image data

image_kmeans
Out[58]:
array([[0.73524384, 0.82021116, 0.91925591],
[0.73524384, 0.82021116, 0.91925591],
[0.73524384, 0.82021116, 0.91925591],
...,
[0.15546218, 0.1557423 , 0.12829132],
[0.07058824, 0.0754637 , 0.0508744 ],
[0.07058824, 0.0754637 , 0.0508744 ]])

# Drop duplicates: only 64 distinct colors remain

pd.DataFrame(image_kmeans).drop_duplicates().shape
Out[59]: (64, 3)

# Restore the image shape

image_kmeans = image_kmeans.reshape(w,h,d)
image_kmeans.shape
Out[60]: (427, 640, 3)

5. Vector quantization of the data with random centroids

centroid_random = shuffle(image_array, random_state=0)[:n_clusters]
labels_random = pairwise_distances_argmin(centroid_random,image_array,axis=0)
labels_random.shape
Out[61]: (273280,)

len(set(labels_random))
Out[62]: 64

# Replace every sample with its nearest random centroid

image_random = image_array.copy()
for i in range(w*h):
    image_random[i] = centroid_random[labels_random[i]]

# Restore the image shape

image_random = image_random.reshape(w,h,d)
image_random.shape
Out[64]: (427, 640, 3)

6. Plot the original image, the K-Means quantized image, and the randomly quantized image

plt.figure(figsize=(10,10))
plt.axis('off')
plt.title('Original image (96,615 colors)')
plt.imshow(china)

plt.figure(figsize=(10,10))
plt.axis('off')
plt.title('Quantized image (64 colors, K-Means)')
plt.imshow(image_kmeans)

plt.figure(figsize=(10,10))
plt.axis('off')
plt.title('Quantized image (64 colors, Random)')
plt.imshow(image_random)
plt.show()

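The first image is the original, the second is the K-Means quantized image, and the third uses random centroids.

Comparing the three images: in the second, the color of the tower is preserved without obvious loss, although the sky shows jagged banding; in the third, the color of the tower has changed.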
