Advanced Programming Techniques (Python), Assignment 17

Exercises for sklearn:
Assignment
Solution:

from sklearn import datasets, metrics
from sklearn.model_selection import KFold  # sklearn.cross_validation no longer exists in current releases
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier


def performance_evaluation(string, y_tests, preds):
    # Print accuracy, F1-score and ROC AUC, each averaged over all folds.
    # AUC is computed from hard 0/1 predictions, as in the original.
    n = len(y_tests)
    acc = sum(metrics.accuracy_score(t, p) for t, p in zip(y_tests, preds)) / n
    f1 = sum(metrics.f1_score(t, p) for t, p in zip(y_tests, preds)) / n
    auc = sum(metrics.roc_auc_score(t, p) for t, p in zip(y_tests, preds)) / n
    print(string + ":")
    print('Accuracy:', acc)
    print('F1-score:', f1)
    print('AUC ROC:', auc, end='\n\n')


# Synthetic binary problem: 1000 samples, 10 features (2 informative, 2 redundant)
X, y = datasets.make_classification(n_samples=1000, n_features=10, n_informative=2,
                                    n_redundant=2, n_classes=2)

classifiers = [
    ("GaussianNB", GaussianNB()),
    ("SVM", SVC(C=1e-01, kernel='rbf', gamma=0.1)),
    ("Random Forest", RandomForestClassifier(n_estimators=6)),
]

# 10-fold cross-validation; a fixed random_state gives all three classifiers the same folds.
# Each classifier is trained and scored on every fold, not just the last one.
kf = KFold(n_splits=10, shuffle=True, random_state=0)
for name, clf in classifiers:
    y_tests, preds = [], []
    for train_index, test_index in kf.split(X):
        clf.fit(X[train_index], y[train_index])    # train on the other 9 folds
        y_tests.append(y[test_index])
        preds.append(clf.predict(X[test_index]))   # predict the held-out fold
    performance_evaluation(name, y_tests, preds)

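Output (recorded from the original version of the code, which imported the deprecated sklearn.cross_validation module and evaluated each classifier on the last fold's split only):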

C:\Python\Python36-32\lib\site-packages\sklearn\cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
GaussianNB:
Accuracy: 0.9
F1-score: 0.9038461538461537
AUC ROC: 0.9011604641856743

SVM:
Accuracy: 0.9
F1-score: 0.9038461538461537
AUC ROC: 0.9011604641856743

Random Forest:
Accuracy: 0.94
F1-score: 0.9387755102040817
AUC ROC: 0.9399759903961585
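In every block of the output, AUC ROC is almost identical to Accuracy. That is expected: performance_evaluation passes hard 0/1 predictions to roc_auc_score, so the ROC "curve" has a single interior point (FPR, TPR), and its area reduces to (TPR + TNR) / 2, which is balanced accuracy; on a roughly balanced dataset such as the one make_classification generates here, that is close to plain accuracy. A minimal check, using small hypothetical label arrays made up for illustration (not data from the experiment):

```python
import numpy as np
from sklearn import metrics

# Hypothetical true labels and hard predictions (illustration only)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

# ROC AUC from hard labels, as performance_evaluation computes it
auc_hard = metrics.roc_auc_score(y_true, y_pred)

# (TPR + TNR) / 2: recall of the positive class and recall of the negative class
tpr = metrics.recall_score(y_true, y_pred, pos_label=1)
tnr = metrics.recall_score(y_true, y_pred, pos_label=0)
manual = (tpr + tnr) / 2

# All three agree: (4/6 + 3/4) / 2 = 17/24
print(auc_hard, manual, metrics.balanced_accuracy_score(y_true, y_pred))
```

To get an AUC that actually measures ranking quality, roc_auc_score should instead receive continuous scores, for example predict_proba(X_test)[:, 1] or decision_function(X_test).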

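Note: this run printed a DeprecationWarning. It warns that the module we are using (sklearn.cross_validation) is deprecated in favor of sklearn.model_selection and is scheduled for removal in version 0.20. It did not affect the results of this run, but the module no longer exists in current scikit-learn releases, where KFold must be imported from sklearn.model_selection instead.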
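For comparison, here is a minimal sketch of the same 10-fold split with the newer sklearn.model_selection.KFold API. The old splitter took the number of samples and n_folds in its constructor and was iterated directly; the new one takes n_splits and receives the data in split(). The random_state values are an assumption added for reproducibility and are not part of the original solution.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold

# Same generator settings as the solution; random_state added for reproducibility
X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_classes=2, random_state=0)

# Old (removed) API: cross_validation.KFold(len(X), n_folds=10, shuffle=True)
# New API: configure the splitter, then pass the data to split()
kf = KFold(n_splits=10, shuffle=True, random_state=0)

test_sizes = []
all_test_indices = []
for train_index, test_index in kf.split(X):
    # Each fold trains on 900 samples and holds out the remaining 100
    X_train, y_train = X[train_index], y[train_index]
    X_test, y_test = X[test_index], y[test_index]
    test_sizes.append(len(test_index))
    all_test_indices.append(test_index)

# Every sample lands in exactly one test fold
covered = np.sort(np.concatenate(all_test_indices))
print(test_sizes, np.array_equal(covered, np.arange(len(X))))
```

Any work that should happen per fold (fitting, predicting, scoring) has to go inside this loop; code placed after it only sees the last fold's split.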

The short report summarizing the methodology and the results:

  • methodology:
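    The three learning methods are trained and evaluated in the same way:
    first, generate the dataset and split it into a training set and a test set;
    second, train the chosen classifier on the training set;
    third, use the trained model to predict labels for the test set;
    finally, compare the predictions with the true test labels to evaluate performance.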
  • results:
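    Over repeated runs, the random forest generally outperformed both Gaussian naive Bayes and the SVM (in the run shown above, 0.94 accuracy versus 0.9 for both), while Gaussian naive Bayes and the SVM performed similarly, with no clear difference between them.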
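The four methodology steps, repeated on each of the 10 folds and averaged, can be sketched with sklearn.model_selection.cross_validate, which runs the split/train/predict/compare loop internally. This is a minimal sketch assuming scikit-learn 0.20 or later; the random_state values are additions not found in the original solution. Note that the "roc_auc" scorer ranks test samples by predict_proba or decision_function scores rather than hard labels, so its values are not directly comparable with the AUC ROC lines in the output.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# Same data and classifier settings as the solution; random_state values are an added assumption
X, y = make_classification(n_samples=1000, n_features=10, n_informative=2,
                           n_redundant=2, n_classes=2, random_state=0)
classifiers = {
    "GaussianNB": GaussianNB(),
    "SVM": SVC(C=1e-01, kernel='rbf', gamma=0.1),
    "Random Forest": RandomForestClassifier(n_estimators=6, random_state=0),
}

# One seeded splitter, so every classifier is evaluated on identical folds
cv = KFold(n_splits=10, shuffle=True, random_state=0)
metric_names = ("accuracy", "f1", "roc_auc")

summary = {}
for name, clf in classifiers.items():
    # cross_validate fits a fresh clone of clf on each training fold and scores it on the test fold
    scores = cross_validate(clf, X, y, cv=cv, scoring=list(metric_names))
    summary[name] = {m: float(scores["test_" + m].mean()) for m in metric_names}
    print(name, {m: round(v, 3) for m, v in summary[name].items()})
```

Averaging over all ten folds (and, ideally, over several seeds) gives a steadier basis for the "random forest generally wins" observation than a single 100-sample test fold.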