Using sklearn for model building, parameter tuning, and model validation

Original

2020-03-31 06:30

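1. Parameter selection
The estimators in sklearn come with default parameters. To improve model performance, these parameters usually need tuning, and sklearn provides two search strategies: GridSearchCV evaluates every parameter combination in the specified parameter space, while RandomizedSearchCV samples a fixed number of combinations from a parameter space defined by lists or distributions and evaluates only those.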

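from sklearn.linear_model import LinearRegression, LogisticRegression # linear models
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier # ensemble models
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV # the two parameter search strategies
# GridSearchCV tries every combination in the specified parameter space, so its cost grows
# with the grid size times the number of folds; it is best suited to small grids or small datasets.
# x_train, y_train, and x_test come from the train_test_split call in section 2.
# Create the model. The liblinear solver supports both the l1 and l2 penalties;
# the default lbfgs solver in recent sklearn versions does not support l1.
lr = LogisticRegression(solver='liblinear')
# Define the parameter grid
tuned_parameters = [{'penalty':('l1', 'l2'), 'C':[1,10,100,1000]}]
# Create the cross-validated search object with AUC as the evaluation metric
lr_clf = GridSearchCV(lr, tuned_parameters, cv=5, scoring='roc_auc')
lr_clf.fit(x_train, y_train)
# Cross-validation results for each parameter combination
lr_clf.cv_results_
# Best parameter combination in the grid
lr_clf.best_params_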
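The grid search above can be tried end to end with the following sketch. It is only an illustration: the synthetic dataset from make_classification and the names X_tr, X_te, search, and results are assumptions standing in for the article's own data, and solver='liblinear' is chosen because it accepts both penalties.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary data stands in for the article's x_train / y_train (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=33)

# liblinear supports both the l1 and l2 penalties.
param_grid = {'penalty': ['l1', 'l2'], 'C': [1, 10, 100, 1000]}
search = GridSearchCV(LogisticRegression(solver='liblinear'), param_grid,
                      cv=5, scoring='roc_auc')
search.fit(X_tr, y_tr)

# cv_results_ is a dict of arrays with one entry per parameter combination.
results = search.cv_results_
for params, mean, std in zip(results['params'],
                             results['mean_test_score'],
                             results['std_test_score']):
    print(params, round(mean, 4), round(std, 4))

n_candidates = len(results['params'])  # 2 penalties x 4 values of C
best_auc = search.best_score_          # mean cross-validated AUC of best_params_
test_auc = search.score(X_te, y_te)    # AUC of the refit best model on held-out data
print(search.best_params_, best_auc, test_auc)
```

Because refit defaults to True, the search object retrains the best combination on all of X_tr, which is why it can score the held-out split directly.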
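
# Unlike GridSearchCV, RandomizedSearchCV does not evaluate every parameter combination.
# It samples combinations from the parameter space (lists are sampled uniformly) and
# evaluates only those; n_iter sets how many combinations are sampled.
# Define the parameter space
tuned_parameters = [{'penalty':('l1', 'l2'), 'C':[1,10,100,1000]}]
# Create the search object. This space has only 8 combinations, so n_iter must be
# smaller than 8 for the search to differ from a full grid search; sample 5 here.
lr_clf = RandomizedSearchCV(lr, tuned_parameters, cv=5, scoring='roc_auc', n_iter=5)
lr_clf.fit(x_train, y_train)
lr_clf.predict(x_test) # predict class labels with the best model found
lr_clf.predict_proba(x_test) # probability matrix; each column holds the probability of one class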
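Randomized search becomes more useful when at least one parameter is drawn from a continuous distribution instead of a short list. The sketch below is one possible setup, not the article's own code: scipy.stats.loguniform for C, the synthetic dataset, and the fixed random_state are all assumptions made for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic binary data stands in for the article's training set (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_distributions = {
    'penalty': ['l1', 'l2'],     # a list is sampled uniformly
    'C': loguniform(1e0, 1e3),   # continuous, log-uniform on [1, 1000]
}
search = RandomizedSearchCV(LogisticRegression(solver='liblinear'),
                            param_distributions, n_iter=20, cv=5,
                            scoring='roc_auc', random_state=0)
search.fit(X, y)

n_sampled = len(search.cv_results_['params'])  # one entry per sampled combination
best_C = search.best_params_['C']
print(n_sampled, search.best_params_, search.best_score_)
```

With a continuous distribution in the space, the 20 sampled combinations are drawn independently, so n_iter directly controls the search budget.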

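2. Model validation

# Model validation:
# There are two common ways to split a dataset. One is a hold-out split that divides the data
# into a training set and a test set in a given proportion, controlled by test_size.
# The other is cross-validation.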
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(data, target, test_size=0.2, random_state=33)
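To see what test_size does in practice, here is a small sketch on a toy array. The toy data, the 80/20 class imbalance, and the use of the stratify argument are assumptions added for illustration; stratify is optional and keeps the class ratio the same in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 2 features, 20% positive class (assumption).
data = np.arange(200).reshape(100, 2)
target = np.array([0] * 80 + [1] * 20)

# test_size=0.2 puts 20% of the samples in the test set.
x_train, x_test, y_train, y_test = train_test_split(
    data, target, test_size=0.2, random_state=33)
print(x_train.shape, x_test.shape)  # (80, 2) (20, 2)

# stratify=target preserves the 80/20 class ratio in both splits.
xs_train, xs_test, ys_train, ys_test = train_test_split(
    data, target, test_size=0.2, random_state=33, stratify=target)
print(ys_train.mean(), ys_test.mean())
```

Fixing random_state makes the split reproducible across runs.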


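from sklearn.model_selection import cross_val_score
from sklearn.model_selection import cross_validate
from sklearn import svm
clf = svm.SVC(kernel='linear', C=1)
# cross_validate returns a dict with fit_time, score_time, and test_score for each of the 5 folds
cv_results = cross_validate(clf, data, target, cv=5)
#print(cv_results)
# cross_val_score returns only the array of per-fold test scores
scores = cross_val_score(clf, data, target, cv=5)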
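The difference between the two helpers is easier to see on a concrete dataset. The following is a sketch only: the iris dataset and the choice of the accuracy and f1_macro metrics are assumptions, since the article does not say what data and target contain.

```python
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, cross_validate

# The iris dataset stands in for the article's data / target (assumption).
data, target = load_iris(return_X_y=True)
clf = svm.SVC(kernel='linear', C=1)

# cross_val_score: one score per fold, using the estimator's default metric.
scores = cross_val_score(clf, data, target, cv=5)
print(scores.mean(), scores.std())

# cross_validate: several metrics at once, plus timings and optional train scores.
results = cross_validate(clf, data, target, cv=5,
                         scoring=['accuracy', 'f1_macro'],
                         return_train_score=True)
print(sorted(results))
print(results['test_f1_macro'].mean())
```

Reporting the mean together with the standard deviation across folds gives a rough sense of how stable the estimate is.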
