Comparing LightGBM's native interface with the lightgbm library's scikit-learn-style interface

1. LightGBM native interface -- classification model

import lightgbm as lgb
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import numpy as np

iris = load_iris()
data = iris.data
target = iris.target
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2)

# Wrap the data in LightGBM's Dataset format
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval  = lgb.Dataset(X_test, y_test, reference=lgb_train)

# Specify the parameters as a dictionary
params = {
    'boosting_type': 'gbdt',    # boosting type
    'objective': 'multiclass',  # objective function
    'num_leaves': 31,           # maximum number of leaves per tree
    'learning_rate': 0.05,      # learning rate
    'num_class': 3,             # number of classes
    'feature_fraction': 0.9,    # fraction of features sampled when building each tree
    'bagging_fraction': 0.8,    # fraction of samples used for bagging
    'bagging_freq': 5,          # perform bagging every 5 iterations
}

# Train the model
gbm = lgb.train(params, lgb_train, num_boost_round=20, valid_sets=lgb_eval, early_stopping_rounds=5)

y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

print(accuracy_score(y_test, np.argmax(y_pred, axis=1)))

[1] valid_0's multi_logloss: 1.03982
Training until validation scores don't improve for 5 rounds.
[2] valid_0's multi_logloss: 0.980791
[3] valid_0's multi_logloss: 0.927019
[4] valid_0's multi_logloss: 0.877524
[5] valid_0's multi_logloss: 0.830831
[6] valid_0's multi_logloss: 0.785194
[7] valid_0's multi_logloss: 0.743296
[8] valid_0's multi_logloss: 0.704658
[9] valid_0's multi_logloss: 0.667132
[10] valid_0's multi_logloss: 0.632111
[11] valid_0's multi_logloss: 0.598285
[12] valid_0's multi_logloss: 0.568314
[13] valid_0's multi_logloss: 0.538571
[14] valid_0's multi_logloss: 0.512754
[15] valid_0's multi_logloss: 0.487581
[16] valid_0's multi_logloss: 0.465032
[17] valid_0's multi_logloss: 0.444311
[18] valid_0's multi_logloss: 0.424883
[19] valid_0's multi_logloss: 0.406231
[20] valid_0's multi_logloss: 0.388667
Did not meet early stopping. Best iteration is:
[20] valid_0's multi_logloss: 0.388667
0.9666666666666667
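
The log above matches older LightGBM releases (3.x). On LightGBM 4.0 and later the early_stopping_rounds argument was removed from lgb.train, and early stopping plus per-round logging are configured through callbacks instead. A minimal sketch of the equivalent call on a recent version:

# LightGBM >= 4.0: early stopping is a callback rather than a keyword argument
gbm = lgb.train(
    params,
    lgb_train,
    num_boost_round=20,
    valid_sets=[lgb_eval],
    callbacks=[
        lgb.early_stopping(stopping_rounds=5),  # stop after 5 rounds without improvement
        lgb.log_evaluation(period=1),           # print the eval metric every round
    ],
)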

2. LightGBM's lightgbm library (scikit-learn) interface -- classification

from lightgbm import LGBMClassifier
from sklearn.metrics import accuracy_score
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
data = iris.data
target = iris.target
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2)

lgbm = LGBMClassifier(
    boosting_type='gbdt', 
    objective='multiclass', 
    num_leaves=31, 
    learning_rate=0.05, 
    n_estimators=20, 
    max_depth=8)

lgbm.fit(X_train, y_train, early_stopping_rounds=5, eval_set=[(X_train, y_train), (X_test, y_test)])

y_pred = lgbm.predict(X_test)

print(accuracy_score(y_test, y_pred))

[1] training's multi_logloss: 1.03287 valid_1's multi_logloss: 1.04927
Training until validation scores don't improve for 5 rounds.
[2] training's multi_logloss: 0.974241 valid_1's multi_logloss: 0.993079
[3] training's multi_logloss: 0.920316 valid_1's multi_logloss: 0.941041
[4] training's multi_logloss: 0.870695 valid_1's multi_logloss: 0.8927
[5] training's multi_logloss: 0.824893 valid_1's multi_logloss: 0.848349
[6] training's multi_logloss: 0.782432 valid_1's multi_logloss: 0.809369
[7] training's multi_logloss: 0.742778 valid_1's multi_logloss: 0.771094
[8] training's multi_logloss: 0.706151 valid_1's multi_logloss: 0.736381
[9] training's multi_logloss: 0.671797 valid_1's multi_logloss: 0.702219
[10] training's multi_logloss: 0.639811 valid_1's multi_logloss: 0.672626
[11] training's multi_logloss: 0.609749 valid_1's multi_logloss: 0.642929
[12] training's multi_logloss: 0.581733 valid_1's multi_logloss: 0.615645
[13] training's multi_logloss: 0.555356 valid_1's multi_logloss: 0.591819
[14] training's multi_logloss: 0.530454 valid_1's multi_logloss: 0.567261
[15] training's multi_logloss: 0.507151 valid_1's multi_logloss: 0.545222
[16] training's multi_logloss: 0.485162 valid_1's multi_logloss: 0.522589
[17] training's multi_logloss: 0.464386 valid_1's multi_logloss: 0.505188
[18] training's multi_logloss: 0.44482 valid_1's multi_logloss: 0.485115
[19] training's multi_logloss: 0.426373 valid_1's multi_logloss: 0.468033
[20] training's multi_logloss: 0.408915 valid_1's multi_logloss: 0.450241
Did not meet early stopping. Best iteration is:
[20] training's multi_logloss: 0.408915 valid_1's multi_logloss: 0.450241
0.9
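
The scikit-learn wrapper changed the same way: on LightGBM 4.0+, fit no longer accepts early_stopping_rounds and takes callbacks instead. The wrapper also exposes predict_proba, which returns the (n_samples, n_classes) probability matrix that the native Booster's predict produces directly. A sketch using the lgbm estimator defined above:

from lightgbm import early_stopping, log_evaluation

# LightGBM >= 4.0 style: pass early stopping and logging as callbacks
lgbm.fit(
    X_train, y_train,
    eval_set=[(X_train, y_train), (X_test, y_test)],
    callbacks=[early_stopping(stopping_rounds=5), log_evaluation(period=1)],
)

# predict() returns class labels directly; predict_proba() returns the
# per-class probabilities, like the native Booster's predict()
proba = lgbm.predict_proba(X_test)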

3. LightGBM native interface -- regression model

import lightgbm as lgb
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import numpy as np

boston = load_boston()
data = boston.data
target = boston.target
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2)

# Wrap the data in LightGBM's Dataset format
lgb_train = lgb.Dataset(X_train, y_train)
lgb_eval  = lgb.Dataset(X_test, y_test, reference=lgb_train)

# Specify the parameters as a dictionary
params = {
    'boosting_type': 'gbdt',    # boosting type
    'objective': 'regression',  # objective function
    'num_leaves': 31,           # maximum number of leaves per tree
    'learning_rate': 0.05,      # learning rate
    'feature_fraction': 0.9,    # fraction of features sampled when building each tree
    'bagging_fraction': 0.8,    # fraction of samples used for bagging
    'bagging_freq': 5,          # perform bagging every 5 iterations
}

# Train the model
gbm = lgb.train(params, lgb_train, num_boost_round=20, valid_sets=lgb_eval, early_stopping_rounds=5)

y_pred = gbm.predict(X_test, num_iteration=gbm.best_iteration)

print(np.sqrt(mean_squared_error(y_test, y_pred)))

[1] valid_0's l2: 66.8468
Training until validation scores don't improve for 5 rounds.
[2] valid_0's l2: 62.1527
[3] valid_0's l2: 57.6478
[4] valid_0's l2: 53.7871
[5] valid_0's l2: 50.4665
[6] valid_0's l2: 47.1732
[7] valid_0's l2: 44.223
[8] valid_0's l2: 41.653
[9] valid_0's l2: 39.255
[10] valid_0's l2: 37.0795
[11] valid_0's l2: 35.3057
[12] valid_0's l2: 33.6285
[13] valid_0's l2: 31.9301
[14] valid_0's l2: 30.411
[15] valid_0's l2: 29.0514
[16] valid_0's l2: 27.7196
[17] valid_0's l2: 26.5142
[18] valid_0's l2: 25.5741
[19] valid_0's l2: 24.6561
[20] valid_0's l2: 23.7944
Did not meet early stopping. Best iteration is:
[20] valid_0's l2: 23.7944
4.877951543060389
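
Note that load_boston was removed in scikit-learn 1.2, so the two regression examples no longer run on current scikit-learn. The California housing dataset is a common drop-in replacement; a minimal sketch (the rest of the example is unchanged):

from sklearn.datasets import fetch_california_housing

# 20640 samples, 8 numeric features; the target is the median house
# value in units of $100,000
housing = fetch_california_housing()
data = housing.data
target = housing.target
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2)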

4. LightGBM's lightgbm library (scikit-learn) interface -- regression

from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import numpy as np

boston = load_boston()
data = boston.data
target = boston.target
X_train, X_test, y_train, y_test = train_test_split(data, target, test_size=0.2)

lgbm = LGBMRegressor(
    n_estimators=20, 
    boosting_type='gbdt', 
    num_leaves=31, 
    learning_rate=0.05)

lgbm.fit(X_train, y_train, eval_set=[(X_train, y_train), (X_test, y_test)], early_stopping_rounds=5)

y_pred = lgbm.predict(X_test)

print(np.sqrt(mean_squared_error(y_test, y_pred)))

[1] training's l2: 72.4194 valid_1's l2: 98.5906
Training until validation scores don't improve for 5 rounds.
[2] training's l2: 66.5621 valid_1's l2: 91.5839
[3] training's l2: 61.2731 valid_1's l2: 85.277
[4] training's l2: 56.4935 valid_1's l2: 79.591
[5] training's l2: 52.2028 valid_1's l2: 74.4869
[6] training's l2: 48.293 valid_1's l2: 69.8787
[7] training's l2: 44.7517 valid_1's l2: 65.778
[8] training's l2: 41.581 valid_1's l2: 62.0688
[9] training's l2: 38.7042 valid_1's l2: 58.7032
[10] training's l2: 36.0728 valid_1's l2: 55.7413
[11] training's l2: 33.5913 valid_1's l2: 52.8012
[12] training's l2: 31.4397 valid_1's l2: 50.3597
[13] training's l2: 29.4483 valid_1's l2: 47.9421
[14] training's l2: 27.6236 valid_1's l2: 45.6147
[15] training's l2: 25.9851 valid_1's l2: 43.8716
[16] training's l2: 24.3707 valid_1's l2: 41.8788
[17] training's l2: 22.9099 valid_1's l2: 40.2979
[18] training's l2: 21.5822 valid_1's l2: 38.753
[19] training's l2: 20.4601 valid_1's l2: 37.5664
[20] training's l2: 19.3528 valid_1's l2: 36.2173
Did not meet early stopping. Best iteration is:
[20] training's l2: 19.3528 valid_1's l2: 36.2173
6.018084552690671
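
After fitting, the scikit-learn wrapper mirrors the native Booster's attributes with a trailing underscore, and the underlying Booster itself remains accessible, so the two interfaces can be mixed after training. For example:

print(lgbm.best_iteration_)       # iteration chosen by early stopping
print(lgbm.feature_importances_)  # per-feature split counts by default
booster = lgbm.booster_           # the native Booster behind the wrapper
y_pred = booster.predict(X_test, num_iteration=lgbm.best_iteration_)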
