XGBoost installation and plotting notes

System: Ubuntu 16.04

The official documentation is quite good: [url]https://xgboost.readthedocs.io/en/latest/build.html[/url]

[size=large][color=blue]1. Download the source[/color][/size]
One command does it; the source is cloned into an xgboost directory under the current folder:
git clone --recursive https://github.com/dmlc/xgboost



[b]Increasing the precision of the dumped model[/b]
In src/tree/tree_model.cc, modify the following method by adding the line fo.precision(20);:
std::string RegTree::Dump2Text(const FeatureMap& fmap, bool with_stats) const {
  std::stringstream fo("");
  fo.precision(20);  // added: keep 20 significant digits in the text dump
  for (int i = 0; i < param.num_roots; ++i) {
    DumpRegTree2Text(fo, *this, fmap, i, 0, with_stats);
  }
  return fo.str();
}
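The extra precision line matters because split thresholds and leaf values are doubles, and the default stream precision truncates their digits. A minimal pure-Python analogy (illustrative only, not xgboost code) of what changes in the dump:

```python
# Default-style formatting keeps about 6 significant digits, like an
# untouched std::stringstream; widening the precision exposes the digits
# of the underlying double that would otherwise be lost.
x = 1.0 / 3.0
short = '%.6g' % x   # roughly what the un-patched dump prints
wide = '%.20g' % x   # roughly what fo.precision(20) preserves
print(short)  # 0.333333
print(wide)
```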

[size=large][color=blue]2. Build the shared library[/color][/size]
cd xgboost; make -j4


[size=large][color=blue]3. Install the Python package[/color][/size]
cd python-package; sudo python setup.py install



[size=large][color=blue]4. Example[/color][/size]
First, the decision tree it produces:
[img]http://dl2.iteye.com/upload/attachment/0120/5296/d6edff4d-d243-3cfc-9f02-4ae146f80182.png[/img]

# Adapted from: https://xgboost.readthedocs.io/en/latest/get_started/index.html

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import xgboost as xgb
from sklearn.metrics import roc_auc_score

xgFolder='/home/XXX/tools/xgboost/'

# read in data
dtrain = xgb.DMatrix(xgFolder+'demo/data/agaricus.txt.train')
# The first line of the training file is: 1 3:1 10:1 11:1 21:1 30:1 34:1 36:1 40:1 41:1 53:1 58:1 65:1 69:1 77:1 86:1 88:1 92:1 95:1 102:1 105:1 117:1 124:1
# The leading 1 is the label; feature 3 is 1, feature 10 is 1, and so on (libsvm sparse format)
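As a pure-Python sketch of how one such libsvm-format line decomposes (parse_libsvm_line is a hypothetical helper written for illustration, not part of xgboost):

```python
def parse_libsvm_line(line):
    """Split a libsvm-format line into (label, {feature_index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {int(idx): float(val)
                for idx, val in (p.split(':') for p in parts[1:])}
    return label, features

demo_label, demo_feats = parse_libsvm_line('1 3:1 10:1 11:1 21:1')
print(demo_label)          # 1.0
print(sorted(demo_feats))  # [3, 10, 11, 21]
```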

weights=dtrain.get_weight()# per-sample weights as a numpy.ndarray; this is not the input data itself, and it is [] when unset
labels=dtrain.get_label()# labels, a numpy.ndarray
print(dtrain.get_base_margin())
print(weights)
print(labels[0])
dtest = xgb.DMatrix(xgFolder+'demo/data/agaricus.txt.test')

# specify parameters via map
# Parameter tuning: https://xgboost.readthedocs.io/en/latest/how_to/param_tuning.html
# Parameter reference: https://xgboost.readthedocs.io/en/latest/parameter.html
booster='dart'
# booster='gbtree'
# booster='gblinear'

param = {'max_depth':3, 'eta':1, 'silent':0, 'objective':'binary:logistic','booster':booster }
num_round = 2

bst = xgb.train(param, dtrain, num_round)
# make prediction
preds = bst.predict(dtest)
print('AUC: %.4f'% roc_auc_score(dtest.get_label(), preds))
print('DONE')
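With objective binary:logistic the predictions are probabilities, so computing an error rate requires a threshold. A self-contained sketch using made-up probabilities (preds_demo and labels_demo are illustrative, not the agaricus output):

```python
# Turn probabilities into hard 0/1 labels at a 0.5 cutoff, then count mistakes.
preds_demo = [0.92, 0.08, 0.61, 0.40]
labels_demo = [1, 0, 1, 1]
hard = [1 if p > 0.5 else 0 for p in preds_demo]
errors = sum(h != y for h, y in zip(hard, labels_demo))
print(hard)                       # [1, 0, 1, 0]
print(errors / len(labels_demo))  # 0.25
```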

#######################################################
# https://xgboost.readthedocs.io/en/latest/python/python_intro.html
# 繪製特徵的重要性和決策樹:
import matplotlib.pyplot as plt
ax=xgb.plot_importance(bst)
plt.show() # without this line, the figure only shows up when running in debug mode

# ax=xgb.plot_tree(bst, num_trees=1)
ax=xgb.plot_tree(bst)
plt.show()


# Save the decision tree to an image
g = xgb.to_graphviz(bst)
with open('xgb_tree.png', 'wb') as f:
    f.write(g.pipe('png'))


Output (results only):
[list]
[*]AUC: 1.0000
[*]DONE
[/list]

[size=large][color=blue]5. Useful resources[/color][/size]
Python API: [url]http://xgboost.readthedocs.io/en/latest/python/index.html[/url]

Parameter tuning: [url]https://xgboost.readthedocs.io/en/latest/how_to/param_tuning.html[/url]
Parameter reference: [url]https://xgboost.readthedocs.io/en/latest/parameter.html[/url]

Introduction to boosted trees: [url]https://xgboost.readthedocs.io/en/latest/tutorials/index.html[/url]

Awesome XGBoost:[url]https://github.com/dmlc/xgboost/tree/master/demo[/url]


Using the C/C++ API:
[url]http://stackoverflow.com/questions/36071672/using-xgboost-in-c[/url]
[url]http://qsalg.com/?p=388[/url]
[url]http://stackoverflow.duapp.com/questions/35289674/create-xgboost-dmatrix-in-c/37416279[/url]