GBDT docs: Early stopping of Gradient Boosting
Table of Contents
- Source code analysis
- Comparison with and without early stopping
From the example, two classifiers are built for comparison: `gbes` with early stopping enabled, and `gb` without.

```python
gbes = ensemble.GradientBoostingClassifier(n_estimators=n_estimators,
                                           validation_fraction=0.2,
                                           n_iter_no_change=5, tol=0.01,
                                           random_state=0)
gb = ensemble.GradientBoostingClassifier(n_estimators=n_estimators,
                                         random_state=0)
```
Open the scikit-learn source and locate `sklearn.ensemble._gb.GradientBoostingClassifier`.

I tried using the example as a debugging entry point: `examples/ensemble/plot_gradient_boosting_early_stopping.py`. That attempt failed; Cython and C++ projects are just hard to step through.

So I read its base class instead: `sklearn.ensemble._gb.BaseGradientBoosting#fit`, at `sklearn/ensemble/_gb.py:424`:
```python
if self.n_iter_no_change is not None:
    stratify = y if is_classifier(self) else None
    X, X_val, y, y_val, sample_weight, sample_weight_val = (
        train_test_split(X, y, sample_weight,
                         random_state=self.random_state,
                         test_size=self.validation_fraction,
                         stratify=stratify))
    if is_classifier(self):
        if self.n_classes_ != np.unique(y).shape[0]:
            # We choose to error here. The problem is that the init
            # estimator would be trained on y, which has some missing
            # classes now, so its predictions would not have the
            # correct shape.
            raise ValueError(
                'The training data after the early stopping split '
                'is missing some classes. Try using another random '
                'seed.'
            )
else:
    X_val = y_val = sample_weight_val = None
```
This block is arranging the training and validation sets: `train_test_split` holds out `validation_fraction` of the training data (stratified by class for classifiers) for the early-stopping evaluation.
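For intuition, a minimal standalone sketch of the same stratified split (the toy data and names here are mine):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 5 + [1] * 5)

# stratify=y keeps the class ratio identical in both halves; a class
# can still go missing when it has very few samples, which is what
# the ValueError above guards against
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)
print(np.bincount(y_train), np.bincount(y_val))  # [4 4] [1 1]
```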
This looks like the warm-start path: it resumes boosting after the trees already fitted.

```python
begin_at_stage = self.estimators_.shape[0]
```
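From the user side, warm start looks roughly like this (a sketch with parameter values of my own choosing): refitting with a larger `n_estimators` continues from the existing ensemble, so `begin_at_stage` picks up at the old tree count.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(random_state=0)

clf = GradientBoostingClassifier(n_estimators=50, warm_start=True,
                                 random_state=0)
clf.fit(X, y)                    # fits stages 0..49

clf.n_estimators = 100           # request 50 more stages
clf.fit(X, y)                    # begin_at_stage == 50 this time
print(clf.estimators_.shape[0])  # 100
```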
`_fit_stages` is the whole boosting process:

```python
# fit the boosting stages
n_stages = self._fit_stages(
    X, y, raw_predictions, sample_weight, self._rng, X_val, y_val,
    sample_weight_val, begin_at_stage, monitor, X_idx_sorted)
```
Step into `sklearn.ensemble._gb.BaseGradientBoosting#_fit_stages`, at `sklearn/ensemble/_gb.py:516`:
```python
for i in range(begin_at_stage, self.n_estimators):
```
Each pass through the loop is one boosting iteration:

```python
# fit next stage of trees
raw_predictions = self._fit_stage(
    i, X, y, raw_predictions, sample_weight, sample_mask,
    random_state, X_idx_sorted, X_csc, X_csr)
```
That call lands in `sklearn.ensemble._gb.BaseGradientBoosting#_fit_stage`. I read through `_fit_stage` (unlike `_fit_stages`, it fits a single boosting stage); here are a few key passages:
```python
residual = loss.negative_gradient(y, raw_predictions_copy, k=k,
                                  sample_weight=sample_weight)

# induce regression tree on residuals
tree = DecisionTreeRegressor(...)
tree.fit(X, residual, sample_weight=sample_weight,
         check_input=False, X_idx_sorted=X_idx_sorted)

# update tree leaves
loss.update_terminal_regions(
    tree.tree_, X, y, residual, raw_predictions, sample_weight,
    sample_mask, learning_rate=self.learning_rate, k=k)

# add tree to ensemble
self.estimators_[i, k] = tree
```
These all feel like standard moves, matching the fit-the-negative-gradient procedure from the textbooks; the OOB (out-of-bag) handling is something I still need to study.
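To check my understanding, here is a toy re-implementation of one such stage for squared-error loss, where the negative gradient is simply the residual y - F(x) (my own sketch, not sklearn's code):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(200, 3))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=200)

learning_rate = 0.1
raw_predictions = np.full_like(y, y.mean())  # init estimator: the mean

for i in range(10):
    # negative gradient of 1/2 * (y - F)^2 w.r.t. F is the residual
    residual = y - raw_predictions
    # induce a regression tree on the residuals
    tree = DecisionTreeRegressor(max_depth=3, random_state=0)
    tree.fit(X, residual)
    # shrink by the learning rate and add the tree's predictions
    raw_predictions += learning_rate * tree.predict(X)
    print(i, np.mean((y - raw_predictions) ** 2))
```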
I was curious about the `self.estimators_` attribute, so I looked into it.
```python
def _clear_state(self):
    """Clear the state of the gradient boosting model. """
    if hasattr(self, 'estimators_'):
        self.estimators_ = np.empty((0, 0), dtype=np.object)

def _resize_state(self):
    self.estimators_ = np.resize(self.estimators_,
                                 (total_n_estimators, self.loss_.K))
```
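So `estimators_` is a 2-D object array with one `DecisionTreeRegressor` per (stage, class) pair; `loss_.K` is the number of classes for multiclass problems and 1 for binary classification. A quick check (the dataset choice is mine):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_digits(return_X_y=True)
clf = GradientBoostingClassifier(n_estimators=5, random_state=0)
clf.fit(X, y)

# digits has 10 classes, so one tree per class per stage
print(clf.estimators_.shape)        # (5, 10)
print(type(clf.estimators_[0, 0]))  # DecisionTreeRegressor
```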
Back to the `for` loop, where there is an interesting bit:
```python
if monitor is not None:
    early_stopping = monitor(i, self, locals())
    if early_stopping:
        break
```
`monitor` is a callback the user passes into `fit`.
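As an illustration, a sketch of a custom monitor (the function `my_monitor` is my own; the `(i, self, locals())` signature comes from the call above). It aborts once the out-of-bag improvement turns negative, which requires `subsample < 1.0` so that `oob_improvement_` gets filled in:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

def my_monitor(i, model, local_vars):
    # returning True makes _fit_stages break out of the loop
    return i > 10 and model.oob_improvement_[i] < 0

X, y = make_classification(n_samples=500, random_state=0)
clf = GradientBoostingClassifier(n_estimators=200, subsample=0.8,
                                 random_state=0)
clf.fit(X, y, monitor=my_monitor)
print(clf.n_estimators_)  # stages actually fitted, likely < 200
```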
Further down is the early-stopping check itself:

```python
# By calling next(y_val_pred_iter), we get the predictions
# for X_val after the addition of the current stage
validation_loss = loss_(y_val, next(y_val_pred_iter),
                        sample_weight_val)

# Require validation_score to be better (less) than at least
# one of the last n_iter_no_change evaluations
if np.any(validation_loss + self.tol < loss_history):
    loss_history[i % len(loss_history)] = validation_loss
else:
    break
```
This is the crux of early stopping. `loss_history` is a vector of length `n_iter_no_change`, initially filled with `inf` via `np.full`; the latest validation loss is then written in at index `i % len(loss_history)`, so it acts as a ring buffer over the most recent evaluations. A neat trick.
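The same ring-buffer logic, pulled out into a standalone toy (the simulated loss curve is made up):

```python
import numpy as np

n_iter_no_change, tol = 5, 0.01
loss_history = np.full(n_iter_no_change, np.inf)

# simulated per-stage validation losses: improving, then flat
losses = [1.0, 0.8, 0.7, 0.65, 0.649, 0.648,
          0.6485, 0.6481, 0.6483, 0.6482]

for i, validation_loss in enumerate(losses):
    # keep going only if the new loss beats at least one of the
    # last n_iter_no_change recorded losses by more than tol
    if np.any(validation_loss + tol < loss_history):
        loss_history[i % len(loss_history)] = validation_loss
    else:
        print(f"early stop at stage {i}")  # prints: stage 8
        break
```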
Comparison of different parameter settings
All runs use the setup from the example above; each score list appears to hold one entry per dataset in the example.

Without early stopping (baseline): `score_gb` = [1.0, 0.9583333333333334, 0.9441666666666667]

| `validation_fraction` | `n_iter_no_change` | `score_gbes` |
| --- | --- | --- |
| 0.2 | 5 | [1.0, 0.9638888888888889, 0.915] |
| 0.2 | 10 | [1.0, 0.9638888888888889, 0.9266666666666666] |
| 0.2 | 20 | [1.0, 0.9638888888888889, 0.9529166666666666] |
| 0.1 | 20 | [1.0, 0.9666666666666667, 0.94625] |