How GBDT implements early stopping: a walkthrough of the scikit-learn source

GBDT documentation: Early stopping of Gradient Boosting

Source code analysis

Comparison with and without early stopping

    gbes = ensemble.GradientBoostingClassifier(n_estimators=n_estimators,
                                               validation_fraction=0.2,
                                               n_iter_no_change=5, tol=0.01,
                                               random_state=0)
    gb = ensemble.GradientBoostingClassifier(n_estimators=n_estimators,
                                             random_state=0)
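A runnable sketch of this comparison, assuming the digits dataset (one of the datasets used in the documentation example); the train/test split and n_estimators=200 are my own choices:

    from sklearn import ensemble
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    n_estimators = 200

    gbes = ensemble.GradientBoostingClassifier(n_estimators=n_estimators,
                                               validation_fraction=0.2,
                                               n_iter_no_change=5, tol=0.01,
                                               random_state=0).fit(X_train, y_train)
    gb = ensemble.GradientBoostingClassifier(n_estimators=n_estimators,
                                             random_state=0).fit(X_train, y_train)

    # with early stopping far fewer stages are fitted, usually at a small
    # cost in test accuracy
    print(gbes.n_estimators_, gbes.score(X_test, y_test))
    print(gb.n_estimators_, gb.score(X_test, y_test))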

Open the scikit-learn source and find sklearn.ensemble._gb.GradientBoostingClassifier.

I tried using the example as a debugging entry point: examples/ensemble/plot_gradient_boosting_early_stopping.py

That attempt failed; Cython and C++ projects are just hard to step through.

So I looked at its base class, sklearn.ensemble._gb.BaseGradientBoosting#fit

sklearn/ensemble/_gb.py:424

        if self.n_iter_no_change is not None:
            stratify = y if is_classifier(self) else None
            X, X_val, y, y_val, sample_weight, sample_weight_val = (
                train_test_split(X, y, sample_weight,
                                 random_state=self.random_state,
                                 test_size=self.validation_fraction,
                                 stratify=stratify))
            if is_classifier(self):
                if self.n_classes_ != np.unique(y).shape[0]:
                    # We choose to error here. The problem is that the init
                    # estimator would be trained on y, which has some missing
                    # classes now, so its predictions would not have the
                    # correct shape.
                    raise ValueError(
                        'The training data after the early stopping split '
                        'is missing some classes. Try using another random '
                        'seed.'
                    )
        else:
            X_val = y_val = sample_weight_val = None

This block carves the validation set out of the training data (only when n_iter_no_change is set).

This appears to be the warm-start mechanism: begin_at_stage is the number of stages already fitted, so boosting resumes from there.

    begin_at_stage = self.estimators_.shape[0]
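A small sketch of what that implies in practice, assuming warm_start=True and the digits dataset (my choice): the second fit resumes from the stages already stored in estimators_ instead of starting over.

    from sklearn.datasets import load_digits
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = load_digits(return_X_y=True)

    gb = GradientBoostingClassifier(n_estimators=50, warm_start=True,
                                    random_state=0)
    gb.fit(X, y)
    print(gb.estimators_.shape)   # (50, 10): begin_at_stage is now 50

    gb.set_params(n_estimators=100)
    gb.fit(X, y)                  # resumes at stage 50 instead of refitting
    print(gb.estimators_.shape)   # (100, 10)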

_fit_stages runs the whole boosting loop:

    # fit the boosting stages
    n_stages = self._fit_stages(
        X, y, raw_predictions, sample_weight, self._rng, X_val, y_val,
        sample_weight_val, begin_at_stage, monitor, X_idx_sorted)

Stepping into sklearn.ensemble._gb.BaseGradientBoosting#_fit_stages
sklearn/ensemble/_gb.py:516

    for i in range(begin_at_stage, self.n_estimators):

Each pass through this loop is one boosting iteration:

    # fit next stage of trees
    raw_predictions = self._fit_stage(
        i, X, y, raw_predictions, sample_weight, sample_mask,
        random_state, X_idx_sorted, X_csc, X_csr)

sklearn.ensemble._gb.BaseGradientBoosting#_fit_stage

I took a look at the _fit_stage function (unlike _fit_stages, it fits a single boosting stage); here are a few key snippets:

            residual = loss.negative_gradient(y, raw_predictions_copy, k=k,
                                              sample_weight=sample_weight)

            # induce regression tree on residuals
            tree = DecisionTreeRegressor(...
            
            tree.fit(X, residual, sample_weight=sample_weight,
                     check_input=False, X_idx_sorted=X_idx_sorted)

            # update tree leaves
            loss.update_terminal_regions(
                tree.tree_, X, y, residual, raw_predictions, sample_weight,
                sample_mask, learning_rate=self.learning_rate, k=k)

            # add tree to ensemble
            self.estimators_[i, k] = tree

These are all standard moves, matching the fit-the-negative-gradient step described in the textbooks; the OOB (out-of-bag) handling is something I still need to study.
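To tie this back to the textbook picture, here is a minimal hand-rolled sketch for squared-error regression, where the negative gradient is simply the residual y - F(x); it only illustrates the idea and is not the library's _fit_stage:

    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=200, n_features=5, random_state=0)
    learning_rate, n_stages = 0.1, 50

    raw_predictions = np.zeros_like(y, dtype=float)   # F_0 = 0 for simplicity
    trees = []
    for i in range(n_stages):
        residual = y - raw_predictions                # negative gradient of squared loss
        tree = DecisionTreeRegressor(max_depth=3, random_state=0)
        tree.fit(X, residual)                         # induce a tree on the residuals
        raw_predictions += learning_rate * tree.predict(X)
        trees.append(tree)                            # add the tree to the ensemble

    print("training MSE:", np.mean((y - raw_predictions) ** 2))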

I was curious about the self.estimators_ member variable, so I looked into it:

    def _clear_state(self):
        """Clear the state of the gradient boosting model. """
        if hasattr(self, 'estimators_'):
            self.estimators_ = np.empty((0, 0), dtype=np.object)

    def _resize_state(self):
        self.estimators_ = np.resize(self.estimators_,
                                     (total_n_estimators, self.loss_.K))
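So estimators_ is a 2-D object array of shape (n_stages, K), one regression tree per boosting stage and per class (K is 1 for binary problems). A quick way to see it, with the digits dataset chosen arbitrarily:

    from sklearn.datasets import load_digits
    from sklearn.ensemble import GradientBoostingClassifier

    X, y = load_digits(return_X_y=True)
    clf = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

    print(clf.estimators_.shape)        # (20, 10): 20 stages x 10 classes
    print(type(clf.estimators_[0, 0]))  # DecisionTreeRegressor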

Back to the for loop. There is an interesting bit here:

    if monitor is not None:
        early_stopping = monitor(i, self, locals())
        if early_stopping:
            break

monitor is a callable that the user passes into fit.
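A sketch of such a callback; the stopping rule here (a negative OOB improvement, which requires subsample < 1) is my own choice, purely to show the hook:

    from sklearn.datasets import load_digits
    from sklearn.ensemble import GradientBoostingClassifier

    def my_monitor(i, est, local_vars):
        # called after every stage as monitor(i, self, locals());
        # returning True breaks out of the boosting loop
        return i >= 10 and est.oob_improvement_[i] < 0

    X, y = load_digits(return_X_y=True)
    clf = GradientBoostingClassifier(n_estimators=200, subsample=0.8,
                                     random_state=0)
    clf.fit(X, y, monitor=my_monitor)
    print("stopped after", clf.estimators_.shape[0], "stages")

Right after the monitor hook comes the built-in check based on the validation loss: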

    # By calling next(y_val_pred_iter), we get the predictions
    # for X_val after the addition of the current stage
    validation_loss = loss_(y_val, next(y_val_pred_iter),
                            sample_weight_val)

    # Require validation_score to be better (less) than at least
    # one of the last n_iter_no_change evaluations
    if np.any(validation_loss + self.tol < loss_history):
        loss_history[i % len(loss_history)] = validation_loss
    else:
        break

This is the heart of early stopping.

loss_history is an array of length n_iter_no_change, initially filled with inf via np.full; each new validation loss is then written into slot i % n_iter_no_change, so the array works as a ring buffer over the last n_iter_no_change losses.

A rather neat trick.
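A tiny standalone illustration of that ring buffer, with n_iter_no_change = 3, tol = 0.01 and a made-up sequence of validation losses:

    import numpy as np

    tol = 0.01
    loss_history = np.full(3, np.inf)   # n_iter_no_change = 3
    for i, validation_loss in enumerate([0.9, 0.5, 0.49, 0.488, 0.487, 0.4869]):
        if np.any(validation_loss + tol < loss_history):
            loss_history[i % len(loss_history)] = validation_loss
            print(i, loss_history)
        else:
            # the loss did not beat any of the last 3 values by tol
            print("stop at iteration", i)
            break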

Comparing different parameter settings

The comparisons below follow the documentation example: Early stopping of Gradient Boosting.

validation_fraction=0.2, n_iter_no_change=5

    score_gb   = [1.0, 0.9583333333333334, 0.9441666666666667]
    score_gbes = [1.0, 0.9638888888888889, 0.915]

validation_fraction=0.2, n_iter_no_change=10

    score_gbes = [1.0, 0.9638888888888889, 0.9266666666666666]

validation_fraction=0.2, n_iter_no_change=20

    score_gbes = [1.0, 0.9638888888888889, 0.9529166666666666]

validation_fraction=0.1, n_iter_no_change=20

    score_gbes = [1.0, 0.9666666666666667, 0.94625]

(score_gb does not depend on the early-stopping parameters, so it is listed only once.)

Implementing early stopping by hand
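The same stopping rule can also be replayed by hand after an ordinary fit, using staged_predict_proba on a held-out validation set. The sketch below makes a few assumptions of its own: the digits dataset, log_loss standing in for the library's internal loss object, and n_estimators=200.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import log_loss
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0)

    n_iter_no_change, tol = 5, 0.01
    clf = GradientBoostingClassifier(n_estimators=200, random_state=0)
    clf.fit(X_train, y_train)

    loss_history = np.full(n_iter_no_change, np.inf)
    stop_at = clf.n_estimators
    for i, proba in enumerate(clf.staged_predict_proba(X_val)):
        validation_loss = log_loss(y_val, proba)
        # same rule as _fit_stages: continue only while the current loss
        # beats at least one of the last n_iter_no_change losses by tol
        if np.any(validation_loss + tol < loss_history):
            loss_history[i % len(loss_history)] = validation_loss
        else:
            stop_at = i + 1
            break

    print("would stop after", stop_at, "stages")

Fitting all 200 stages first and replaying them is wasteful compared with the built-in n_iter_no_change, but it makes the ring-buffer rule easy to experiment with.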
