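Gradient boosting trains a sequence of weak learners. Each learner is fitted to the pseudo-residuals of the model built so far (rather than to y itself), and adding it to the ensemble improves the model's performance.

Wikipedia describes gradient boosting as follows:

Gradient boosting is a machine learning technique for regression and classification problems. It produces a prediction model in the form of an ensemble of weak prediction models; when the weak models are decision trees, as is typical, the result is called gradient-boosted trees (GBT or GBDT). Like other boosting methods, it builds the model in a stage-wise fashion, but it generalizes them by allowing optimization of an arbitrary differentiable loss function.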

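Prerequisites

1. Linear regression
2. Gradient descent
3. Decision trees

After reading this article, you will know

1. The concept of gradient boosting
2. How gradient boosting is applied to regression
3. How to write gradient boosting regression from scratch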

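Algorithm

The figure below illustrates the gradient boosting algorithm well.

(Image from the YouTuber StatQuest with Josh Starmer)

In the figure above, the first part is a stump whose value is the mean of y. It is followed by the first tree, the second tree, the third tree, and so on. Each tree is trained on the pseudo-residuals of the model that precedes it, not on y. In regression with squared-error loss, the pseudo-residuals happen to equal the ordinary residuals, but in theory they are not the same thing.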

$\text{pseudo residual}=y-\text{previous prediction}$

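Procedure

Step 1: Compute the mean of y:

$\bar{y}=\frac{1}{n} \sum_{i=1}^{n}y_i$

$F_0(x)=\bar{y}$
Step 2: for m in 1 to M:

Step 2.1: Compute the pseudo-residuals:
$r_{im}=y_i-F_{m-1}(x_i)$
Step 2.2: Fit a regression tree $t_m(x)$ to the pseudo-residuals and create the terminal regions (the leaves of the tree) $R_{jm}$ for $j=1...J_m$
Step 2.3: For each terminal region $R_{jm}$, which contains $p_j$ samples, compute $\gamma_{jm}$:

$\gamma_{jm}=\frac{1}{p_j} \sum_{x_i \in R_{jm}} r_{im}$

(In practice, steps 2.2 and 2.3 can be merged, because fitting the regression tree already does the work of 2.3.)
Step 2.4: Update the model (with learning rate $\alpha$):
$F_m(x)=F_{m-1}(x)+\alpha\sum_{j=1}^{J_m}\gamma_{jm}\,\mathbf{1}(x \in R_{jm})$

Step 3: Output the model $F_M(x)$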
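The remark that the regression tree already performs step 2.3 can be checked numerically. The sketch below assumes scikit-learn's `DecisionTreeRegressor` with its default squared-error criterion and uses the small height/gender/weight data set from later in this article; the variable names are only illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data used later in this article: [height, gender] -> weight
X = np.array([[1.6, 1], [1.6, 0], [1.5, 0], [1.8, 1], [1.5, 1], [1.4, 0]])
y = np.array([88, 76, 56, 73, 77, 57], dtype=float)

r = y - y.mean()                                       # step 2.1 with F_0 = mean(y)
stump = DecisionTreeRegressor(max_depth=1).fit(X, r)   # step 2.2

leaf_ids = stump.apply(X)        # index of the terminal region each sample falls in
tree_values = stump.predict(X)   # value the fitted tree stores in that leaf

# Step 2.3 done by hand: mean residual of the samples in each leaf
manual_values = np.empty_like(r)
for leaf in np.unique(leaf_ids):
    in_leaf = leaf_ids == leaf
    manual_values[in_leaf] = r[in_leaf].mean()

print(np.allclose(tree_values, manual_values))
```

If the two arrays agree, the tree's own leaf values can be used directly as $\gamma_{jm}$, which is what justifies dropping the separate step.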

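Merging steps 2.2 and 2.3 gives

Simplified procedure

Step 1: Compute the mean of y:

$\bar{y}=\frac{1}{n} \sum_{i=1}^{n}y_i$

$F_0(x)=\bar{y}$
Step 2: for m in 1 to M:

Step 2.1: Compute the pseudo-residuals:
$r_{im}=y_i-F_{m-1}(x_i)$
Step 2.2: Fit a regression tree $t_m(x)$ to the pseudo-residuals
Step 2.3: Update the model (with learning rate $\alpha$):
$F_m(x)=F_{m-1}(x)+\alpha\,t_m(x)$

Step 3: Output the model $F_M(x)$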
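The simplified procedure can be sketched as two small functions. This is a minimal illustration, not the notebook code shown later: `fit_gbr` and `predict_gbr` are hypothetical names, the trees are scikit-learn stumps, and the data is the toy set from later in this article.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gbr(X, y, n_trees=5, learning_rate=0.2, max_depth=1):
    """Hypothetical helper: returns (f0, trees) for the simplified procedure."""
    f0 = float(np.mean(y))                           # step 1: F_0 is the mean of y
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        r = y - pred                                 # step 2.1: pseudo-residuals
        t = DecisionTreeRegressor(max_depth=max_depth).fit(X, r)   # step 2.2
        pred = pred + learning_rate * t.predict(X)   # step 2.3: update the model
        trees.append(t)
    return f0, trees

def predict_gbr(X, f0, trees, learning_rate=0.2):
    """Hypothetical helper: step 3, evaluate F_M(x) on any rows X."""
    pred = np.full(len(X), f0)
    for t in trees:
        pred = pred + learning_rate * t.predict(X)
    return pred

# Toy data used later in this article: [height, gender] -> weight
X = np.array([[1.6, 1], [1.6, 0], [1.5, 0], [1.8, 1], [1.5, 1], [1.4, 0]])
y = np.array([88, 76, 56, 73, 77, 57], dtype=float)

f0, trees = fit_gbr(X, y)
pred = predict_gbr(X, f0, trees)
print(pred)
```

The learning rate passed to `predict_gbr` has to match the one used in `fit_gbr`, because the trees store unscaled leaf values.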

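(Optional) Deriving gradient boosting regression from the gradient boosting algorithm

The simplified procedure above is all you need to write gradient boosting regression by hand. If you have the energy, follow along as we derive gradient boosting regression (GBR) from the general gradient boosting (GB) algorithm.

First, let's look at the steps of GB.

Algorithm steps

Input: training data $\{(x_i, y_i)\}_{i=1}^{n}$, a differentiable loss function $L(y, F(x))$, and the number of iterations M.

Algorithm:

Step 1: Initialize the model with a constant $F_0(x)$ that satisfies:

$F_0(x)=\underset{\gamma}{\operatorname{argmin}}\sum_{i=1}^{n}L(y_i, \gamma)$

Step 2: for m in 1 to M:

Step 2.1: Compute the pseudo-residuals:

$r_{im}=-[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}]_{F(x)=F_{m-1}(x)}$

Step 2.2: Fit a weak learner $h_m(x)$ to the pseudo-residuals and create the terminal regions $R_{jm}$ ($j=1...J_m$)
Step 2.3: For each terminal region (that is, each leaf), compute $\gamma_{jm}$:

$\gamma_{jm}=\underset{\gamma}{\operatorname{argmin}}\sum_{x_i \in R_{jm}}L(y_i, F_{m-1}(x_i)+\gamma)$

Step 2.4: Update the model (with learning rate $\alpha$):
$F_m(x)=F_{m-1}(x)+\alpha\sum_{j=1}^{J_m}\gamma_{jm}\,\mathbf{1}(x \in R_{jm})$

Step 3: Output the model $F_M(x)$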
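To make the role of the loss function concrete, the general steps can be sketched with the loss supplied as three callables. Everything here is an assumption for illustration: `gradient_boost` is a hypothetical function that only returns the training-set predictions, the weak learner is a scikit-learn stump, and the absolute-error loss is included only to show a case where the pseudo-residuals (signs) differ from the plain residuals.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, init, neg_gradient, leaf_value, M=5, alpha=0.2):
    """Hypothetical sketch of the generic GB loop on the training set.
    init(y) -> F_0, neg_gradient(y, F) -> pseudo-residuals,
    leaf_value(y_leaf, F_leaf) -> gamma for one terminal region."""
    F = np.full(len(y), init(y), dtype=float)               # step 1
    for _ in range(M):
        r = neg_gradient(y, F)                               # step 2.1
        h = DecisionTreeRegressor(max_depth=1).fit(X, r)     # step 2.2
        leaves = h.apply(X)                                  # terminal region of each sample
        update = np.zeros(len(y))
        for leaf in np.unique(leaves):                       # step 2.3
            idx = leaves == leaf
            update[idx] = leaf_value(y[idx], F[idx])
        F = F + alpha * update                               # step 2.4
    return F

# Squared error, L = 1/2 (y - F)^2: residuals and leaf means
squared = dict(init=np.mean,
               neg_gradient=lambda y, F: y - F,
               leaf_value=lambda y, F: np.mean(y - F))
# Absolute error, L = |y - F|: sign of the residuals and leaf medians
absolute = dict(init=np.median,
                neg_gradient=lambda y, F: np.sign(y - F),
                leaf_value=lambda y, F: np.median(y - F))

# Toy data used later in this article: [height, gender] -> weight
X = np.array([[1.6, 1], [1.6, 0], [1.5, 0], [1.8, 1], [1.5, 1], [1.4, 0]])
y = np.array([88, 76, 56, 73, 77, 57], dtype=float)

F_sq = gradient_boost(X, y, **squared)
F_abs = gradient_boost(X, y, **absolute)
print(F_sq)
print(F_abs)
```

With the squared-error callables the loop reduces to the simplified regression procedure; swapping in another loss changes only the three callables, not the loop.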

To derive GBR from GB, we only need to pick a loss function and plug it into steps 1, 2.1, and 2.3. Since this is a regression problem, we use the sum of squared errors (SSE), scaled by 1/2 to simplify the derivative:

$L(y, \gamma)=\frac{1}{2}\sum_{i=1}^{n}(y_i-\gamma)^2$

For step 1:

The loss is convex in $F_0$, so to find its minimum it is enough to find where its first derivative is zero:

$\frac{\partial L(y, F_0)}{\partial F_0}=\frac{\partial \frac{1}{2}\sum_{i=1}^{n}(y_i-F_0)^2}{\partial F_0} =-\sum_{i=1}^{n} (y_i-F_0)=0$

Rearranging gives:

$F_0=\frac{1}{n}\sum_{i=1}^{n}y_i$
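As a quick sanity check of this result, the loss can be evaluated on a grid of candidate constants. This is only a brute-force sketch with NumPy on the weights used later in this article; the grid step of 0.01 is an arbitrary choice.

```python
import numpy as np

y = np.array([88, 76, 56, 73, 77, 57], dtype=float)

def half_sse(c):
    """L(y, c) = 1/2 * sum_i (y_i - c)^2 for a constant prediction c."""
    return 0.5 * np.sum((y - c) ** 2)

# Candidate constants from min(y) to max(y) in steps of 0.01
candidates = np.linspace(y.min(), y.max(), 3201)
best = candidates[np.argmin([half_sse(c) for c in candidates])]

print(best, y.mean())
```

The grid minimizer should land within one grid step of the mean of y.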

For step 2.1:

$r_{im}=-[\frac{\partial L(y_i, F(x_i))}{\partial F(x_i)}]_{F(x)=F_{m-1}(x)}$

$=-[\frac{\partial \frac{1}{2}\sum_{i=1}^{n}(y_i-F_{m-1}(x_i))^2}{\partial F_{m-1}(x_i)}]_{F(x)=F_{m-1}(x)}$

(by the chain rule)

$=-\left(-2\cdot\frac{1}{2}(y_i-F_{m-1}(x_i))\right)$

$=y_i-F_{m-1}(x_i)$
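The same identity can be checked numerically with a finite-difference gradient. This sketch assumes the half-SSE loss defined above and takes $F_{m-1}$ to be the initial constant model; `eps` is an arbitrary small step.

```python
import numpy as np

y = np.array([88, 76, 56, 73, 77, 57], dtype=float)
F = np.full(len(y), y.mean())      # F_{m-1}(x_i); here the initial model F_0

def half_sse(F):
    """1/2 * sum_i (y_i - F_i)^2, viewed as a function of the predictions."""
    return 0.5 * np.sum((y - F) ** 2)

eps = 1e-6
numeric_grad = np.empty(len(y))
for i in range(len(y)):
    step = np.zeros(len(y))
    step[i] = eps
    # central difference approximates dL/dF(x_i)
    numeric_grad[i] = (half_sse(F + step) - half_sse(F - step)) / (2 * eps)

pseudo_residuals = -numeric_grad   # negative gradient
print(np.allclose(pseudo_residuals, y - F, atol=1e-4))
```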

Step 2.3 works the same way:

$\gamma_{jm}=\frac{1}{p_j}\sum_{x_i \in R_{jm}}r_{im}$
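For completeness, here is the intermediate step written out in the same notation, assuming the squared-error loss above and $p_j$ samples in $R_{jm}$. Substituting the loss into the general definition of $\gamma_{jm}$:

$\gamma_{jm}=\underset{\gamma}{\operatorname{argmin}}\sum_{x_i \in R_{jm}}\frac{1}{2}(y_i-F_{m-1}(x_i)-\gamma)^2=\underset{\gamma}{\operatorname{argmin}}\sum_{x_i \in R_{jm}}\frac{1}{2}(r_{im}-\gamma)^2$

Setting the derivative with respect to $\gamma$ to zero:

$\frac{\partial}{\partial \gamma}\sum_{x_i \in R_{jm}}\frac{1}{2}(r_{im}-\gamma)^2=-\sum_{x_i \in R_{jm}}(r_{im}-\gamma)=0$

$\sum_{x_i \in R_{jm}}r_{im}=p_j\gamma \Rightarrow \gamma_{jm}=\frac{1}{p_j}\sum_{x_i \in R_{jm}}r_{im}$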

Hand-written code

Import the Python libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import graphviz
from IPython.display import display  # built into notebooks; imported explicitly for scripts
from sklearn import tree
from sklearn.tree import DecisionTreeRegressor

Load the data

df=pd.DataFrame()
df['name']=['Alex','Brunei','Candy','David','Eric','Felicity']
df['height']=[1.6,1.6,1.5,1.8,1.5,1.4]
df['gender']=['male','female','female','male','male','female']
df['weight']=[88, 76, 56, 73, 77, 57]
display(df)

# features: height plus gender encoded as 1 (male) / 0 (female)
X=df[['height','gender']].copy()
X['gender']=(X['gender']=='male').astype(int)
y=df['weight']
display(X)

n=df.shape[0]  # number of samples

The data looks like this:

name	height	gender	weight
Alex	1.6	male	88
Brunei	1.6	female	76
Candy	1.5	female	56
David	1.8	male	73
Eric	1.5	male	77
Felicity	1.4	female	57

X looks like this:

height	gender
1.6	1
1.6	0
1.5	0
1.8	1
1.5	1
1.4	0

Step 1: the mean

# now let's get started
learning_rate=0.2
# row i holds the values after i iterations (row 0 is the initial model)
loss = [0] * 6
residuals = np.zeros([6,n])
predictoin = np.zeros([6,n])
# F_0 is the mean of y; r_0 is what the mean leaves unexplained
average_y=y.mean()
predictoin[0] = [average_y] * n
residuals[0] = y - predictoin[0]
df['$f_0$']=predictoin[0]
df['$r_0$']=residuals[0]
display(df)
loss[0] = np.sum(residuals[0] ** 2)  # sum of squared errors
trees = []

The mean and residuals are as follows:

	name	height	gender	weight	𝑓0	𝑟0
0	Alex	1.6	male	88	71.166667	16.833333
1	Brunei	1.6	female	76	71.166667	4.833333
2	Candy	1.5	female	56	71.166667	-15.166667
3	David	1.8	male	73	71.166667	1.833333
4	Eric	1.5	male	77	71.166667	5.833333
5	Felicity	1.4	female	57	71.166667	-14.166667

Here the mean is 71.2 and the residuals are 16.8, 4.8, and so on.

Step 2: the loop

We define a function that performs one iteration

def iterate(i):
    # fit a stump (depth-1 tree) to the current residuals
    t = DecisionTreeRegressor(max_depth=1)
    t.fit(X,residuals[i])
    trees.append(t)
    gamma = t.predict(X)
    # next prediction, residual and loss
    predictoin[i+1]=predictoin[i]+learning_rate * gamma
    residuals[i+1]=y-predictoin[i+1]
    loss[i+1] = np.sum(residuals[i+1] ** 2)

    # raw f-strings keep the backslash in \gamma literal
    df[rf'$\gamma_{i+1}$']=gamma
    df[f'$f_{i+1}$']=predictoin[i+1]
    df[f'$r_{i+1}$']=residuals[i+1]

    display(df[['name','height','gender','weight',f'$f_{i}$',f'$r_{i}$',rf'$\gamma_{i+1}$',f'$f_{i+1}$',f'$r_{i+1}$']])

    dot_data = tree.export_graphviz(t, out_file=None, filled=True, rounded=True,feature_names=list(X.columns))
    graph = graphviz.Source(dot_data)
    display(graph)

# call iterate(0), iterate(1), ..., iterate(4) in turn to produce the tables below

Iteration 0

	name	height	gender	weight	𝑓0	𝑟0	𝛾1	𝑓1	𝑟1
0	Alex	1.6	male	88	71.166667	16.833333	8.166667	72.800000	15.200000
1	Brunei	1.6	female	76	71.166667	4.833333	-8.166667	69.533333	6.466667
2	Candy	1.5	female	56	71.166667	-15.166667	-8.166667	69.533333	-13.533333
3	David	1.8	male	73	71.166667	1.833333	8.166667	72.800000	0.200000
4	Eric	1.5	male	77	71.166667	5.833333	8.166667	72.800000	4.200000
5	Felicity	1.4	female	57	71.166667	-14.166667	-8.166667	69.533333	-12.533333

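In iteration 0 we train a tree on the residuals $r_0$. The tree tells us that males are heavier than females: the male residuals average 8.167 above the overall mean and the female residuals 8.167 below it. So we should raise the prediction for males and lower it for females. We still want to move in small steps, which is what the learning rate is for; here it is 0.2. After scaling, the prediction for males goes up by 1.633 kg and the prediction for females goes down by 1.633 kg. At the end of iteration 0 the model predicts 72.8 kg for males and 69.5 kg for females.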
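The arithmetic in this paragraph can be reproduced without a tree at all, by averaging $r_0$ within each gender by hand. This is a small NumPy sketch; the variable names are illustrative only.

```python
import numpy as np

weight = np.array([88, 76, 56, 73, 77, 57], dtype=float)
is_male = np.array([True, False, False, True, True, False])
learning_rate = 0.2

f0 = weight.mean()          # initial prediction
r0 = weight - f0            # residuals of the initial prediction

gamma_male = r0[is_male].mean()       # leaf value for males
gamma_female = r0[~is_male].mean()    # leaf value for females

# scaled update applied to the initial prediction
f1_male = f0 + learning_rate * gamma_male
f1_female = f0 + learning_rate * gamma_female

print(round(gamma_male, 3), round(gamma_female, 3))
print(round(f1_male, 1), round(f1_female, 1))
```

The printed values should match the $\gamma_1$ and $f_1$ columns of the iteration 0 table.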

Iteration 1

	name	height	gender	weight	𝑓1	𝑟1	𝛾2	𝑓2	𝑟2
0	Alex	1.6	male	88	72.800000	15.200000	7.288889	74.257778	13.742222
1	Brunei	1.6	female	76	69.533333	6.466667	7.288889	70.991111	5.008889
2	Candy	1.5	female	56	69.533333	-13.533333	-7.288889	68.075556	-12.075556
3	David	1.8	male	73	72.800000	0.200000	7.288889	74.257778	-1.257778
4	Eric	1.5	male	77	72.800000	4.200000	-7.288889	71.342222	5.657778
5	Felicity	1.4	female	57	69.533333	-12.533333	-7.288889	68.075556	-11.075556

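In iteration 1 we train a decision tree on $r_1$. This new tree tells us that height matters too. The split is at 1.55 m: people taller than that get 1.4578 kg added to their prediction, and people shorter get 1.4578 kg subtracted. We use this rule to compute $f_2$.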
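The split described here can also be read directly from the fitted stump instead of from the graphviz picture. The sketch below re-runs iterations 0 and 1 on the same data and inspects the root node through scikit-learn's `tree_` attribute; the `feature_names` list is an assumption matching the column order of X.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same data as above: [height, gender] -> weight
X = np.array([[1.6, 1], [1.6, 0], [1.5, 0], [1.8, 1], [1.5, 1], [1.4, 0]])
y = np.array([88, 76, 56, 73, 77, 57], dtype=float)
learning_rate = 0.2

# iteration 0
f0 = np.full(len(y), y.mean())
t0 = DecisionTreeRegressor(max_depth=1).fit(X, y - f0)
f1 = f0 + learning_rate * t0.predict(X)
r1 = y - f1

# iteration 1
t1 = DecisionTreeRegressor(max_depth=1).fit(X, r1)
feature_names = ['height', 'gender']                 # column order of X
root_feature = feature_names[t1.tree_.feature[0]]    # feature tested at the root
root_threshold = t1.tree_.threshold[0]               # values <= threshold go to the left leaf
scaled_update = learning_rate * t1.predict(X)        # amount added to each prediction

print(root_feature, root_threshold)
print(scaled_update)
```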

Iteration 2

	name	height	gender	weight	𝑓2	𝑟2	𝛾3	𝑓3	𝑟3
0	Alex	1.6	male	88	74.257778	13.742222	6.047407	75.467259	12.532741
1	Brunei	1.6	female	76	70.991111	5.008889	-6.047407	69.781630	6.218370
2	Candy	1.5	female	56	68.075556	-12.075556	-6.047407	66.866074	-10.866074
3	David	1.8	male	73	74.257778	-1.257778	6.047407	75.467259	-2.467259
4	Eric	1.5	male	77	71.342222	5.657778	6.047407	72.551704	4.448296
5	Felicity	1.4	female	57	68.075556	-11.075556	-6.047407	66.866074	-9.866074

Iteration 3

	name	height	gender	weight	𝑓3	𝑟3	𝛾4	𝑓4	𝑟4
0	Alex	1.6	male	88	75.467259	12.532741	5.427951	76.552849	11.447151
1	Brunei	1.6	female	76	69.781630	6.218370	5.427951	70.867220	5.132780
2	Candy	1.5	female	56	66.866074	-10.866074	-5.427951	65.780484	-9.780484
3	David	1.8	male	73	75.467259	-2.467259	5.427951	76.552849	-3.552849
4	Eric	1.5	male	77	72.551704	4.448296	-5.427951	71.466114	5.533886
5	Felicity	1.4	female	57	66.866074	-9.866074	-5.427951	65.780484	-8.780484

Iteration 4

	name	height	gender	weight	𝑓4	𝑟4	𝛾5	𝑓5	𝑟5
0	Alex	1.6	male	88	76.552849	11.447151	4.476063	77.448062	10.551938
1	Brunei	1.6	female	76	70.867220	5.132780	-4.476063	69.972007	6.027993
2	Candy	1.5	female	56	65.780484	-9.780484	-4.476063	64.885271	-8.885271
3	David	1.8	male	73	76.552849	-3.552849	4.476063	77.448062	-4.448062
4	Eric	1.5	male	77	71.466114	5.533886	4.476063	72.361326	4.638674
5	Felicity	1.4	female	57	65.780484	-8.780484	-4.476063	64.885271	-7.885271

The loss decreases as training proceeds.
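One way to see this is to record the sum of squared residuals after every iteration. The sketch below is a compact, self-contained re-run of the five iterations rather than the notebook code itself, with the same data, learning rate, and depth-1 trees.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Same data as above: [height, gender] -> weight
X = np.array([[1.6, 1], [1.6, 0], [1.5, 0], [1.8, 1], [1.5, 1], [1.4, 0]])
y = np.array([88, 76, 56, 73, 77, 57], dtype=float)
learning_rate = 0.2

pred = np.full(len(y), y.mean())
loss = [np.sum((y - pred) ** 2)]          # loss of the initial model
for _ in range(5):
    r = y - pred
    t = DecisionTreeRegressor(max_depth=1).fit(X, r)
    pred = pred + learning_rate * t.predict(X)
    loss.append(np.sum((y - pred) ** 2))  # loss after this iteration

for m, value in enumerate(loss):
    print(m, round(value, 2))
```

Each update moves every leaf a fraction of the way toward the mean of its residuals, so with a learning rate between 0 and 1 the squared-error loss on the training data cannot increase.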

I hope this made sense.

The code is on my GitHub:

https://github.com/EricWebsmith/machine_learning_from_scrach/blob/master/Gradiant_Boosting_Regression.ipynb

References:

https://en.wikipedia.org/wiki/Gradient_boosting

https://www.youtube.com/watch?v=3CC4N4z3GJc&list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF&index=44

https://www.youtube.com/watch?v=2xudPOBz-vs&list=PLblh5JKOoLUICTaGLRoHQDuF_7q2GfuJF&index=45

I recommend StatQuest, the YouTube channel behind the videos above.

