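When you write your own Bayesian optimization algorithm, how do you tell whether it actually fits? My current approach is to run it on a tiny dataset and check whether it converges to the correct local minimum. Two keywords matter here:

Convergence. The algorithm has to converge. The essence of black-box optimization is to raise the sampling rate near promising samples. If the algorithm fails to converge, just like random search, something is wrong.
Correctness. The point it converges to has to be the right one. An algorithm that converges to the wrong point is worse than random search.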
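
One way to make these two checks concrete is sketched below. It is only an illustration under my own assumptions: the helper name convergence_report, the radius threshold and the synthetic sample history are all hypothetical, and the known optimum x_star has to come from a toy problem whose answer is known in advance.

```python
import numpy as np


def convergence_report(xs, losses, x_star, radius, n_warmup):
    """Compare how often samples land near the known optimum before and after warm-up."""
    xs = np.asarray(xs, dtype=float)
    losses = np.asarray(losses, dtype=float)
    # A sample counts as a "hit" when it lies within `radius` of the known optimum
    near = np.linalg.norm(xs - x_star, axis=1) <= radius
    return {
        "warmup_hit_rate": float(near[:n_warmup].mean()),
        "search_hit_rate": float(near[n_warmup:].mean()),
        "best_is_near": bool(near[np.argmin(losses)]),
    }


# Synthetic history standing in for a real run (hypothetical data):
# 20 uniform warm-up samples, then 80 samples concentrated around x_star.
rng = np.random.default_rng(0)
x_star = np.array([0.3, 0.3])
warmup = rng.uniform(0.0, 1.0, size=(20, 2))
search = rng.normal(x_star, 0.05, size=(80, 2))
xs = np.vstack([warmup, search])
losses = ((xs - x_star) ** 2).sum(axis=1)

report = convergence_report(xs, losses, x_star, radius=0.15, n_warmup=20)
print(report)
```

An optimizer that behaves like random search would show roughly equal hit rates in both phases, while one that converges to the wrong point would show a low search hit rate around x_star even though its samples are concentrated somewhere.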

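Self-developed SMAC

Surrogate model

At its core, SMAC uses a random forest as the surrogate model. You can simply take one from an existing package (provided you have read the open-source code often enough to know what to call):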

from skopt.learning.forest import RandomForestRegressor, ExtraTreesRegressor

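The two forests share an interface but randomize differently. In scikit-learn, which skopt's wrappers build on, RandomForestRegressor bootstraps the rows for each tree and can additionally subsample the columns at each split (max_features), whereas ExtraTreesRegressor by default trains every tree on all rows and gets its randomness from random split thresholds. In my experience RF suits problems with more features than samples, and ET is usually the better choice otherwise. The SMAC paper mentions that its random forest uses all samples but samples features at a rate of 83% (see "SMAC source code analysis -> building the surrogate model").

According to the skopt documentation, ET generally performs better than RF.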
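
To show what the surrogate has to provide, here is a minimal sketch that gets a mean and an uncertainty estimate out of a plain scikit-learn ExtraTreesRegressor. The helper predict_mean_std and the toy objective are my own assumptions. The spread between trees is used as the standard deviation, which is a simplification: skopt's return_std=True estimate is computed differently, so the numbers will not match it exactly.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor


def predict_mean_std(forest, X):
    """Mean and between-tree standard deviation of a fitted forest (hypothetical helper)."""
    per_tree = np.stack([tree.predict(X) for tree in forest.estimators_])
    return per_tree.mean(axis=0), per_tree.std(axis=0)


# Toy observations: 30 evaluated configurations of a 1-D quadratic loss
rng = np.random.default_rng(0)
X_obs = rng.uniform(-2.0, 2.0, size=(30, 1))
y_obs = (X_obs[:, 0] - 0.5) ** 2

surrogate = ExtraTreesRegressor(n_estimators=50, min_samples_leaf=3, random_state=0)
surrogate.fit(X_obs, y_obs)

X_new = np.linspace(-2.0, 2.0, 5).reshape(-1, 1)
mu, std = predict_mean_std(surrogate, X_new)
print(mu, std)
```

The (mu, std) pair is exactly what an acquisition function such as EI consumes.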

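Acquisition function

In my experience EI is clearly better than PI. EI computes an expectation, so in practice it puts more weight on exploitation than on exploration. PI, at least in my runs, leans more toward exploration and its samples are more scattered.

skopt implements EI, PI, LCB, EIPS and others. So far I have implemented EI and LogEI.

For LogEI code, see SMAC and RoBO.

In my experiments LogEI and EI differ very little. My impression is that EI with no loss_transform behaves like LogEI with the log_scaled loss_transform.

SMAC's LogEI appears to be designed to go together with the log_scaled loss_transform.

RoBO's LogEI is implemented very differently from SMAC's.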
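
For reference, the SMAC-style closed form (the one used in the LogEI class further down) can be read as ordinary EI measured on the original loss scale. This assumes the surrogate predicts the log-loss as $Y \sim \mathcal{N}(\mu, \sigma^2)$ and that $f_{\min}$ is the best observed log-loss minus $\xi$. It is my own reading of the formula, not something taken from the SMAC documentation:

$$
\mathrm{LogEI}(x) = \mathbb{E}\left[\max\left(0,\; e^{f_{\min}} - e^{Y}\right)\right] = e^{f_{\min}}\,\Phi(v) - e^{\mu + \sigma^{2}/2}\,\Phi(v - \sigma), \qquad v = \frac{f_{\min} - \mu}{\sigma}
$$

Here $\Phi$ is the standard normal CDF. Read this way, it is at least roughly consistent with the observation that plain EI on the raw loss and LogEI on the log-scaled loss behave alike.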

In practice, EI plus the log_scaled loss_transform is enough. Setting the xi ( $\xi$ ) parameter to 0.01 or to 0 seems to make no real difference.
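
The small effect of xi can be checked on a handful of candidates. The sketch below is a function-style version of the EI formula for minimization; the candidate means, standard deviations and the incumbent y_opt are made-up numbers chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, std, y_opt, xi=0.01):
    """EI for minimization; candidates with zero predicted std get EI = 0."""
    mu = np.asarray(mu, dtype=float)
    std = np.asarray(std, dtype=float)
    values = np.zeros_like(mu)
    mask = std > 0
    improve = y_opt - xi - mu[mask]
    z = improve / std[mask]
    values[mask] = improve * norm.cdf(z) + std[mask] * norm.pdf(z)
    return values


# Hypothetical surrogate predictions for four candidates
mu = np.array([0.10, 0.12, 0.30, 0.12])
std = np.array([0.05, 0.10, 0.20, 0.00])
y_opt = 0.11  # best loss observed so far (made up)

ei_xi0 = expected_improvement(mu, std, y_opt, xi=0.0)
ei_xi1 = expected_improvement(mu, std, y_opt, xi=0.01)
print("xi=0.00:", ei_xi0)
print("xi=0.01:", ei_xi1)
```

A larger xi can only lower the EI values, since it raises the bar a candidate has to clear. Whether it changes which candidate wins depends on how close the candidates are to each other.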

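import numpy as np
from scipy.stats import norm


class EI():
    def __init__(self, xi=0.01):
        # xi is the margin by which a candidate must beat the incumbent y_opt.
        # SMAC defaults to 0.0 (smac/optimizer/acquisition.py:341, par: float=0.0);
        # scikit-optimize defaults to 0.01, which this blog post also recommends:
        # http://krasserm.github.io/2018/03/21/bayesian-optimization/
        self.xi = xi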

    def __call__(self, model, X, y_opt):
        mu, std = model.predict(X, return_std=True)
        values = np.zeros_like(mu)
        mask = std > 0
        improve = y_opt - self.xi - mu[mask]
        scaled = improve / std[mask]
        cdf = norm.cdf(scaled)
        pdf = norm.pdf(scaled)
        exploit = improve * cdf
        explore = std[mask] * pdf
        values[mask] = exploit + explore
        # You can find the derivation of the EI formula in this blog
        # http://ash-aldujaili.github.io/blog/2018/02/01/ei/
        return values

class LogEI():
    def __init__(self, xi=0.01):
        self.xi = xi

    def __call__(self, model, X, y_opt):
        mu, std = model.predict(X, return_std=True)
        var = std ** 2
        values = np.zeros_like(mu)
        mask = std > 0
        f_min = y_opt - self.xi
        improve = f_min - mu[mask]
        # in SMAC this quantity is called v (smac/optimizer/acquisition.py:388);
        # mu and y_opt are expected to be in log space here
        scaled = improve / std[mask]
        values[mask] = (np.exp(f_min) * norm.cdf(scaled)) - \
                       (np.exp(0.5 * var[mask] + mu[mask]) * norm.cdf(scaled - std[mask]))
        return values
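
A quick numerical sanity check of the LogEI closed form, assuming it is meant to equal the expected improvement on the exponentiated (original) loss scale when the surrogate's prediction in log space is Gaussian. The function names and the test point (mu=0.0, std=0.5, f_min=0.2) are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def log_ei_closed_form(mu, std, f_min):
    """Same expression as in the LogEI class, for a single candidate."""
    v = (f_min - mu) / std
    return np.exp(f_min) * norm.cdf(v) - np.exp(0.5 * std ** 2 + mu) * norm.cdf(v - std)


def log_ei_monte_carlo(mu, std, f_min, n_samples=200_000, seed=0):
    """Estimate E[max(0, exp(f_min) - exp(Y))] with Y ~ N(mu, std^2) by sampling."""
    rng = np.random.default_rng(seed)
    y = rng.normal(mu, std, size=n_samples)
    return np.maximum(0.0, np.exp(f_min) - np.exp(y)).mean()


closed = log_ei_closed_form(0.0, 0.5, 0.2)
sampled = log_ei_monte_carlo(0.0, 0.5, 0.2)
print(closed, sampled)
```

If the two numbers agree to within sampling noise, the closed form is doing what this reading says it does.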

Pitfalls I ran into

An improper loss scale can trap the algorithm in a wrong local optimum

experiment_id=39

Experiment code:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split


X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
pipe = SystemClassifier(
    DAG_workflow={
        "num->target": [
            "liblinear_svc",
            "libsvm_svc",
            "logistic_regression",
            "random_forest",
            # "catboost",
        ]
    },
    config_generator="ET",
    config_generator_params={
        "acq_func": "EI",
        "xi": 0,
        "loss_transformer":None,
        "min_points_in_model": 20
    },
    warm_start=False,
    random_state=0,
    min_n_samples_for_SH=50,
    concurrent_type="thread",
    # max_budget=1,
    n_jobs_in_algorithm=3,
    n_workers=1,
    SH_only=True,
    min_budget=1/16,
    max_budget=1/16,
    n_iterations=100,
    # min_budget=1 / 4,
    debug_evaluator=True,
)
pipe.fit(
    X_train, y_train,
    # is_not_realy_run=True,
    fit_ensemble_params=False)
# score = accuracy_score(y_test, y_pred)
score = pipe.score(X_test, y_test)
print(score)

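Results:
In the plot, the red part is the warm-up phase (the 20 random-search evaluations at the start) and the blue part is the Bayesian search. The best sample found during warm-up is (random_forest, 0.977), yet the algorithm gets stuck in a local optimum around libsvm_svc, which is baffling.

After investigating, I believe the cause is that the loss (that is, the label the surrogate model is fitted to) was not put through the log_scaled transform.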


class LogScaledLossTransformer(LossTransformer):
    def fit_transform(self, y, *args):
        y = super(LogScaledLossTransformer, self).fit_transform(y)
        # Subtract the difference between the percentile and the minimum
        y_min = self.y_min - (self.perc - self.y_min)
        y_min -= 1e-10
        # linear scaling
        if y_min == self.y_max:
            # prevent dividing by zero
            y_min *= 1 - (1e-10)
        y = (y - y_min) / (self.y_max - y_min)
        y = np.log(y)
        f_max = y[np.isfinite(y)].max()
        f_min = y[np.isfinite(y)].min()
        y[np.isnan(y)] = f_max
        y[y == -np.inf] = f_min
        y[y == np.inf] = f_max
        return y
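
To see what the transform does to the surrogate's targets, here is a standalone sketch of the same scaling steps without the LossTransformer base class. The function name log_scale_losses, the scale_perc=5 default and the example losses are my own assumptions (0.023 corresponds to an accuracy of 0.977).

```python
import numpy as np


def log_scale_losses(y, scale_perc=5):
    """Min-max scale the losses with a lowered minimum, then take the log."""
    y = np.asarray(y, dtype=float)
    y_min, y_max = y.min(), y.max()
    perc = np.percentile(y, scale_perc)
    # Push the minimum down a little so the best loss does not map to log(0)
    y_min = y_min - (perc - y_min) - 1e-10
    if y_min == y_max:
        y_min *= 1 - 1e-10  # avoid dividing by zero
    return np.log((y - y_min) / (y_max - y_min))


losses = np.array([0.023, 0.030, 0.050, 0.500, 0.900])
scaled = log_scale_losses(losses)
print(np.round(np.diff(losses), 3))  # raw gaps between neighbouring losses
print(np.round(np.diff(scaled), 3))  # gaps after the log-scaled transform
```

On the raw scale the gap between the two best losses is tiny compared with the gap between the two worst ones. After the transform the ordering is unchanged but the good region is stretched out, which gives the surrogate more signal where it matters.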

TPE

My TPE implementation rests on one assumption: the random variables are mutually independent.
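
Under that independence assumption, the density ratio factorizes into one 1-D density pair per variable. The sketch below illustrates the idea with scipy's gaussian_kde on continuous variables. It is not my actual implementation: the function name tpe_scores, the gamma=0.25 split and the toy data are assumptions made for the example.

```python
import numpy as np
from scipy.stats import gaussian_kde


def tpe_scores(X, losses, candidates, gamma=0.25):
    """Sum over dimensions of log l(x_i) - log g(x_i), treating dimensions as independent."""
    X = np.asarray(X, dtype=float)
    losses = np.asarray(losses, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    n_good = max(2, int(np.ceil(gamma * len(losses))))
    order = np.argsort(losses)
    good, bad = X[order[:n_good]], X[order[n_good:]]
    scores = np.zeros(len(candidates))
    for i in range(X.shape[1]):
        good_kde = gaussian_kde(good[:, i])  # l(x_i): density of the good samples
        bad_kde = gaussian_kde(bad[:, i])    # g(x_i): density of the remaining samples
        scores += np.log(good_kde(candidates[:, i]) + 1e-32)
        scores -= np.log(bad_kde(candidates[:, i]) + 1e-32)
    return scores


# Toy problem: loss is the squared distance to (0.3, 0.3)
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(200, 2))
losses = ((X - 0.3) ** 2).sum(axis=1)
candidates = np.array([[0.3, 0.3], [0.9, 0.9]])
scores = tpe_scores(X, losses, candidates)
print(scores)
```

The candidate with the higher score is the one TPE would prefer to evaluate next.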

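Pitfalls

experiment_id=39

Without imputing the deactivated random variables, the algorithm partially converges early on and later converges to a wrong local minimum

Experiment code: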

    config_generator="TPE",
    config_generator_params={
        "fill_deactivated_value":False,
        "min_points_in_model": 40
    },

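Early stage
In the early stage, choices other than random_forest also get explored, which is normal.
In the late stage, the search converges to the wrong choice, libsvm_svc (why does everything keep picking this algorithm? Poor logistic_regression never gets a turn).

After adding this snippet to the predict function: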

 if N_deactivated > 0 and self.fill_deactivated_value:
     good_pdf[~mask, i] = np.random.choice(good_pdf_activated)
     bad_pdf[~mask, i] = np.random.choice(bad_pdf_activated)
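
The idea behind the snippet can be shown in isolation. The sketch below fills the density entries of rows where a conditional hyperparameter is deactivated with values drawn from the rows where it is active, so that no row is left without a value. The function name fill_deactivated and the example column are hypothetical, and unlike the snippet above it draws one value per deactivated row instead of a single shared value.

```python
import numpy as np


def fill_deactivated(pdf_column, active_mask, rng):
    """Replace entries of deactivated rows with random draws from the activated rows."""
    filled = np.array(pdf_column, dtype=float)
    active_values = filled[active_mask]
    n_deactivated = int((~active_mask).sum())
    if n_deactivated > 0 and active_values.size > 0:
        filled[~active_mask] = rng.choice(active_values, size=n_deactivated)
    return filled


rng = np.random.default_rng(0)
# NaN marks rows where this hyperparameter is deactivated (made-up values)
pdf_column = np.array([0.8, np.nan, 0.3, np.nan, 0.5])
active_mask = ~np.isnan(pdf_column)
filled = fill_deactivated(pdf_column, active_mask, rng)
print(filled)
```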

experiment_id=35

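Early stage
Late stage
As the algorithm keeps iterating and TPE accumulates observations, it gradually shifts from exploration to exploitation and ends up doing local search around RF.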
