Using Schedule and warmup_steps in PyTorch

Original

2020-02-28 03:35

1. lr_scheduler

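    lr_scheduler = WarmupLinearSchedule(optimizer, warmup_steps=args.warmup_steps, t_total=num_train_optimization_steps)

Here args.warmup_steps is the number of steps over which the learning rate ramps up linearly from 0 to its initial value, so it acts like a patience setting for the schedule.
num_train_optimization_steps is the total number of parameter updates the model performs.
Typically: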

    num_train_optimization_steps = int(total_train_examples / args.train_batch_size / args.gradient_accumulation_steps)

A schedule adjusts the learning rate. Taking the linear schedule as an example, the step argument in the code below is the scheduler's last_epoch.

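    def lr_lambda(self, step):
        # Linear schedule: returns a multiplier x to the LambdaLR class, which sets lr = initial_lr * x
        if step < self.warmup_steps:  # warmup: ramp the learning rate up from 0
            return float(step) / float(max(1, self.warmup_steps))
        # after warmup: decay the learning rate linearly, reaching 0 at t_total
        return max(0.0, float(self.t_total - step) / float(max(1.0, self.t_total - self.warmup_steps)))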

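In an actual run, the scheduler first initializes lr to 0, because lr_lambda(0) = 0 when warmup_steps > 0. On the first parameter update step=1, and lr rises from 0 to initial_lr * 1/warmup_steps; it reaches initial_lr only once step equals warmup_steps. On the second update step=2, the code above produces some real number alpha, and the new lr = initial_lr * alpha. On the third update the new lr is again derived from initial_lr, not from the previous lr, so the new lr = initial_lr * alpha with the alpha for that step. warmup_steps can therefore be seen as the patience setting for the lr adjustment: it controls how many updates lr takes to climb to initial_lr.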
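To make the step-by-step behaviour concrete, here is a minimal pure-Python sketch that traces the learning rate without PyTorch. The function name warmup_linear_multiplier and the values of initial_lr, warmup_steps and t_total are made up for illustration; only the formula is taken from lr_lambda above.

```python
def warmup_linear_multiplier(step, warmup_steps, t_total):
    """Multiplier applied to the initial learning rate at a given step."""
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up towards 1.
        return float(step) / float(max(1, warmup_steps))
    # After warmup: fall linearly from 1 down to 0 at t_total.
    return max(0.0, float(t_total - step) / float(max(1.0, t_total - warmup_steps)))


initial_lr = 0.1   # hypothetical value
warmup_steps = 2   # hypothetical value
t_total = 6        # hypothetical value

# lr at every step is initial_lr * multiplier, never previous_lr * multiplier.
lrs = [initial_lr * warmup_linear_multiplier(s, warmup_steps, t_total)
       for s in range(t_total + 1)]

for step, lr in enumerate(lrs):
    print(f"step={step} lr={lr:.4f}")
```

With these values the learning rate starts at 0, reaches initial_lr at step 2 (the end of warmup), and falls back to 0 at step 6.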
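
2. gradient_accumulation_steps

gradient_accumulation_steps works around insufficient local GPU memory by accumulating gradients.
Suppose the original batch_size=6, the total number of examples is 24, and gradient_accumulation_steps=2.
Then the number of parameter updates = 24/6 = 4.
Now reduce batch_size to 6/2 = 3; the number of parameter updates stays the same: 24/3/2 = 4.
During backpropagation, loss.backward() computes gradients on every batch as usual, but the parameters are updated only once every gradient_accumulation_steps batches.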
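The arithmetic above can be checked with a small pure-Python sketch. It is not PyTorch code: the toy model y = w * x, the hand-written gradient, the helper names mean_grad and train, and the learning rate are all assumptions made for illustration. The comments mark where loss.backward(), optimizer.step() and optimizer.zero_grad() would sit in a real training loop.

```python
def mean_grad(w, batch):
    """Gradient of the mean squared error over one batch with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train(data, batch_size, accumulation_steps, lr=0.001, w=0.0):
    """Run one pass over data; return the final w and the number of updates."""
    updates = 0
    accumulated = 0.0
    for i in range(0, len(data), batch_size):
        micro_batch = data[i:i + batch_size]
        # Plays the role of (loss / accumulation_steps).backward():
        # gradients add up across micro-batches until the next update.
        accumulated += mean_grad(w, micro_batch) / accumulation_steps
        micro_batches_seen = i // batch_size + 1
        if micro_batches_seen % accumulation_steps == 0:
            w -= lr * accumulated  # plays the role of optimizer.step()
            accumulated = 0.0      # plays the role of optimizer.zero_grad()
            updates += 1
    return w, updates


data = [(float(x), 3.0 * x) for x in range(1, 25)]  # 24 made-up examples

w_full, updates_full = train(data, batch_size=6, accumulation_steps=1)
w_accum, updates_accum = train(data, batch_size=3, accumulation_steps=2)
print(updates_full, updates_accum)
```

Both runs perform 4 parameter updates. Because each micro-batch gradient is divided by accumulation_steps, the accumulated gradient matches the gradient of the original batch of 6, so the two runs end at the same w up to floating-point rounding.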
