Model training tricks: warm up

Original

CV-deeplearning

2020-05-10 15:39

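1. Learning rate scheduling strategies in PyTorch

(1) Fixed-interval decay: StepLR

(2) Milestone-based decay: MultiStepLR

(3) Exponential decay: ExponentialLR

(4) Cosine annealing: CosineAnnealingLR

(5) Adaptive decay: ReduceLROnPlateau

(6) Custom schedule: LambdaLR

For a detailed explanation of each scheduler's parameters, see the blog post "PyTorch learning rate parameters explained".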
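As a quick orientation, the six schedulers listed above can be constructed as in the sketch below. The placeholder model, the helper names make_optimizer and build_scheduler, and every hyperparameter value are assumptions chosen for illustration, not settings from this post.

```python
from torch import nn, optim
from torch.optim import lr_scheduler


def make_optimizer():
    # A tiny placeholder model; only its parameters matter here.
    model = nn.Linear(4, 2)
    return optim.SGD(model.parameters(), lr=0.1, momentum=0.9)


def build_scheduler(name, optimizer):
    """Return one of the six built-in schedulers (illustrative hyperparameters)."""
    if name == "StepLR":
        return lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)
    if name == "MultiStepLR":
        return lr_scheduler.MultiStepLR(optimizer, milestones=[30, 80], gamma=0.1)
    if name == "ExponentialLR":
        return lr_scheduler.ExponentialLR(optimizer, gamma=0.95)
    if name == "CosineAnnealingLR":
        return lr_scheduler.CosineAnnealingLR(optimizer, T_max=100, eta_min=0.0)
    if name == "ReduceLROnPlateau":
        return lr_scheduler.ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=10)
    if name == "LambdaLR":
        return lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)
    raise ValueError("unknown scheduler: %s" % name)


names = ["StepLR", "MultiStepLR", "ExponentialLR",
         "CosineAnnealingLR", "ReduceLROnPlateau", "LambdaLR"]
# Each scheduler gets its own optimizer so they do not interfere with each other.
schedulers = {name: build_scheduler(name, make_optimizer()) for name in names}

# Step one of them to see the effect: StepLR multiplies the learning rate
# by gamma once every step_size epochs.
optimizer = make_optimizer()
scheduler = build_scheduler("StepLR", optimizer)
for epoch in range(30):
    optimizer.step()   # stands in for one epoch of training
    scheduler.step()
lr_after_30 = optimizer.param_groups[0]["lr"]
print(lr_after_30)
```

With step_size=30 and gamma=0.1, the learning rate drops from 0.1 to roughly 0.01 after the 30th call to scheduler.step(). ReduceLROnPlateau is the odd one out: its step() expects a monitored metric such as the validation loss.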

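2. Learning rate schedules in papers and competitions

In top-conference papers and well-known competitions, however, authors usually do not apply the schedulers above on their own. They first warm up the model: the learning rate starts very small and rises gradually to the configured value. In practice this tends to give better final convergence.

Below, warm up is combined with CosineAnnealingLR to adjust the learning rate. The learning rate over the course of training follows the red curve in the figure: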
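As a rough written form of that curve, the two phases can be expressed as follows. The symbols are assumptions introduced here for illustration: m is the batch index within the warm up phase, M is the total number of warm up batches (total_iters in the code of this post), e is the number of epochs since warm up ended, T_max is the length of the cosine phase in epochs, and eta_min is the floor of the cosine schedule (0 by default in CosineAnnealingLR).

```latex
\begin{aligned}
\eta_{\text{warm}}(m) &= \eta_{\text{base}} \cdot \frac{m}{M},
  && 0 \le m \le M \\
\eta_{\text{cos}}(e) &= \eta_{\min} + \tfrac{1}{2}\,\bigl(\eta_{\text{base}} - \eta_{\min}\bigr)
  \left(1 + \cos\frac{\pi e}{T_{\max}}\right),
  && 0 \le e \le T_{\max}
\end{aligned}
```

For example, with a base learning rate of 0.001 and eta_min = 0, the rate is 0.0005 halfway through warm up (m = M/2), reaches 0.001 at m = M, falls back to 0.0005 halfway through the cosine phase (e = T_max/2), and reaches 0 at e = T_max.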

3. Code implementation

First, write a warm up class that subclasses _LRScheduler and overrides the get_lr method.

import torch
from torch.optim.lr_scheduler import _LRScheduler


class WarmUpLR(_LRScheduler):
    """Learning rate scheduler for the warm up phase.

    Args:
        optimizer: wrapped optimizer (e.g. SGD)
        total_iters: total number of iterations (batches) in the warm up phase
    """
    def __init__(self, optimizer, total_iters, last_epoch=-1):
        self.total_iters = total_iters
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        """For the m-th batch of the warm up phase, set the learning
        rate to base_lr * m / total_iters.
        """
        # last_epoch counts calls to step(), i.e. batches here; 1e-8 guards against division by zero
        return [base_lr * self.last_epoch / (self.total_iters + 1e-8) for base_lr in self.base_lrs]
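To see what the warm up class does in isolation, here is a minimal sketch that steps it a few times and records the learning rate. The class is repeated so the snippet runs on its own. The placeholder model, the base learning rate of 0.1, and total_iters=4 are illustrative assumptions.

```python
from torch import nn, optim
from torch.optim.lr_scheduler import _LRScheduler


class WarmUpLR(_LRScheduler):
    """Same linear warm up rule as the class above, repeated for a standalone run."""

    def __init__(self, optimizer, total_iters, last_epoch=-1):
        self.total_iters = total_iters
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        return [base_lr * self.last_epoch / (self.total_iters + 1e-8)
                for base_lr in self.base_lrs]


model = nn.Linear(4, 2)                       # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.1)
warmup = WarmUpLR(optimizer, total_iters=4)   # warm up over 4 batches

# Constructing the scheduler already sets the learning rate to the first warm up value.
lrs = [optimizer.param_groups[0]["lr"]]
for _ in range(4):
    optimizer.step()   # would follow loss.backward() in real training
    warmup.step()
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs)
```

The recorded values rise linearly from 0 to (almost exactly) the base learning rate of 0.1. The tiny shortfall comes from the 1e-8 term in the denominator.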

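Use it in the training script:

    import torch.nn as nn
    import torch.optim as optim
    from torch.optim.lr_scheduler import CosineAnnealingLR

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9, weight_decay=5e-4)

    max_epoch = 100  # assumed to match the 100 used for the cosine schedule
    warmup_epoch = 5
    # Cosine annealing covers the epochs that remain after warm up.
    scheduler = CosineAnnealingLR(optimizer, max_epoch - warmup_epoch)

    # Warm up is stepped once per batch, so count batches, not samples.
    iter_per_epoch = len(trainloader)
    # Created last so that training starts from the warm up learning rate.
    warmup_scheduler = WarmUpLR(optimizer, iter_per_epoch * warmup_epoch)

    for epoch in range(1, max_epoch + 1):
        if epoch > warmup_epoch:
            scheduler.step()
            # get_last_lr() returns the rate set by the most recent step()
            learn_rate = scheduler.get_last_lr()[0]
            print("Learn_rate:%s" % learn_rate)
        test(epoch, net, valloader, criterion)
        train(epoch, net, trainloader, optimizer, criterion, warmup_scheduler)

Changes inside the train function:

    for (inputs, targets) in tqdm(trainloader):
        if epoch <= 5:  # 5 is warmup_epoch from the training script
            warmup_scheduler.step()
            warm_lr = warmup_scheduler.get_last_lr()
            print("warm_lr:%s" % warm_lr)
        inputs, targets = inputs.to(device), targets.to(device)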
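The two-scheduler setup above can also be folded into a single scheduler with LambdaLR, the custom option from section 1. This is an alternative sketch, not the method used in this post: it steps once per batch for both phases, and the helper name warmup_cosine_factor, the placeholder model, and the step counts are assumptions chosen for illustration.

```python
import math

from torch import nn, optim
from torch.optim.lr_scheduler import LambdaLR


def warmup_cosine_factor(step, warmup_steps, total_steps):
    """Multiplier applied to the base learning rate at a given step."""
    if step < warmup_steps:
        # Linear ramp from 0 to 1 during warm up.
        return step / max(1, warmup_steps)
    # Fraction of the cosine phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


model = nn.Linear(4, 2)  # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9, weight_decay=5e-4)

warmup_steps, total_steps = 10, 100  # illustrative values
scheduler = LambdaLR(
    optimizer,
    lr_lambda=lambda s: warmup_cosine_factor(s, warmup_steps, total_steps),
)

history = []
for step in range(total_steps):
    history.append(optimizer.param_groups[0]["lr"])  # rate used for this step
    optimizer.step()    # would follow loss.backward() in real training
    scheduler.step()

print(history[0], history[warmup_steps], history[-1])
```

A single scheduler avoids having to coordinate which of two schedulers is allowed to step in a given epoch. The trade-off is that the cosine phase is then defined in batches rather than epochs.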

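4. Summary

Warm up is commonly used in papers and competitions, especially for tasks where the model is hard to converge. In papers, MultiStepLR and CosineAnnealingLR are the more common schedules; in well-known competitions, ReduceLROnPlateau is used more often. How do I use them in engineering projects? I usually try warm up combined with each of these three schedules and keep whichever model reaches the highest accuracy. In many cases the three models perform about the same, with accuracy differences within ±0.5%.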
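Of the three schedules mentioned here, ReduceLROnPlateau is driven differently from the other two: its step() takes a monitored metric rather than being called unconditionally. The sketch below shows that behavior on its own. The validation losses are made-up numbers, and the placeholder model, factor, and patience values are illustrative assumptions.

```python
from torch import nn, optim
from torch.optim.lr_scheduler import ReduceLROnPlateau

model = nn.Linear(4, 2)  # placeholder model
optimizer = optim.SGD(model.parameters(), lr=0.1)
# Cut the learning rate by 10x once the metric has failed to improve
# for more than `patience` consecutive epochs.
plateau = ReduceLROnPlateau(optimizer, mode="min", factor=0.1, patience=2)

val_losses = [1.0, 0.8, 0.8, 0.8, 0.8, 0.8]  # made-up validation losses
lrs = []
for val_loss in val_losses:
    optimizer.step()        # stands in for one epoch of training
    plateau.step(val_loss)  # the scheduler decides based on the metric
    lrs.append(optimizer.param_groups[0]["lr"])

print(lrs)
```

In this run the loss stops improving after the second epoch, and the learning rate drops from 0.1 to about 0.01 once three epochs in a row show no improvement.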
