PyTorch Weight Initialization and Parameter Grouping

1. Model parameter initialization

# ————————————————— Initialization via model.apply(weights_init)
import math
import torch

def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        # He/Kaiming-style init based on the fan-out of the conv kernel
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
        if m.bias is not None:
            m.bias.data.zero_()
    elif classname.find('BatchNorm') != -1:
        m.weight.data.fill_(1)
        m.bias.data.zero_()
    elif classname.find('Linear') != -1:
        n = m.weight.size(1)                              # fan-in (not used below)
        m.weight.data.normal_(0, 0.01)
        m.bias.data = torch.ones(m.bias.data.size())      # bias initialized to 1
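
As a quick usage sketch (the toy model below is illustrative, not from the original text), the function is applied recursively to every submodule via model.apply():

import torch.nn as nn

model = nn.Sequential(                      # hypothetical toy network
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.ReLU(inplace=True),
)
model.apply(weights_init)                   # weights_init is called on every submodule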
        
# ————————————————— Initialization placed directly in the __init__ constructor
for m in self.modules():
    if isinstance(m, nn.Conv2d):
        n = m.kernel_size[0] * m.kernel_size[1] * m.out_channels
        m.weight.data.normal_(0, math.sqrt(2. / n))
        if m.bias is not None:
            m.bias.data.zero_()
    elif isinstance(m, nn.BatchNorm2d):
        m.weight.data.fill_(1)
        m.bias.data.zero_()
    elif isinstance(m, nn.BatchNorm1d):
        m.weight.data.fill_(1)
        m.bias.data.zero_()
    elif isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight.data)
        if m.bias is not None:
            m.bias.data.zero_()
        
# ————————————————— Initialization of self-defined Parameters via nn.init
self.weight = Parameter(torch.Tensor(out_features, in_features))
self.bias = Parameter(torch.FloatTensor(out_features))
nn.init.xavier_uniform_(self.weight)
nn.init.zeros_(self.bias)
# Other options:
# nn.init.constant_(tensor, val)
# nn.init.kaiming_uniform_()
# self.weight.data.normal_(std=0.001)
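
For context, a minimal sketch of a custom layer whose __init__ performs this kind of initialization (the class name and forward() below are illustrative, not from the original):

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Parameter

class MyLinear(nn.Module):
    # Hypothetical Linear-like layer, initialized directly in __init__
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = Parameter(torch.Tensor(out_features, in_features))
        self.bias = Parameter(torch.FloatTensor(out_features))
        nn.init.xavier_uniform_(self.weight)   # Xavier/Glorot uniform init
        nn.init.zeros_(self.bias)              # zero the bias

    def forward(self, x):
        return F.linear(x, self.weight, self.bias)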

2. Grouping model parameters for weight_decay

def separate_bn_prelu_params(model, ignored_params=None):
    # Collect BN and PReLU parameters so they can be exempted from weight decay
    if ignored_params is None:                 # avoid the mutable-default-argument pitfall
        ignored_params = []
    bn_prelu_params = []
    for m in model.modules():
        if isinstance(m, (nn.BatchNorm2d, nn.BatchNorm1d, nn.PReLU)):
            ignored_params += list(map(id, m.parameters()))
            bn_prelu_params += m.parameters()
    base_params = list(filter(lambda p: id(p) not in ignored_params, model.parameters()))

    return base_params, bn_prelu_params, ignored_params
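
In the optimizer below, fc_head_param is assumed to be the separately collected parameters of the fully connected head (given a 10x larger weight decay), and WEIGHT_DECAY, LR, MOMENTUM are hyperparameters defined elsewhere in the training script.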

OPTIMIZER = optim.SGD([
        {'params': base_params, 'weight_decay': WEIGHT_DECAY},          
        {'params': fc_head_param, 'weight_decay': WEIGHT_DECAY * 10},
        {'params': bn_prelu_params, 'weight_decay': 0.0}
        ], lr=LR, momentum=MOMENTUM )  # , nesterov=True

Note 1: PReLU(x) = max(0, x) + a * min(0, x), where a is a learnable parameter. When called without arguments, nn.PReLU() uses a single parameter a across all input channels. If called as nn.PReLU(nChannels), a separate a is used for each input channel.
Note 2: weight decay should not be used when learning a for good performance.
Note 3: The default number of a to learn is 1, the default initial value of a is 0.25.
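
A quick sketch showing the two constructor forms and the shape of the learnable a (stored as the module's weight):

import torch.nn as nn

shared = nn.PReLU()           # a single a shared across all channels
per_ch = nn.PReLU(64)         # one a per input channel
print(shared.weight.shape)    # torch.Size([1]), initialized to 0.25
print(per_ch.weight.shape)    # torch.Size([64])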

3. Parameter grouping for weight_decay: other approaches

The approach in Section 2 covers ordinary parameter-grouping needs; this section covers more customized grouping. Reference: face_evoLVe_Pytorch-master

Custom learning-rate schedule

def schedule_lr(optimizer):
    for params in optimizer.param_groups:
        params['lr'] /= 10.
    print(optimizer)
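
A minimal usage sketch, assuming the learning rate should be divided by 10 at a few hand-picked epochs (the model and milestones below are illustrative):

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)                    # hypothetical model
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
milestones = {35, 65, 95}                   # assumed decay epochs

for epoch in range(100):
    if epoch in milestones:
        schedule_lr(optimizer)              # divides every param group's lr by 10
    # ... one epoch of training ...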

Method 1: using model.modules() and obj.__class__ (more general)

# Difference between model.modules() and model.children(): model.modules() iterates recursively over all submodules of the model, while model.children() only iterates over the model's immediate children
# The keyword 'model' in the check below comes from the file in which the model is defined: all nn.Module subclasses defined in, e.g., model_resnet.py carry the prefix 'model_resnet' in their class path, so the custom top-level modules can be filtered out in one pass
def separate_irse_bn_paras(model):
    paras_only_bn = []
    paras_no_bn = []
    for layer in model.modules():
        if 'model' in str(layer.__class__):              # eg. a=[1,2]  type(a): <class 'list'>  a.__class__: <class 'list'>
            continue
        if 'container' in str(layer.__class__):          # skip container modules such as Sequential
            continue
        else:
            if 'batchnorm' in str(layer.__class__):
                paras_only_bn.extend([*layer.parameters()])
            else:
                paras_no_bn.extend([*layer.parameters()])   # extend() appends all items of another sequence to the end of the list

    return paras_only_bn, paras_no_bn
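
A usage sketch: the two groups feed straight into SGD with weight decay disabled for the BN parameters (the toy backbone and hyperparameters below are illustrative; in the original repo the model comes from a model_*.py file):

import torch.nn as nn
import torch.optim as optim

model = nn.Sequential(                      # toy backbone for illustration
    nn.Conv2d(3, 16, 3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.PReLU(16),
)
paras_only_bn, paras_no_bn = separate_irse_bn_paras(model)
optimizer = optim.SGD([
        {'params': paras_no_bn,   'weight_decay': 5e-4},   # decayed parameters
        {'params': paras_only_bn, 'weight_decay': 0.0},    # BN parameters: no decay
    ], lr=0.1, momentum=0.9)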

Method 2: using model.parameters() and named_parameters()
Fundamentally, parameters() is obtained from named_parameters(), which in turn is obtained from modules(). The precondition for this method is that the model is defined as in items 1 and 2 below, or with Sequential + OrderedDict.

def separate_resnet_bn_paras(model):
    all_parameters = model.parameters()
    paras_only_bn = []

    for pname, p in model.named_parameters():
        if pname.find('bn') >= 0:
            paras_only_bn.append(p)
            
    paras_only_bn_id = list(map(id, paras_only_bn))
    paras_no_bn = list(filter(lambda p: id(p) not in paras_only_bn_id, all_parameters))
    
    return paras_only_bn, paras_no_bn

Differences between the two methods
The difference in parameter grouping really reflects a difference in how the model is constructed. For example:

  1. When building a ResNet basic block, __init__() defines

    self.conv1 = conv3x3(inplanes, planes, stride)
    self.bn1 = BatchNorm2d(planes)
    self.relu = ReLU(inplace = True)

  2. Then forward() defines

    out = self.conv1(x)
    out = self.bn1(out)
    out = self.relu(out)

  3. For this ResNet, model.named_parameters() returns pnames of the form:

    'layer1.0.conv1.weight'
    'layer1.0.bn1.weight'
    'layer1.0.bn1.bias'
    # 'layer' corresponds to conv2_x, ..., conv5_x; '0' is the block index within each layer (e.g. conv2_x has 3 blocks: layer1.0, ..., layer1.2); 'conv1' is the self.conv1 defined in __init__()

  4. If the model is instead built with Sequential(), model.named_parameters() returns pnames like 'body.3.res_layer.1.weight'. Here '1.weight' actually corresponds to a BN weight, so that module can no longer be found via pname.find('bn').

    self.res_layer = Sequential(
        Conv2d(in_channel, depth, (3, 3), (1, 1), 1, bias=False),
        BatchNorm2d(depth),
        PReLU(depth),
        Conv2d(depth, depth, (3, 3), stride, 1, bias=False),
        BatchNorm2d(depth)
    )

  5. For the situation in item 4, there are two fixes: wrap the Sequential contents in an OrderedDict (as below, and verified in the sketch after this list), or use Method 1.

    downsample = Sequential(OrderedDict([
        ('conv_ds', conv1x1(self.inplanes, planes * block.expansion, stride)),
        ('bn_ds', BatchNorm2d(planes * block.expansion)),
    ]))
    # This way, the pnames of these modules will contain 'conv_ds' and 'bn_ds'
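
A quick check (conv1x1 from the original snippet is replaced by a plain Conv2d here, and the channel sizes are illustrative) confirms that the OrderedDict keys appear in the parameter names:

from collections import OrderedDict
import torch.nn as nn

downsample = nn.Sequential(OrderedDict([
    ('conv_ds', nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)),
    ('bn_ds', nn.BatchNorm2d(128)),
]))
for pname, _ in downsample.named_parameters():
    print(pname)    # conv_ds.weight, bn_ds.weight, bn_ds.bias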
