Notes on weight initialization for GPU training in PyTorch

Preface

How the weights are initialized determines whether a model can converge quickly, and therefore how much training time it needs.
Below, the weight initialization of two convolutional layers and one fully connected layer serves as the example. Both scripts are run for only one epoch, giving a controlled comparison.
Note that when training on the GPU, the weight tensors must be created with gradient tracking enabled (requires_grad=True); otherwise no gradients will be accumulated for them.
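The point about gradient tracking can be illustrated with a minimal sketch (the tensor name here is illustrative):

```python
import torch

# A leaf tensor only accumulates gradients if requires_grad is
# enabled before it participates in any computation.
w = torch.randn(3, 3)      # requires_grad defaults to False
w.requires_grad_()         # turn on gradient tracking in place
loss = (w ** 2).sum()
loss.backward()
print(w.grad is not None)  # → True: the gradient tensor was populated
```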

Result without normalizing the weights

Code

import torch

USE_GPU = True
dtype = torch.float32  # we will be using float throughout this tutorial

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# ------------------------- weights -------------------------
conv_w1 = torch.randn((32, 3, 5, 5), device=device, dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
conv_w1.requires_grad = True
conv_b1 = torch.zeros((32,), device=device, dtype=dtype, requires_grad=True)  # out_channel

conv_w2 = torch.randn((16, 32, 3, 3), device=device, dtype=dtype)  # [out_channel, in_channel, kernel_H, kernel_W]
conv_w2.requires_grad = True
conv_b2 = torch.zeros((16,), device=device, dtype=dtype, requires_grad=True)  # out_channel

# you must calculate the shape of the tensor after two conv layers, before the fully-connected layer
fc_w = torch.randn((16 * 32 * 32, 10), device=device, dtype=dtype)
fc_w.requires_grad = True
fc_b = torch.zeros(10, device=device, dtype=dtype, requires_grad=True)

Result

[Figure: training output after one epoch, weights not normalized]

After normalizing the weights

Code

import numpy as np
import torch

USE_GPU = True
dtype = torch.float32  # we will be using float throughout this tutorial

if USE_GPU and torch.cuda.is_available():
    device = torch.device('cuda')
else:
    device = torch.device('cpu')

# ------------------------- weights -------------------------
# each weight tensor is scaled by sqrt(2 / fan_in), where fan_in is the
# number of inputs feeding a single output unit
conv_w1 = torch.randn((32, 3, 5, 5), device=device, dtype=dtype) * np.sqrt(2. / (3 * 5 * 5))  # [out_channel, in_channel, kernel_H, kernel_W]
conv_w1.requires_grad = True
conv_b1 = torch.zeros((32,), device=device, dtype=dtype, requires_grad=True)  # out_channel

conv_w2 = torch.randn((16, 32, 3, 3), device=device, dtype=dtype) * np.sqrt(2. / (32 * 3 * 3))  # fan_in = in_channel * kernel_H * kernel_W
conv_w2.requires_grad = True
conv_b2 = torch.zeros((16,), device=device, dtype=dtype, requires_grad=True)  # out_channel

# you must calculate the shape of the tensor after two conv layers, before the fully-connected layer
fc_w = torch.randn((16 * 32 * 32, 10), device=device, dtype=dtype) * np.sqrt(2. / (16 * 32 * 32))
fc_w.requires_grad = True
fc_b = torch.zeros(10, device=device, dtype=dtype, requires_grad=True)
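A quick way to see what the sqrt(2 / fan_in) factor does is to compare the output spread of the first conv layer with and without it. This sketch uses a random dummy batch and runs on CPU for simplicity:

```python
import numpy as np
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(8, 3, 32, 32)  # dummy input batch

w_raw = torch.randn(32, 3, 5, 5)           # unscaled weights
w_he = w_raw * np.sqrt(2. / (3 * 5 * 5))   # scaled by sqrt(2 / fan_in)

out_raw = F.conv2d(x, w_raw, padding=2)
out_he = F.conv2d(x, w_he, padding=2)

# The unscaled output's std grows like sqrt(fan_in), while the
# scaled version stays near sqrt(2) — small enough not to blow up
# as activations pass through deeper layers.
print(out_raw.std().item(), out_he.std().item())
```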

Result

[Figure: training output after one epoch, weights normalized]

Conclusion

The comparison shows that after normalizing the weights in this way, the model converges much faster.
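For reference, the manual sqrt(2 / fan_in) scaling used above is He (Kaiming) initialization, which PyTorch also provides directly as torch.nn.init.kaiming_normal_. A minimal sketch of the equivalent setup (with a CPU fallback):

```python
import math
import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

conv_w1 = torch.empty(32, 3, 5, 5, device=device)
nn.init.kaiming_normal_(conv_w1, mode='fan_in', nonlinearity='relu')
conv_w1.requires_grad_()  # enable gradient tracking after initialization

# fan_in = 3 * 5 * 5 = 75, so the sample std should sit near sqrt(2/75)
expected_std = math.sqrt(2. / 75)
print(conv_w1.std().item(), expected_std)
```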
