pytorch - Batch Normalization

Batch Normalization

Standardizing the input (shallow models)

After processing, every feature has mean 0 and standard deviation 1 over all samples in the dataset.
Standardizing the input data makes the distributions of the individual features similar.
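A minimal sketch (toy data; the feature matrix and its shape are assumptions) of standardizing each feature over all samples:

import torch

X = torch.randn(100, 3) * 5.0 + 2.0          # 100 samples, 3 features (toy data)
X_std = (X - X.mean(dim=0)) / X.std(dim=0)   # standardize each feature column
print(X_std.mean(dim=0), X_std.std(dim=0))   # per-feature mean ~0, std ~1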

Batch normalization (deep models)

Uses the mean and standard deviation computed on each mini-batch to continually rescale the intermediate outputs of the network, making the values of the intermediate outputs at every layer more stable.

1. Batch normalization for fully connected layers

Position: between the affine transformation and the activation function of the fully connected layer.
Fully connected layer:
\boldsymbol{x} = \boldsymbol{W}\boldsymbol{u} + \boldsymbol{b}
output = \phi(\boldsymbol{x})

With batch normalization:
output = \phi(\text{BN}(\boldsymbol{x}))
\boldsymbol{y}^{(i)} = \text{BN}(\boldsymbol{x}^{(i)})

\boldsymbol{\mu}_\mathcal{B} \leftarrow \frac{1}{m}\sum_{i = 1}^{m} \boldsymbol{x}^{(i)},

\boldsymbol{\sigma}_\mathcal{B}^2 \leftarrow \frac{1}{m} \sum_{i=1}^{m}(\boldsymbol{x}^{(i)} - \boldsymbol{\mu}_\mathcal{B})^2,

\hat{\boldsymbol{x}}^{(i)} \leftarrow \frac{\boldsymbol{x}^{(i)} - \boldsymbol{\mu}_\mathcal{B}}{\sqrt{\boldsymbol{\sigma}_\mathcal{B}^2 + \epsilon}},
Here ε > 0 is a small constant that ensures the denominator is positive.
\boldsymbol{y}^{(i)} \leftarrow \boldsymbol{\gamma} \odot \hat{\boldsymbol{x}}^{(i)} + \boldsymbol{\beta}.

Learnable parameters are introduced: the scale parameter γ and the shift parameter β. If \boldsymbol{\gamma} = \sqrt{\boldsymbol{\sigma}_\mathcal{B}^2 + \epsilon} and \boldsymbol{\beta} = \boldsymbol{\mu}_\mathcal{B}, batch normalization has no effect (it recovers the original input).
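A minimal sketch (toy 2-D input; shapes are assumptions) verifying that with \boldsymbol{\gamma} = \sqrt{\boldsymbol{\sigma}_\mathcal{B}^2 + \epsilon} and \boldsymbol{\beta} = \boldsymbol{\mu}_\mathcal{B} the transformation recovers the original input:

import torch

x = torch.randn(8, 4)                      # a batch of 8 samples with 4 features (toy)
eps = 1e-5
mu = x.mean(dim=0)
var = ((x - mu) ** 2).mean(dim=0)
x_hat = (x - mu) / torch.sqrt(var + eps)
gamma, beta = torch.sqrt(var + eps), mu    # the special choice of the learnable parameters
y = gamma * x_hat + beta
print(torch.allclose(y, x, atol=1e-5))     # True: batch normalization reduces to the identity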

2. Batch normalization for convolutional layers

Position: after the convolution and before the activation function.
If the convolution produces multiple output channels, batch normalization is applied to each of these channels separately, and every channel has its own scale and shift parameters.
Computation: for a single channel, with batch size m and a convolution output of size p x q,
batch normalization is applied jointly to the m x p x q elements of that channel, using the same mean and variance.
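A minimal sketch (toy tensor; sizes are assumptions) of computing one mean and one variance per channel over all m x p x q elements. Reducing over a tuple of dimensions requires a reasonably recent PyTorch; the chained .mean calls in the from-scratch code below are an equivalent alternative.

import torch

X = torch.randn(16, 3, 5, 5)                               # m=16, 3 channels, p=q=5 (toy)
mean = X.mean(dim=(0, 2, 3), keepdim=True)                 # one mean per channel, shape (1, 3, 1, 1)
var = ((X - mean) ** 2).mean(dim=(0, 2, 3), keepdim=True)  # one variance per channel
X_hat = (X - mean) / torch.sqrt(var + 1e-5)                # all m*p*q elements of a channel share the same statistics
print(mean.shape, X_hat.shape)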

3. Batch normalization at prediction time

Training: statistics are computed batch by batch; the mean and variance are calculated for each batch.
Prediction: moving averages are used to estimate the sample mean and variance of the entire training dataset.
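A minimal sketch (toy feature batches; the momentum value matches the 0.9 used in the implementation below) of the exponential moving average that estimates the dataset-level statistics used at prediction time:

import torch

moving_mean, moving_var = torch.zeros(4), torch.zeros(4)    # one entry per feature (toy)
momentum = 0.9
for _ in range(200):                                        # simulate a stream of training batches
    batch = torch.randn(32, 4) * 2.0 + 1.0                  # true mean 1, true std 2
    moving_mean = momentum * moving_mean + (1 - momentum) * batch.mean(dim=0)
    moving_var = momentum * moving_var + (1 - momentum) * batch.var(dim=0, unbiased=False)
print(moving_mean, moving_var)                              # approach the dataset statistics (about 1 and 4)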

Implementation from scratch

# GPU compute resources are expected to come online on the 17th; until then this code can only run on the CPU.
# Since the model in this code is fairly large and CPU training is slow,
# a copy of the code has also been uploaded to https://www.kaggle.com/boyuai/boyu-d2l-deepcnn
# If you want to run it on a GPU before then, please use Kaggle.
!pip install torchtext
import time
import torch
from torch import nn, optim
import torch.nn.functional as F
import torchvision
import sys
sys.path.append("/home/kesci/input/") 
import d2lzh1981 as d2l
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

def batch_norm(is_training, X, gamma, beta, moving_mean, moving_var, eps, momentum):
    # Determine whether we are in training mode or prediction mode
    if not is_training:
        # In prediction mode, use the moving-average mean and variance passed in
        X_hat = (X - moving_mean) / torch.sqrt(moving_var + eps)
    else:
        assert len(X.shape) in (2, 4)  # X must be 2-D (fully connected) or 4-D (convolutional)
        if len(X.shape) == 2:
            # Fully connected case: compute the mean and variance along the feature dimension (over the columns)
            mean = X.mean(dim=0)
            var = ((X - mean) ** 2).mean(dim=0)
        else:  # 4-D input
            # Convolutional case: compute the mean and variance per channel (axis=1). keepdim preserves
            # X's shape so that broadcasting works later.
            # With multiple output channels, batch normalization is applied to each channel separately,
            # and each channel has its own scale and shift parameters
            mean = X.mean(dim=0, keepdim=True).mean(dim=2, keepdim=True).mean(dim=3, keepdim=True)  # shape (1, C, 1, 1)
            var = ((X - mean) ** 2).mean(dim=0, keepdim=True).mean(dim=2, keepdim=True).mean(dim=3, keepdim=True)
        # In training mode, standardize with the current batch mean and variance
        X_hat = (X - mean) / torch.sqrt(var + eps)  # eps keeps the denominator positive
        # Update the moving averages of the mean and variance
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean
        moving_var = momentum * moving_var + (1.0 - momentum) * var
    Y = gamma * X_hat + beta  # scale and shift
    return Y, moving_mean, moving_var
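A quick sanity check of the function above (toy tensor; the shapes are assumptions): in training mode the output should have roughly zero mean per feature.

X = torch.randn(8, 4)                               # fully connected case: batch of 8, 4 features
gamma, beta = torch.ones(1, 4), torch.zeros(1, 4)
mm, mv = torch.zeros(1, 4), torch.zeros(1, 4)
Y, mm, mv = batch_norm(True, X, gamma, beta, mm, mv, eps=1e-5, momentum=0.9)
print(Y.mean(dim=0))                                # per-feature means close to 0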
class BatchNorm(nn.Module):
    def __init__(self, num_features, num_dims):
        super(BatchNorm, self).__init__()
        # num_dims == 2 for fully connected layers, num_dims == 4 for convolutional layers
        if num_dims == 2:
            shape = (1, num_features)  # fully connected layer: one parameter per output feature (per column)
        else:
            shape = (1, num_features, 1, 1)  # convolutional layer: one parameter per channel
        # Scale and shift parameters that take part in gradient computation and updates,
        # initialized to 1 and 0 respectively
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))
        # Variables that do not take part in gradient computation or updates, initialized to 0 in main memory
        self.moving_mean = torch.zeros(shape)
        self.moving_var = torch.zeros(shape)

    def forward(self, X):
        # If X is not in main memory, copy moving_mean and moving_var to the device (GPU memory) where X lives
        if self.moving_mean.device != X.device:
            self.moving_mean = self.moving_mean.to(X.device)
            self.moving_var = self.moving_var.to(X.device)
        # Save the updated moving_mean and moving_var; a Module's training attribute defaults to True and is set to False by calling .eval()
        Y, self.moving_mean, self.moving_var = batch_norm(self.training, 
            X, self.gamma, self.beta, self.moving_mean,
            self.moving_var, eps=1e-5, momentum=0.9)
        return Y
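A quick shape check of the custom layer (toy input; the sizes are assumptions):

bn = BatchNorm(3, num_dims=4)
X = torch.rand(2, 3, 8, 8)
print(bn(X).shape)  # torch.Size([2, 3, 8, 8]); batch normalization preserves the shape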

Application in LeNet

net = nn.Sequential(
            nn.Conv2d(1, 6, 5), # in_channels, out_channels, kernel_size
            BatchNorm(6, num_dims=4),  # the previous layer outputs 6 channels
            nn.Sigmoid(),
            nn.MaxPool2d(2, 2), # kernel_size, stride
            nn.Conv2d(6, 16, 5),
            BatchNorm(16, num_dims=4),
            nn.Sigmoid(),
            nn.MaxPool2d(2, 2),
            d2l.FlattenLayer(),
            nn.Linear(16*4*4, 120),
            BatchNorm(120, num_dims=2),  # fully connected layer
            nn.Sigmoid(),
            nn.Linear(120, 84),
            BatchNorm(84, num_dims=2),
            nn.Sigmoid(),
            nn.Linear(84, 10)
        )
print(net)
Sequential(
  (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (1): BatchNorm()
  (2): Sigmoid()
  (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (4): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (5): BatchNorm()
  (6): Sigmoid()
  (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (8): FlattenLayer()
  (9): Linear(in_features=256, out_features=120, bias=True)
  (10): BatchNorm()
  (11): Sigmoid()
  (12): Linear(in_features=120, out_features=84, bias=True)
  (13): BatchNorm()
  (14): Sigmoid()
  (15): Linear(in_features=84, out_features=10, bias=True)
)
# batch_size = 256
# use a smaller batch size when running on the CPU
batch_size = 16

def load_data_fashion_mnist(batch_size, resize=None, root='/home/kesci/input/FashionMNIST2065'):
    """Download the fashion mnist dataset and then load into memory."""
    trans = []
    if resize:
        trans.append(torchvision.transforms.Resize(size=resize))
    trans.append(torchvision.transforms.ToTensor())
    # torchvision.transforms.ToTensor
    # For an image img, calling ToTensor is not simply a matter of turning the RGB channel matrix into a tensor.
    # The image is stored in memory as bytes, and the conversion proceeds roughly as follows:
    # 1. img.tobytes() converts the image to its in-memory byte representation
    # 2. torch.ByteStorage.from_buffer(img.tobytes()) reads the bytes as a stream and turns them into a 1-D tensor
    # 3. the tensor is reshaped
    # 4. the tensor is transposed
    # 5. every element of the tensor is divided by 255
    # 6. the tensor is returned
    transform = torchvision.transforms.Compose(trans)  # Compose chains the transforms together
    # download the datasets
    mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)
    mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)
    # wrap the datasets in data loaders
    train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=2)
    test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=2)

    return train_iter, test_iter
train_iter, test_iter = load_data_fashion_mnist(batch_size)
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)
training on  cuda
epoch 1, loss 0.5421, train acc 0.829, test acc 0.863, time 41.5 sec
epoch 2, loss 0.3700, train acc 0.869, test acc 0.874, time 41.4 sec
epoch 3, loss 0.3369, train acc 0.880, test acc 0.883, time 41.4 sec
epoch 4, loss 0.3131, train acc 0.887, test acc 0.890, time 41.7 sec
epoch 5, loss 0.2972, train acc 0.892, test acc 0.887, time 41.5 sec

Concise implementation

net = nn.Sequential(
            nn.Conv2d(1, 6, 5), # in_channels, out_channels, kernel_size
            nn.BatchNorm2d(6),
            nn.Sigmoid(),
            nn.MaxPool2d(2, 2), # kernel_size, stride
            nn.Conv2d(6, 16, 5),
            nn.BatchNorm2d(16),
            nn.Sigmoid(),
            nn.MaxPool2d(2, 2),
            d2l.FlattenLayer(),
            nn.Linear(16*4*4, 120),
            nn.BatchNorm1d(120),
            nn.Sigmoid(),
            nn.Linear(120, 84),
            nn.BatchNorm1d(84),
            nn.Sigmoid(),
            nn.Linear(84, 10)
        )
# torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
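A minimal sketch (toy tensor) of the train/eval behaviour: in training mode nn.BatchNorm2d normalizes with the statistics of the current batch, while after calling .eval() it uses the tracked running statistics instead.

bn = nn.BatchNorm2d(6)
X = torch.randn(4, 6, 10, 10) + 3.0    # channels deliberately offset from zero
print(bn(X).mean().item())             # close to 0: batch statistics are used
bn.eval()
print(bn(X).mean().item())             # no longer close to 0: running_mean/running_var are used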

optimizer = torch.optim.Adam(net.parameters(), lr=lr)
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)
training on  cuda
epoch 1, loss 0.5994, train acc 0.820, test acc 0.856, time 28.6 sec
epoch 2, loss 0.3730, train acc 0.867, test acc 0.866, time 28.4 sec
epoch 3, loss 0.3467, train acc 0.875, test acc 0.867, time 28.7 sec
epoch 4, loss 0.3266, train acc 0.883, test acc 0.877, time 28.7 sec
epoch 5, loss 0.3100, train acc 0.888, test acc 0.869, time 28.7 sec

Residual Networks (ResNet)

A problem in deep learning: once a deep CNN reaches a certain depth, simply adding more layers does not further improve classification performance; instead it makes the network converge more slowly and the accuracy worse.

Residual Block

Identity mapping:
Left: f(x) = x
Right: f(x) - x = 0 (makes it easier to capture small deviations from the identity mapping)

[Figure: structure of a residual block]

In a residual block, the input can propagate forward faster through the cross-layer data path.
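A minimal sketch (hypothetical linear block f) showing the effect of the skip connection: the input reaches the output directly through the identity path, and that path contributes a constant 1 to the gradient with respect to the input.

f = nn.Linear(5, 5)
x = torch.randn(2, 5, requires_grad=True)
y = f(x) + x                 # residual form: output = f(x) + x
y.sum().backward()
print(x.grad)                # each entry is a column sum of f.weight plus 1 from the identity path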

class Residual(nn.Module):  # This class is saved in the d2lzh_pytorch package for later use
    # You can set the number of output channels, whether to use an extra 1x1 convolution to change
    # the number of channels, and the stride of the convolution.
    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
        super(Residual, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)
        else:  
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(out_channels)  # batch normalization
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        # conv -> BN -> ReLU -> conv -> BN, optionally a 1x1 conv applied to the input X, then ReLU of the sum
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        return F.relu(Y + X)
blk = Residual(3, 3)  # 3 input channels, 3 output channels
X = torch.rand((4, 3, 6, 6))
blk(X).shape # torch.Size([4, 3, 6, 6])
torch.Size([4, 3, 6, 6])
blk = Residual(3, 6, use_1x1conv=True, stride=2)
blk(X).shape # torch.Size([4, 6, 3, 3])
torch.Size([4, 6, 3, 3])

The ResNet model

Convolution (64 output channels, 7x7 kernel, stride 2, padding 3)
Batch normalization
Max pooling (3x3, stride 2)

Residual modules x4 (a residual block with stride 2 halves the height and width between consecutive modules)

Global average pooling

Fully connected layer

# The ResNet model
net = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
        nn.BatchNorm2d(64), 
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
def resnet_block(in_channels, out_channels, num_residuals, first_block=False):
    if first_block:
        assert in_channels == out_channels  # the first module's output channels match its input channels
    blk = []
    for i in range(num_residuals):
        if i == 0 and not first_block:
            blk.append(Residual(in_channels, out_channels, use_1x1conv=True, stride=2))
        else:
            blk.append(Residual(out_channels, out_channels))
    return nn.Sequential(*blk)

net.add_module("resnet_block1", resnet_block(64, 64, 2, first_block=True))
net.add_module("resnet_block2", resnet_block(64, 128, 2))
net.add_module("resnet_block3", resnet_block(128, 256, 2))
net.add_module("resnet_block4", resnet_block(256, 512, 2))
net.add_module("global_avg_pool", d2l.GlobalAvgPool2d()) # GlobalAvgPool2d的輸出: (Batch, 512, 1, 1)
net.add_module("fc", nn.Sequential(d2l.FlattenLayer(), nn.Linear(512, 10))) 
X = torch.rand((1, 1, 224, 224))
for name, layer in net.named_children():
    X = layer(X)
    print(name, ' output shape:\t', X.shape)
0  output shape:	 torch.Size([1, 64, 112, 112])
1  output shape:	 torch.Size([1, 64, 112, 112])
2  output shape:	 torch.Size([1, 64, 112, 112])
3  output shape:	 torch.Size([1, 64, 56, 56])
resnet_block1  output shape:	 torch.Size([1, 64, 56, 56])
resnet_block2  output shape:	 torch.Size([1, 128, 28, 28])
resnet_block3  output shape:	 torch.Size([1, 256, 14, 14])
resnet_block4  output shape:	 torch.Size([1, 512, 7, 7])
global_avg_pool  output shape:	 torch.Size([1, 512, 1, 1])
fc  output shape:	 torch.Size([1, 10])
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)
training on  cuda
epoch 1, loss 0.4905, train acc 0.822, test acc 0.868, time 757.5 sec

Densely Connected Networks (DenseNet)

[Figure: dense connections in DenseNet]

Main building blocks:
Dense block: defines how the input and output are concatenated.
Transition layer: controls the number of channels so that it does not grow too large.

Dense block

def conv_block(in_channels, out_channels):
    blk = nn.Sequential(nn.BatchNorm2d(in_channels), 
                        nn.ReLU(),
                        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
    return blk
# Dense block
class DenseBlock(nn.Module):
    def __init__(self, num_convs, in_channels, out_channels):  # number of conv blocks, input channels, output channels per conv block (growth rate)
        super(DenseBlock, self).__init__()
        net = []
        for i in range(num_convs):
            in_c = in_channels + i * out_channels
            net.append(conv_block(in_c, out_channels))
        self.net = nn.ModuleList(net)
        self.out_channels = in_channels + num_convs * out_channels  # total number of output channels

    def forward(self, X):
        for blk in self.net:
            Y = blk(X)
            X = torch.cat((X, Y), dim=1)  # concatenate the input and output along the channel dimension
        return X
blk = DenseBlock(2, 3, 10)
X = torch.rand(4, 3, 8, 8)
Y = blk(X)
Y.shape # torch.Size([4, 23, 8, 8])

Transition layer: reduces model complexity

1x1 convolution layer: reduces the number of channels
Average pooling layer with stride 2: halves the height and width

def transition_block(in_channels, out_channels):
    blk = nn.Sequential(
            nn.BatchNorm2d(in_channels), 
            nn.ReLU(),
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
            nn.AvgPool2d(kernel_size=2, stride=2))
    return blk

blk = transition_block(23, 10)
blk(Y).shape # torch.Size([4, 10, 4, 4])

The DenseNet model

net = nn.Sequential(
        nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),
        nn.BatchNorm2d(64), 
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=3, stride=2, padding=1))
# conv -> BN -> ReLU -> max pooling, then 4 dense blocks (each of blocks 0-2 followed by a transition layer, block 3 without one),
# then BN -> ReLU -> global average pooling -> (flatten, fully connected)
num_channels, growth_rate = 64, 32  # num_channels is the current number of channels
num_convs_in_dense_blocks = [4, 4, 4, 4]

for i, num_convs in enumerate(num_convs_in_dense_blocks):
    DB = DenseBlock(num_convs, num_channels, growth_rate)
    net.add_module("DenseBlosk_%d" % i, DB)
    # 上一個稠密塊的輸出通道數
    num_channels = DB.out_channels
    # insert a transition layer that halves the number of channels between dense blocks
    if i != len(num_convs_in_dense_blocks) - 1:
        net.add_module("transition_block_%d" % i, transition_block(num_channels, num_channels // 2))
        num_channels = num_channels // 2
net.add_module("BN", nn.BatchNorm2d(num_channels))
net.add_module("relu", nn.ReLU())
net.add_module("global_avg_pool", d2l.GlobalAvgPool2d()) # GlobalAvgPool2d的輸出: (Batch, num_channels, 1, 1)
net.add_module("fc", nn.Sequential(d2l.FlattenLayer(), nn.Linear(num_channels, 10))) 

X = torch.rand((1, 1, 96, 96))
for name, layer in net.named_children():
    X = layer(X)
    print(name, ' output shape:\t', X.shape)
# batch_size = 256
batch_size = 16
# if an "out of memory" error occurs, reduce batch_size or the resize value
train_iter, test_iter =load_data_fashion_mnist(batch_size, resize=96)
lr, num_epochs = 0.001, 5
optimizer = torch.optim.Adam(net.parameters(), lr=lr)
d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)