pytorch-lenet

course content

  1. Introduction to the LeNet model
  2. Building the LeNet network
  3. Image recognition with LeNet on the Fashion-MNIST dataset

Convolutional Neural Networks

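Limitations of fully connected layers:

  • A fully connected layer needs the image flattened into a vector. Pixels that are neighbours in the same column of the image may end up far apart in that vector, so the patterns they form can be hard for the model to recognize.
  • For large input images, fully connected layers easily make the model too big.

Advantages of convolutional layers:

  • A convolutional layer preserves the shape of the input.
  • A convolutional layer slides a window over the input and reuses the same kernel at every position, which keeps the number of parameters small.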
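
To make the parameter savings concrete, the sketch below compares a fully connected layer and a convolutional layer that both map a 1×28×28 input to a 6×28×28 output. The helper `count_params` is a hypothetical name introduced only for this illustration.

```python
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    """Total number of learnable scalars in a module (illustrative helper)."""
    return sum(p.numel() for p in module.parameters())

# Fully connected: flatten 1x28x28 (784 values) and produce 6x28x28 (4704 values).
fc = nn.Linear(1 * 28 * 28, 6 * 28 * 28)

# Convolution: one 5x5 kernel per output channel, shared across all positions.
conv = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2)

fc_params = count_params(fc)      # 784 * 4704 weights + 4704 biases
conv_params = count_params(conv)  # 6 * 1 * 5 * 5 weights + 6 biases
print(fc_params, conv_params)
```

The fully connected layer needs about 3.7 million parameters, while the convolutional layer needs 156, because its kernel weights are shared across positions.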

LeNet model

LeNet consists of two parts: a convolutional block and a fully connected block. We introduce each of them below.


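The basic unit of the convolutional block is a convolutional layer followed by an average pooling layer. The convolutional layer detects spatial patterns in the image, such as lines and parts of objects, and the average pooling layer that follows makes the result less sensitive to position.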
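
As a small illustration of how pooling reduces sensitivity to position, the sketch below feeds two toy 4×4 inputs (made up for this example) through a 2×2 average pooling layer. The single active pixel is shifted by one position but stays inside the same pooling window.

```python
import torch
import torch.nn as nn

pool = nn.AvgPool2d(kernel_size=2, stride=2)

# Two inputs that differ only by a one-pixel horizontal shift.
x1 = torch.zeros(1, 1, 4, 4)
x1[0, 0, 0, 0] = 1.0
x2 = torch.zeros(1, 1, 4, 4)
x2[0, 0, 0, 1] = 1.0

y1, y2 = pool(x1), pool(x2)
same_after_pooling = torch.equal(y1, y2)
print(y1[0, 0], same_after_pooling)
```

Both inputs pool to the same 2×2 output. A shift that crosses a window boundary would still change the result, so pooling only reduces position sensitivity rather than removing it.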

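The convolutional block stacks two of these basic units. Every convolutional layer in the block uses a 5×5 window and applies a sigmoid activation to its output. The first convolutional layer has 6 output channels, and the second increases this to 16.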

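The fully connected block contains 3 fully connected layers. Their output sizes are 120, 84 and 10, where 10 is the number of output classes.

Below we implement the LeNet model with the Sequential class.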

#import
!pip install torchtext
import sys
sys.path.append("/home/kesci/input")
import d2lzh1981 as d2l
import torch
import torch.nn as nn
import torch.optim as optim
import time
#net
class Flatten(torch.nn.Module):  # flatten each sample into a vector
    def forward(self, x):
        return x.view(x.shape[0], -1)

class Reshape(torch.nn.Module):  # reshape the input into image form
    def forward(self, x):
        return x.view(-1,1,28,28)      #(B x C x H x W)

net = torch.nn.Sequential(     # LeNet
    Reshape(),
    nn.Conv2d(in_channels=1, out_channels=6, kernel_size=5, padding=2), #b*1*28*28  =>b*6*28*28
    nn.Sigmoid(),                      # activation function
    nn.AvgPool2d(kernel_size=2, stride=2),                              #b*6*28*28  =>b*6*14*14
    nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5),           #b*6*14*14  =>b*16*10*10
    nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2, stride=2),                              #b*16*10*10  => b*16*5*5
    Flatten(),                                                          #b*16*5*5   => b*400
    nn.Linear(in_features=16*5*5, out_features=120),  # first fully connected layer
    nn.Sigmoid(),
    nn.Linear(120, 84),
    nn.Sigmoid(),
    nn.Linear(84, 10)
)

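Next we construct a single-channel sample with height and width of 28 and run the forward pass layer by layer to inspect the output shape of each layer.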

#print
X = torch.randn(size=(1,1,28,28), dtype = torch.float32)
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__,'output shape: \t',X.shape)
    # each pass through the loop applies one layer to the data
Reshape output shape: 	 torch.Size([1, 1, 28, 28])
Conv2d output shape: 	 torch.Size([1, 6, 28, 28])
Sigmoid output shape: 	 torch.Size([1, 6, 28, 28])
AvgPool2d output shape: 	 torch.Size([1, 6, 14, 14])
Conv2d output shape: 	 torch.Size([1, 16, 10, 10])
Sigmoid output shape: 	 torch.Size([1, 16, 10, 10])
AvgPool2d output shape: 	 torch.Size([1, 16, 5, 5])
Flatten output shape: 	 torch.Size([1, 400])
Linear output shape: 	 torch.Size([1, 120])
Sigmoid output shape: 	 torch.Size([1, 120])
Linear output shape: 	 torch.Size([1, 84])
Sigmoid output shape: 	 torch.Size([1, 84])
Linear output shape: 	 torch.Size([1, 10])

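As the output shows, the height and width of the input shrink as it moves through the convolutional block. The first convolutional layer uses padding=2, so its 5×5 kernel keeps the 28×28 size; the second has no padding, so its 5×5 kernel reduces height and width by 4 each (from 14 to 10). Each pooling layer halves the height and width. Meanwhile the number of channels grows from 1 to 16. The fully connected layers then reduce the number of outputs layer by layer until it equals the number of image classes, 10.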
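
The shapes above follow the usual output-size rule for convolution and pooling windows, floor((n + 2p - k) / s) + 1, where n is the input size, k the window size, p the padding and s the stride. The sketch below traces the spatial size through the convolutional block; `out_size` and the `layers` table are illustrative names, not part of the d2l package.

```python
def out_size(n: int, kernel: int, padding: int = 0, stride: int = 1) -> int:
    """Output height/width of a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1

# (name, kernel, padding, stride) for each layer that changes the spatial size
layers = [
    ("conv1", 5, 2, 1),
    ("pool1", 2, 0, 2),
    ("conv2", 5, 0, 1),
    ("pool2", 2, 0, 2),
]

n = 28
sizes = []
for name, kernel, padding, stride in layers:
    n = out_size(n, kernel, padding, stride)
    sizes.append(n)
    print(name, n)

flat_features = 16 * n * n  # 16 channels after conv2
print(flat_features)
```

The final value matches the in_features=16*5*5 of the first fully connected layer.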


Getting the data and training the model

Next we put the LeNet model to the test. We again use Fashion-MNIST as the training dataset.

# data
batch_size = 256
train_iter, test_iter = d2l.load_data_fashion_mnist(
    batch_size=batch_size, root='/home/kesci/input/FashionMNIST2065')
print(len(train_iter))
# each batch holds 256 samples

235
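
The batch count printed above can be checked by hand. The sketch assumes the standard Fashion-MNIST training split of 60,000 images and a loader that keeps the final, smaller batch.

```python
import math

num_train = 60000   # size of the Fashion-MNIST training split
batch_size = 256

num_batches = math.ceil(num_train / batch_size)
last_batch_size = num_train - (num_batches - 1) * batch_size
print(num_batches, last_batch_size)
```

So 234 batches are full and the last one holds the remaining 96 samples.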

To give readers a more concrete view of the data, we add an extra section that displays some of the images.

# display the data
import matplotlib.pyplot as plt
def show_fashion_mnist(images, labels):
    d2l.use_svg_display()
    # the _ here is a variable we ignore (do not use)
    _, figs = plt.subplots(1, len(images), figsize=(12, 12))
    for f, img, lbl in zip(figs, images, labels):
        f.imshow(img.view((28, 28)).numpy())
        f.set_title(lbl)
        f.axes.get_xaxis().set_visible(False)
        f.axes.get_yaxis().set_visible(False)
    plt.show()

for Xdata,ylabel in train_iter:
    break
X, y = [], []
for i in range(10):
    print(Xdata[i].shape,ylabel[i].numpy())
    X.append(Xdata[i]) # append the i-th feature to X
    y.append(ylabel[i].numpy()) # append the i-th label to y
show_fashion_mnist(X, y)
# reuses the display function defined earlier
torch.Size([1, 28, 28]) 9
torch.Size([1, 28, 28]) 6
torch.Size([1, 28, 28]) 8
torch.Size([1, 28, 28]) 3
torch.Size([1, 28, 28]) 0
torch.Size([1, 28, 28]) 7
torch.Size([1, 28, 28]) 6
torch.Size([1, 28, 28]) 3
torch.Size([1, 28, 28]) 6
torch.Size([1, 28, 28]) 4

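Because a convolutional neural network is more expensive to compute than a multilayer perceptron, we recommend using a GPU to speed things up. We check whether a GPU is available: if so we use cuda:0, otherwise we stay on the cpu.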

# This function has been saved in the d2l package for future use
#use GPU
def try_gpu():
    """If GPU is available, return torch.device as cuda:0; else return torch.device as cpu."""
    if torch.cuda.is_available():
        device = torch.device('cuda:0')
    else:
        device = torch.device('cpu')
    return device

device = try_gpu()
device
device(type='cuda', index=0)
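
A minimal sketch of how such a device object is typically used: the model and its inputs must live on the same device before the forward pass. The layer sizes here are arbitrary and chosen only for illustration.

```python
import torch
import torch.nn as nn

# Same selection logic as try_gpu(), written inline so the sketch is self-contained.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

model = nn.Linear(4, 3).to(device)   # move the parameters to the device
x = torch.randn(2, 4).to(device)     # move the input to the same device
out = model(x)
print(out.shape, out.device)
```

The same code runs unchanged on a machine without a GPU, where everything stays on the cpu.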

We implement the evaluate_accuracy function, which computes the accuracy of the model net on the dataset data_iter.

# compute accuracy
'''
(1). net.train()
  Training mode: BatchNormalization uses the statistics of the current batch and Dropout is active
(2). net.eval()
  Evaluation mode: BatchNormalization uses its running statistics and Dropout is disabled
'''

def evaluate_accuracy(data_iter, net,device=torch.device('cpu')):
    """Evaluate accuracy of a model on the given data set."""
    acc_sum,n = torch.tensor([0],dtype=torch.float32,device=device),0
    for X,y in data_iter:
        # If device is the GPU, copy the data to the GPU.
        X,y = X.to(device),y.to(device)
        net.eval()
        with torch.no_grad():
            y = y.long()
            acc_sum += torch.sum((torch.argmax(net(X), dim=1) == y))  #[[0.2 ,0.4 ,0.5 ,0.6 ,0.8] ,[ 0.1,0.2 ,0.4 ,0.3 ,0.1]] => [ 4 , 2 ]
            n += y.shape[0]
    return acc_sum.item()/n
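
The core of the accuracy computation is the argmax over the class dimension. The sketch below reuses the two score rows from the comment in evaluate_accuracy; the labels are made up for illustration.

```python
import torch

# Two samples, five class scores each (values from the comment above).
scores = torch.tensor([[0.2, 0.4, 0.5, 0.6, 0.8],
                       [0.1, 0.2, 0.4, 0.3, 0.1]])
labels = torch.tensor([4, 1])            # hypothetical ground-truth labels

predicted = torch.argmax(scores, dim=1)  # index of the largest score per row
num_correct = torch.sum(predicted == labels).item()
accuracy = num_correct / labels.shape[0]
print(predicted, accuracy)
```

The first prediction matches its label and the second does not, so the accuracy on this toy batch is 0.5.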

We define the function train_ch5 to train the model.

# training function
def train_ch5(net, train_iter, test_iter,criterion, num_epochs, batch_size, device,lr=None):
    """Train and evaluate a model with CPU or GPU."""
    print('training on', device)
    net.to(device)  # move net to the device (e.g. the GPU) before training
    optimizer = optim.SGD(net.parameters(), lr=lr)
    for epoch in range(num_epochs):
        train_l_sum = torch.tensor([0.0],dtype=torch.float32,device=device)
        train_acc_sum = torch.tensor([0.0],dtype=torch.float32,device=device)
        # the running sums above live on the device as well
        n, start = 0, time.time()
        for X, y in train_iter:
            net.train()
            
            optimizer.zero_grad()
            X,y = X.to(device),y.to(device)  # move the batch to the device
            y_hat = net(X)
            loss = criterion(y_hat, y)
            loss.backward()
            optimizer.step()
            
            with torch.no_grad():
                y = y.long()
                train_l_sum += loss.float()
                train_acc_sum += (torch.sum((torch.argmax(y_hat, dim=1) == y))).float()
                n += y.shape[0]
        test_acc = evaluate_accuracy(test_iter, net,device)
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, '
              'time %.1f sec'
              % (epoch + 1, train_l_sum/n, train_acc_sum/n, test_acc,
                 time.time() - start))

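We re-initialize the model parameters on the chosen device (cpu or cuda:0), using Xavier random initialization. The loss function and the training algorithm are still the cross-entropy loss and mini-batch stochastic gradient descent.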
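
As a brief sketch of what Xavier uniform initialization does: with the default gain of 1, torch.nn.init.xavier_uniform_ draws weights from U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). The layer below mirrors the 120-to-84 fully connected layer of the network and is created only for this check.

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(120, 84)
nn.init.xavier_uniform_(layer.weight)   # only the weight is re-initialized

fan_in, fan_out = layer.in_features, layer.out_features
bound = math.sqrt(6.0 / (fan_in + fan_out))
max_abs = layer.weight.abs().max().item()
print(round(bound, 4), max_abs <= bound)
```

Every sampled weight stays within the bound, which is roughly 0.17 for this layer. The bias keeps the default initialization of nn.Linear.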

# training
lr, num_epochs = 0.9, 10

def init_weights(m):
    if type(m) == nn.Linear or type(m) == nn.Conv2d:
        torch.nn.init.xavier_uniform_(m.weight)

net.apply(init_weights)
net = net.to(device)
# General pattern: define a weight_init function that picks the initialization according to the module type.
# model=Net(…) creates the network structure
# model.apply(weight_init) applies the weight_init initialization to the submodules
criterion = nn.CrossEntropyLoss()   # cross-entropy measures the distance between two probability distributions; the smaller it is, the closer they are
train_ch5(net, train_iter, test_iter, criterion,num_epochs, batch_size,device, lr)
training on cuda:0
epoch 1, loss 0.0087, train acc 0.147, test acc 0.454, time 5.3 sec
epoch 2, loss 0.0042, train acc 0.569, test acc 0.646, time 5.3 sec
epoch 3, loss 0.0031, train acc 0.693, test acc 0.708, time 5.3 sec
epoch 4, loss 0.0026, train acc 0.734, test acc 0.712, time 5.3 sec
epoch 5, loss 0.0024, train acc 0.759, test acc 0.752, time 5.3 sec
epoch 6, loss 0.0022, train acc 0.779, test acc 0.756, time 5.3 sec
epoch 7, loss 0.0021, train acc 0.796, test acc 0.790, time 5.3 sec
epoch 8, loss 0.0020, train acc 0.809, test acc 0.790, time 5.3 sec
epoch 9, loss 0.0019, train acc 0.821, test acc 0.812, time 5.3 sec
epoch 10, loss 0.0018, train acc 0.829, test acc 0.804, time 5.3 sec
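
A note on the scale of the printed loss: nn.CrossEntropyLoss averages over the batch by default, and train_ch5 sums these batch means and then divides by the number of samples, so the reported value is roughly the per-sample loss divided by the batch size. For instance, 0.0087 × 256 ≈ 2.23, close to ln 10 ≈ 2.30, the loss of a uniform guess over 10 classes. The sketch below reproduces the effect on random toy data; all tensors and names are made up for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
criterion = nn.CrossEntropyLoss()  # default reduction='mean'

# Three toy batches of 4 samples each, with 10 classes.
batches = [(torch.randn(4, 10), torch.randint(0, 10, (4,))) for _ in range(3)]

sum_of_batch_means, sum_of_sample_losses, n = 0.0, 0.0, 0
for logits, y in batches:
    loss = criterion(logits, y)
    sum_of_batch_means += loss.item()                # what train_ch5 accumulates
    sum_of_sample_losses += loss.item() * y.shape[0] # undo the batch mean
    n += y.shape[0]

reported = sum_of_batch_means / n      # scale of the loss printed by train_ch5
per_sample = sum_of_sample_losses / n  # average loss per sample
print(reported, per_sample)
```

With equal batch sizes the two values differ exactly by the batch size. In the real run the last batch is smaller, so the factor is only approximately 256.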
# test
for testdata,testlabe in test_iter:
    testdata,testlabe = testdata.to(device),testlabe.to(device)
    break
print(testdata.shape,testlabe.shape)
net.eval()
y_pre = net(testdata)
print(torch.argmax(y_pre,dim=1)[:10])
print(testlabe[:10])
torch.Size([256, 1, 28, 28]) torch.Size([256])
tensor([9, 2, 1, 1, 6, 1, 2, 6, 5, 7], device='cuda:0')
tensor([9, 2, 1, 1, 6, 1, 4, 6, 5, 7], device='cuda:0')

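Summary:

A convolutional neural network is a network that contains convolutional layers.
LeNet alternates convolutional layers and average pooling layers, followed by fully connected layers, to classify images.

  • Pooling layers take part in the forward pass of the model and likewise in backpropagation.
  • A pooling layer directly takes the maximum or the average of the elements in its window; no model parameters are involved in the computation.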
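
Both points about pooling can be checked directly. The sketch below uses a toy 4×4 input of ones, chosen only for illustration, and inspects the parameters and gradients of an average pooling layer.

```python
import torch
import torch.nn as nn

pool = nn.AvgPool2d(kernel_size=2, stride=2)

# No learnable parameters: the layer only averages each window.
num_params = sum(p.numel() for p in pool.parameters())

# Gradients still flow through it during backpropagation.
x = torch.ones(1, 1, 4, 4, requires_grad=True)
pool(x).sum().backward()
print(num_params, x.grad[0, 0])
```

Each input element belongs to exactly one 2×2 window and contributes a quarter of that window's average, so every entry of the gradient is 0.25.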
