A History of Convolutional Neural Networks You Need to Know

I. An Overview of the Evolution of CNN Models

  • 1. LeNet
  • 2. AlexNet
  • 3. VGG
  • 4. GoogLeNet
  • 5. ResNet
  • 6. DenseNet
  • 7. Non-Local Networks
  • 8. Deformable Convolutional Networks
  • 9. Dilated Convolutional Networks
  • 10. SENet

01 Basic Components of a Convolutional Neural Network

Different sources describe the components of a CNN somewhat differently, but the basic building blocks are fairly consistent. A CNN generally contains three types of layers:

  • Convolutional layer: learns feature representations of the input data. A convolutional layer consists of many convolutional kernels, each of which computes a feature map. An activation function introduces nonlinearity into the CNN; common choices are sigmoid, tanh, and ReLU.
  • Pooling layer: reduces the dimensionality of the feature maps produced by the convolutional layers while improving the results and making the network less prone to overfitting. Typical operations are average pooling and max pooling. Stacking convolutional and pooling layers yields progressively more abstract features.
  • Fully connected layer: stacking one or more fully connected layers on top of the convolutional and pooling layers enables higher-level reasoning; they act as the "classifier" of the whole network. If the convolution, pooling, and activation layers map the raw data into a hidden feature space, the fully connected layers map the learned "distributed feature representation" to the sample label space. Sigmoid/tanh are common in fully connected layers, while ReLU is common in convolutional layers.

A convolutional neural network (CNN) is a common deep learning architecture inspired by the natural visual perception mechanisms of living creatures. A minimal sketch of these three layer types follows.
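A minimal Keras sketch of the three layer types (added here for illustration; the layer sizes are arbitrary):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPool2D, Flatten, Dense

toy = Sequential()
toy.add(Conv2D(8, (3, 3), activation="relu", input_shape=(28, 28, 1)))  # convolution + activation
toy.add(MaxPool2D((2, 2)))                                              # pooling
toy.add(Flatten())
toy.add(Dense(10, activation="softmax"))                                # fully connected "classifier"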

II. The Introduction of LeNet-5

In 1998, LeCun published the paper "Gradient-Based Learning Applied to Document Recognition".
http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf

01 The LeNet-5 Architecture

The LeNet-5 model has 7 layers in total.

  • Layer 1: convolution

    Accepts a 32×32×1 image as input. This layer has 6 kernels of size 5×5 with stride 1 and valid padding. The output is 28×28×6 (a size-check helper follows this list).

  • Layer 2: pooling

    Applies 2×2 max pooling to the previous output, giving 14×14×6.

  • Layer 3: convolution

    Accepts the 14×14×6 input. This layer has 16 kernels of size 5×5 with stride 1 and valid padding. The output is 10×10×16.

  • Layer 4: pooling

    Applies 2×2 max pooling to the previous output, giving 5×5×16.

  • Layer 5: fully connected

    Flattens the 5×5×16 output of the previous layer as input; this layer has 120 neurons.

  • Layer 6: fully connected

    This layer has 84 neurons.

  • Layer 7: fully connected

    This layer has 10 neurons, one for each digit 0 through 9.

  • Activation functions

    The paper uses tanh activations in the earlier layers; the output layer of the paper uses Gaussian connections.
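To double-check the feature-map sizes listed above, the output size of a convolution or pooling layer follows floor((in + 2·padding − kernel)/stride) + 1. A small helper, added here for illustration (not part of the original post):

def conv_output_size(in_size, kernel, stride=1, padding=0):
    """Output size of a conv/pool layer: floor((in + 2*padding - kernel) / stride) + 1."""
    return (in_size + 2 * padding - kernel) // stride + 1

print(conv_output_size(32, 5))            # 28  (layer 1 conv)
print(conv_output_size(28, 2, stride=2))  # 14  (layer 2 pool)
print(conv_output_size(14, 5))            # 10  (layer 3 conv)
print(conv_output_size(10, 2, stride=2))  # 5   (layer 4 pool)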

02 LeNet-5 Code Implementation

# Imports
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense,Flatten,Conv2D,MaxPool2D
from keras.optimizers import SGD
from keras.utils import to_categorical
Using TensorFlow backend.
# Load the dataset
(x_train,y_train),(x_test,y_test) = mnist.load_data()
print(x_train.shape,y_train.shape,x_test.shape,y_test.shape)
(60000, 28, 28) (60000,) (10000, 28, 28) (10000,)
# Preprocess the data
x_train = x_train.reshape(60000,28,28,1)
x_test = x_test.reshape(10000,28,28,1)

# Convert labels to one-hot encoding
y_train = to_categorical(y_train,num_classes=10)
y_test = to_categorical(y_test,num_classes=10)
# Define the LeNet-5 model (note: MNIST images are 28×28, while the original LeNet-5 takes 32×32 inputs)
model = Sequential()
model.add(Conv2D(filters = 6,
                 kernel_size = (5,5),
                 padding = "valid",
                 input_shape = (28,28,1),
                 activation = "tanh"))
model.add(MaxPool2D(pool_size = (2,2)))
model.add(Conv2D(filters = 16,
                 kernel_size = (5,5),
                 padding = "valid",
                 activation = "tanh"))
model.add(MaxPool2D(pool_size = (2,2)))
model.add(Flatten())
model.add(Dense(120,activation = "tanh"))
model.add(Dense(84,activation = "tanh"))
model.add(Dense(10,activation = "softmax"))
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 24, 24, 6)         156       
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 12, 12, 6)         0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 8, 8, 16)          2416      
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 4, 4, 16)          0         
_________________________________________________________________
flatten_1 (Flatten)          (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 120)               30840     
_________________________________________________________________
dense_2 (Dense)              (None, 84)                10164     
_________________________________________________________________
dense_3 (Dense)              (None, 10)                850       
=================================================================
Total params: 44,426
Trainable params: 44,426
Non-trainable params: 0
_________________________________________________________________
# Train the model
sgd = SGD(lr = 0.05,
          decay = 1e-6,
          momentum = 0.9,
          nesterov= True)
model.compile(optimizer = sgd,
              loss = "categorical_crossentropy",
              metrics = ["accuracy"])
model.fit(x_train,
          y_train,
          batch_size = 514,
          epochs = 8,
          verbose = 1,
          validation_data=(x_test,y_test),
          shuffle = True)

Train on 60000 samples, validate on 10000 samples
Epoch 1/8
60000/60000 [==============================] - 11s 179us/step - loss: 0.3972 - accuracy: 0.8807 - val_loss: 0.1354 - val_accuracy: 0.9594
Epoch 2/8
60000/60000 [==============================] - 10s 167us/step - loss: 0.1165 - accuracy: 0.9645 - val_loss: 0.0901 - val_accuracy: 0.9713
Epoch 3/8
60000/60000 [==============================] - 10s 166us/step - loss: 0.0925 - accuracy: 0.9713 - val_loss: 0.0962 - val_accuracy: 0.9705
Epoch 4/8
60000/60000 [==============================] - 10s 165us/step - loss: 0.0800 - accuracy: 0.9757 - val_loss: 0.0697 - val_accuracy: 0.9779
Epoch 5/8
60000/60000 [==============================] - 10s 162us/step - loss: 0.0648 - accuracy: 0.9805 - val_loss: 0.0825 - val_accuracy: 0.9738
Epoch 6/8
60000/60000 [==============================] - 10s 168us/step - loss: 0.0615 - accuracy: 0.9806 - val_loss: 0.0602 - val_accuracy: 0.9809
Epoch 7/8
60000/60000 [==============================] - 10s 161us/step - loss: 0.0568 - accuracy: 0.9828 - val_loss: 0.0550 - val_accuracy: 0.9822
Epoch 8/8
60000/60000 [==============================] - 9s 156us/step - loss: 0.0466 - accuracy: 0.9856 - val_loss: 0.0510 - val_accuracy: 0.9832





<keras.callbacks.callbacks.History at 0x19b6145f4a8>

III. The Introduction of AlexNet

AlexNet was designed in 2012 by Hinton and his student Alex Krizhevsky, and it won that year's ImageNet competition.
https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

01 The AlexNet Architecture

The figure in the paper splits into an upper and a lower half: as the paper notes, the two halves run on two separate GPUs, which interact only at certain layers. This split exists purely to spread the computation across two GPUs for efficiency; the two halves differ very little structurally, so we can treat the network as if it ran on a single GPU.

Overall, AlexNet consists of 5 convolutional layers and 3 fully connected layers, for a total depth of 8 layers.

The figure shows an input of 224×224, but (224−11)/4+1 = 54.25 does not give the paper's 55×55; with a 227×227 input, (227−11)/4+1 = 55.

  • Convolutional layer C1

  • The processing pipeline of this layer is: convolution –> ReLU –> pooling –> normalization.

  • Convolution: the input is 227×227, convolved with 96 kernels of size 11×11×3, giving a 55×55×96 feature map.

  • ReLU: the feature maps produced by the convolution are passed through the ReLU function.

  • Pooling: 3×3 pooling units with stride 2 (overlapping pooling, since the stride is smaller than the pooling window), giving an output of 27×27×96 ((55−3)/2+1=27).

  • Local response normalization: LRN with k=2, n=5, α=10⁻⁴, β=0.75; the output remains 27×27×96, split into two groups of 27×27×48.

  • Convolutional layer C2

  • The processing pipeline of this layer is: convolution –> ReLU –> pooling –> normalization.

  • Convolution: the input is two groups of 27×27×48, convolved with two groups of 128 kernels each of size 5×5×48, with padding 2 and stride 1. The output is two groups of feature maps, each 27×27×128 ((27+2×2−5)/1+1=27).

  • ReLU: the feature maps are passed through the ReLU function.

  • Pooling: 3×3 pooling with stride 2; the spatial size becomes (27−3)/2+1=13, so the output is 13×13×256.

  • Local response normalization: LRN with k=2, n=5, α=10⁻⁴, β=0.75; the output remains 13×13×256, split into two groups of 13×13×128.

  • Convolutional layer C3

  • The processing pipeline of this layer is: convolution –> ReLU.

  • Convolution: the input is 13×13×256, convolved with 384 kernels (in two groups) of size 3×3×256, with padding 1 and stride 1. The output is 13×13×384.

  • ReLU: the feature maps are passed through the ReLU function.

  • Convolutional layer C4

  • The processing pipeline is: convolution –> ReLU; this layer is similar to C3.

  • Convolution: the input is 13×13×384, split into two groups of 13×13×192, convolved with two groups of 192 kernels each of size 3×3×192, with padding 1 and stride 1. The output is 13×13×384, in two groups of 13×13×192.

  • ReLU: the feature maps are passed through the ReLU function.

  • Convolutional layer C5

  • The processing pipeline is: convolution –> ReLU –> pooling.

  • Convolution: the input is 13×13×384, split into two groups of 13×13×192, convolved with two groups of 128 kernels each of size 3×3×192, with padding 1 and stride 1. The output is 13×13×256.

  • ReLU: the feature maps are passed through the ReLU function.

  • Pooling: 3×3 pooling with stride 2; the spatial size becomes (13−3)/2+1=6, so the pooled output is 6×6×256.

  • Fully connected layer FC6

  • The pipeline is: (convolution-as-)full connection –> ReLU –> Dropout.

  • Convolution -> full connection: the input is 6×6×256, and this layer has 4096 kernels, each of size 6×6×256. Since each kernel has exactly the same size as the input feature map, every kernel coefficient multiplies exactly one input value, one to one, which is why this layer is called fully connected. Because kernel and feature map have the same size, each convolution yields a single value, so the output is 4096×1×1, i.e. 4096 neurons.

  • ReLU: the 4096 values are passed through the ReLU activation, producing 4096 outputs.

  • Dropout: suppresses overfitting by randomly disconnecting or deactivating some neurons.

  • Fully connected layer FC7

  • The pipeline is: full connection –> ReLU –> Dropout.

  • Full connection: the input is a 4096-dimensional vector.

  • ReLU: the 4096 values are passed through the ReLU activation, producing 4096 outputs.

  • Dropout: suppresses overfitting by randomly disconnecting or deactivating some neurons.

  • Fully connected layer FC8

  • The 4096 outputs of layer seven are fully connected to the 1000 neurons of layer eight; after training, these produce 1000 float values, which constitute the prediction.

02 Key Features of AlexNet

  • ReLU Nonlinearity (Rectified Linear Unit)

    In the past, the activation functions of neural networks were usually sigmoid or tanh, whose biggest drawback is saturation: when the input x is very large or very small, the output gets very close to +1 or −1, where the slope is tiny. During gradient-descent training this saturation makes the gradients very small and severely slows down training.
    ReLU is defined as max(0, x): for x > 0 the output is x and the slope is a constant 1. In practice, networks with ReLU converge many times faster than with the traditional saturating activations (the AlexNet paper reports about six times faster on CIFAR-10).

  • Training on multiple GPUs

Using multiple GPUs for distributed computation.

  • Local response normalization

When saturating activation functions are used, the inputs usually need to be normalized to exploit the activation's linear and nonlinear behavior around 0 and to avoid saturation. ReLU does not require input normalization, but Alex et al. found that LRN, a form of normalization, helps improve the network's generalization.

LRN works as follows: for the pixel at position (x,y), it sums the squared activations at that position across a few adjacent kernel maps and normalizes by that sum. The order of the kernel maps is arbitrary and fixed before training begins. The formula from the paper is reconstructed below.

Hinton et al. argue that LRN mimics the lateral inhibition of biological neural systems, creating competition among local neuron activities so that larger responses become relatively larger still, improving the model's generalization. However, later papers, such as the one introducing VGG, showed that LRN does little for CNNs and only adds computational complexity, so the technique has fallen out of use.
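For reference, the LRN response from the AlexNet paper, reconstructed here (a^i_{x,y} is the activity of kernel i at position (x, y) and N is the number of kernels in the layer; the formula is quoted from the paper, not from the original post):

b^i_{x,y} = a^i_{x,y} / ( k + α · Σ_{j = max(0, i−n/2)}^{min(N−1, i+n/2)} (a^j_{x,y})² )^β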

  • Overlapping pooling

Pooling is a very important layer in a CNN: it extracts the dominant features and shrinks the feature maps, which matters a great deal for speeding up CNN computation. Usually the pooling window and the stride are set to the same size; when the window is larger than the stride, we get overlapping pooling.

The pooling in the earlier LeNet was non-overlapping: the window size equals the stride. AlexNet instead uses overlapping pooling, moving by a stride smaller than the pooling window. AlexNet pools with a 3*3 window and a stride of 2, so adjacent windows overlap. Overlapping pooling helps reduce overfitting.

  • Reducing Overfitting

AlexNet has about 60 million parameters, which makes overfitting very likely; it fights overfitting in two ways.

  • Data augmentation

The simplest and most effective defense against overfitting is to enlarge the training set; AlexNet does this in two ways.

1. Image translations and horizontal reflections. Random 224*224 crops are taken from the original 256*256 images, with random horizontal flips; these two operations enlarge the training set by a factor of 32*32*2 = 2048. At test time, AlexNet crops five 224*224 patches (the four corners and the center) from the input image and from its horizontal reflection, 10 patches in all, feeds each through the network, and averages the 10 softmax outputs. Without these operations AlexNet would overfit severely, and the network could not be made this deep.

2. Altering the intensities of the RGB channels. AlexNet applies PCA (principal component analysis) to the RGB values: for each pixel of each training image it uses the eigenvectors and eigenvalues of the RGB covariance, scaling each eigenvalue by a random variable α drawn from a Gaussian with mean 0 and standard deviation 0.1 (see the sketch below).
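A hedged numpy sketch of this PCA color augmentation, reconstructed from the description above (the function name and the [0, 1] value range are assumptions for illustration):

import numpy as np

def fancy_pca(img, alpha_std=0.1):
    """img: float array of shape (H, W, 3), values in [0, 1]."""
    flat = img.reshape(-1, 3)
    cov = np.cov(flat, rowvar=False)             # 3x3 covariance of the RGB channels
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigen-decomposition
    alpha = np.random.normal(0.0, alpha_std, 3)  # mean 0, std 0.1, as in the paper
    delta = eigvecs @ (alpha * eigvals)          # perturbation along the principal components
    return np.clip(img + delta, 0.0, 1.0)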
  • Dropout

Dropout is a very effective way to reduce overfitting in neural networks. Each neuron is assigned a probability keep_prob of being retained; if a neuron is not retained, in other words it is "dropped out", its output is set to 0, and during backpropagation the gradient flowing to it is also 0, so the network effectively does not contain that neuron. On the next iteration all neurons are randomly dropped again according to keep_prob. Each iteration therefore trains a slightly different network topology, which forces the network not to over-rely on a few neurons or features, and the neurons are pushed to learn more robust features. A minimal sketch follows.
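A minimal numpy sketch of inverted dropout as described above (an illustration only; AlexNet itself keeps all neurons at test time and scales activations instead):

import numpy as np

def dropout(x, keep_prob=0.5, training=True):
    if not training:
        return x
    mask = np.random.rand(*x.shape) < keep_prob  # keep each neuron with probability keep_prob
    return x * mask / keep_prob                  # rescale so the expected activation is unchanged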

03 AlexNet Code Implementation

AlexNet was built for a 1000-class problem and is computationally very demanding. For a simple reproduction, we use the oxflower17 dataset shipped with TFLearn, which classifies flowers into 17 categories with 80 images per class.

# Imports
import keras
from keras.models import Sequential
from keras.layers import Dense,Activation,Dropout,Flatten,Conv2D,MaxPool2D
from keras.layers.normalization import BatchNormalization
import numpy as np
np.random.seed(1000)
Using TensorFlow backend.
# Load the data
import tflearn.datasets.oxflower17 as oxflower17
x,y = oxflower17.load_data(one_hot=True)
# Define the AlexNet model
model = Sequential()
# block 1
model.add(Conv2D(filters = 97,  # note: the original AlexNet uses 96 filters here; 97 matches the run below
                 kernel_size = (11,11),
                 strides = (4,4),
                 padding = "valid",
                 input_shape = (224,224,3)))
model.add(Activation("relu"))
model.add(MaxPool2D(pool_size = (2,2),
                    strides = (2,2),
                    padding = "valid"))
model.add(BatchNormalization())
# block 2
model.add(Conv2D(filters = 256,
                 kernel_size = (11,11),
                 strides = (1,1),
                 padding = "valid"))
model.add(Activation("relu"))
model.add(MaxPool2D(pool_size = (2,2),
                    strides = (2,2),
                    padding = "valid"))
model.add(BatchNormalization())
# block 3
model.add(Conv2D(filters = 384,
                 kernel_size = (3,3),
                 strides = (1,1),
                 padding = "valid"))
model.add(Activation("relu"))
model.add(BatchNormalization())
# block 4
model.add(Conv2D(filters = 384,
                 kernel_size = (3,3),
                 strides = (1,1),
                 padding = "valid"))
model.add(Activation("relu"))
model.add(BatchNormalization())
# block 5
model.add(Conv2D(filters = 256,
                 kernel_size = (3,3),
                 strides = (1,1),
                 padding = "valid"))
model.add(Activation("relu"))
model.add(MaxPool2D(pool_size = (2,2),
                    strides = (2,2),
                    padding = "valid"))
model.add(BatchNormalization())
# block 6: dense
model.add(Flatten())
model.add(Dense(4096))  # (the original passed input_shape here, which Keras ignores on non-first layers)
model.add(Activation("relu"))
model.add(Dropout(0.4))
model.add(BatchNormalization())
# block 7: dense
model.add(Dense(4096))
model.add(Activation("relu"))
model.add(Dropout(0.4))
model.add(BatchNormalization())
# block 8: dense (output)
model.add(Dense(17))
model.add(Activation("softmax"))

model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_1 (Conv2D)            (None, 54, 54, 97)        35308     
_________________________________________________________________
activation_1 (Activation)    (None, 54, 54, 97)        0         
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 27, 27, 97)        0         
_________________________________________________________________
batch_normalization_1 (Batch (None, 27, 27, 97)        388       
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 17, 17, 256)       3004928   
_________________________________________________________________
activation_2 (Activation)    (None, 17, 17, 256)       0         
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 8, 8, 256)         0         
_________________________________________________________________
batch_normalization_2 (Batch (None, 8, 8, 256)         1024      
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 6, 6, 384)         885120    
_________________________________________________________________
activation_3 (Activation)    (None, 6, 6, 384)         0         
_________________________________________________________________
batch_normalization_3 (Batch (None, 6, 6, 384)         1536      
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 4, 4, 384)         1327488   
_________________________________________________________________
activation_4 (Activation)    (None, 4, 4, 384)         0         
_________________________________________________________________
batch_normalization_4 (Batch (None, 4, 4, 384)         1536      
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 2, 2, 256)         884992    
_________________________________________________________________
activation_5 (Activation)    (None, 2, 2, 256)         0         
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 1, 1, 256)         0         
_________________________________________________________________
batch_normalization_5 (Batch (None, 1, 1, 256)         1024      
_________________________________________________________________
flatten_1 (Flatten)          (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 4096)              1052672   
_________________________________________________________________
activation_6 (Activation)    (None, 4096)              0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 4096)              0         
_________________________________________________________________
batch_normalization_6 (Batch (None, 4096)              16384     
_________________________________________________________________
dense_2 (Dense)              (None, 4096)              16781312  
_________________________________________________________________
activation_7 (Activation)    (None, 4096)              0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 4096)              0         
_________________________________________________________________
batch_normalization_7 (Batch (None, 4096)              16384     
_________________________________________________________________
dense_3 (Dense)              (None, 17)                69649     
_________________________________________________________________
activation_8 (Activation)    (None, 17)                0         
=================================================================
Total params: 24,079,745
Trainable params: 24,060,607
Non-trainable params: 19,138
_________________________________________________________________
# compile
model.compile(loss = "categorical_crossentropy",
              optimizer = "adam",
              metrics = ["accuracy"])
# train
model.fit(x,
          y,
          batch_size = 32,
          epochs = 8,
          verbose = 1,
          validation_split = 0.3,
          shuffle = True)

Train on 951 samples, validate on 409 samples
Epoch 1/8
951/951 [==============================] - 98s 103ms/step - loss: 4.1242 - accuracy: 0.2177 - val_loss: 74.8161 - val_accuracy: 0.0685
Epoch 2/8
951/951 [==============================] - 100s 105ms/step - loss: 2.7997 - accuracy: 0.3091 - val_loss: 12.2919 - val_accuracy: 0.1345
Epoch 3/8
951/951 [==============================] - 94s 99ms/step - loss: 2.3698 - accuracy: 0.3544 - val_loss: 7.1330 - val_accuracy: 0.1858
Epoch 4/8
951/951 [==============================] - 100s 105ms/step - loss: 2.1398 - accuracy: 0.4206 - val_loss: 3.2262 - val_accuracy: 0.2885
Epoch 5/8
951/951 [==============================] - 96s 101ms/step - loss: 2.0635 - accuracy: 0.4385 - val_loss: 2.7424 - val_accuracy: 0.3594
Epoch 6/8
951/951 [==============================] - 92s 96ms/step - loss: 1.9173 - accuracy: 0.4448 - val_loss: 2.6016 - val_accuracy: 0.3423
Epoch 7/8
951/951 [==============================] - 89s 94ms/step - loss: 1.9253 - accuracy: 0.4753 - val_loss: 3.7909 - val_accuracy: 0.3374
Epoch 8/8
951/951 [==============================] - 89s 94ms/step - loss: 1.5822 - accuracy: 0.5310 - val_loss: 3.0874 - val_accuracy: 0.3521





<keras.callbacks.callbacks.History at 0x257cb057a20>

IV. The Introduction of VGGNet

VGGNet is a deep convolutional network developed jointly by the Visual Geometry Group at the University of Oxford and researchers at Google DeepMind. At ILSVRC 2014 it took first place in the localization task and second place in the classification task. It follows the approach of AlexNet, likewise consisting of five convolutional stages and 3 fully connected layers, but performs 2-4 consecutive convolutions within each stage.

  • In VGGNet the convolutional kernels are all 3x3 with stride 1. Convolutions are followed by 2x2 max pooling with stride 2.
  • All hidden layers use the ReLU activation function.
  • Fully connected layers 1 and 2 use Dropout with probability 0.5 to avoid overfitting.

Repeatedly applying kernels of the same small size extracts more complex and more expressive features.

https://arxiv.org/abs/1409.1556

01 The VGGNet Architecture

  • All convolutional kernels are 3*3.
  • VGGNet has five convolutional stages, each containing 2-3 convolutional layers, and each stage ends with a max pooling layer that shrinks the image. The layers within a stage use the same number of kernels, and later stages use more: 64-128-256-512-512. Several identical 3*3 convolutional layers stacked back to back appear frequently, and this is actually a very useful design.
  • Two stacked 3x3 convolutional layers are equivalent to one 5x5 convolutional layer: each output pixel relates to a 5*5 neighborhood of the input, i.e. the receptive field is 5x5.
  • Three stacked 3x3 convolutional layers have the effect of one 7x7 convolutional layer. Moreover, the three stacked 3*3 layers have far fewer parameters than one 7x7 layer, only about half (see the sketch below). Most importantly, three 3x3 layers apply more nonlinear transformations than a single 7x7 layer (three ReLU activations instead of one), giving the CNN a stronger ability to learn features.
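A quick check of the parameter claim above (an illustrative computation; C is the channel count, assumed equal for input and output, biases ignored):

C = 256
print(3 * (3 * 3 * C * C))  # three stacked 3x3 convs: 27*C^2 = 1,769,472 weights
print(7 * 7 * C * C)        # one 7x7 conv:            49*C^2 = 3,211,264 weights
# 27/49 ≈ 0.55, i.e. roughly half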

02 Observations from the VGGNet Paper

  • 1. Stacks of convolutional layers with small filters achieve the receptive field of large filters while using fewer parameters;
  • 2. Increasing the depth of the representation effectively improves model performance;
  • 3. Model ensembles outperform single models;
  • 4. Lowering the learning rate in stages during training helps the model converge.

03 The VGGNet-16 Structure

04 VGGNet-16 Code Implementation

# Imports
import keras 
from keras.datasets import cifar10
from keras.models import Sequential
from keras.layers import Dense,Dropout,Activation,Flatten,Conv2D,MaxPooling2D
from keras.utils import to_categorical
from keras import optimizers
from keras.optimizers import SGD
# Load the data
(x_train,y_train),(x_test,y_test) = cifar10.load_data()
x_train = x_train.astype("float32")
x_test = x_test.astype("float32")
y_train = to_categorical(y_train,10)
y_test = to_categorical(y_test,10)
Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170500096/170498071 [==============================] - 29s 0us/step
# Define the VGG16 model
model = Sequential()

# block1
model.add(Conv2D(filters = 64,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block1_conv1",
                 input_shape = (32,32,3)))
model.add(Conv2D(filters = 64,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block1_conv2"))
model.add(MaxPooling2D(pool_size = (2,2),
                       strides = (2,2),
                       name = "block1_pool"))
# block2
model.add(Conv2D(filters = 128,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block2_conv1"))
model.add(Conv2D(filters = 128,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block2_conv2"))
model.add(MaxPooling2D(pool_size = (2,2),
                       strides = (2,2),
                       name = "block2_pool"))
# block3
model.add(Conv2D(filters = 256,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block3_conv1"))
model.add(Conv2D(filters = 256,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block3_conv2"))
model.add(Conv2D(filters = 256,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block3_conv3"))
model.add(MaxPooling2D(pool_size = (2,2),
                       strides = (2,2),
                       name = "block3_pool"))
# block4
model.add(Conv2D(filters = 512,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block4_conv1"))
model.add(Conv2D(filters = 512,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block4_conv2"))
model.add(Conv2D(filters = 512,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block4_conv3"))
model.add(MaxPooling2D(pool_size = (2,2),
                       strides = (2,2),
                       name = "block4_pool"))
# block5
model.add(Conv2D(filters = 512,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block5_conv1"))
model.add(Conv2D(filters = 512,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block5_conv2"))
model.add(Conv2D(filters = 512,
                 kernel_size = (3,3),
                 activation = "relu",
                 padding = "same",
                 name = "block5_conv3"))
model.add(MaxPooling2D(pool_size = (2,2),
                       strides = (2,2),
                       name = "block5_pool"))

model.add(Flatten())
model.add(Dense(4096,activation="relu",name="fc1"))
model.add(Dropout(0.5))
model.add(Dense(4096,activation="relu",name="fc2"))
model.add(Dropout(0.5))
model.add(Dense(10,activation="softmax",name="prediction"))
model.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
block1_conv1 (Conv2D)        (None, 32, 32, 64)        1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 32, 32, 64)        36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 16, 16, 64)        0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 16, 16, 128)       73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 16, 16, 128)       147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 8, 8, 128)         0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 8, 8, 256)         295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 8, 8, 256)         590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 4, 4, 256)         0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 4, 4, 512)         1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 4, 4, 512)         2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 2, 2, 512)         0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 2, 2, 512)         2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 1, 1, 512)         0         
_________________________________________________________________
flatten_2 (Flatten)          (None, 512)               0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              2101248   
_________________________________________________________________
dropout_3 (Dropout)          (None, 4096)              0         
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
dropout_4 (Dropout)          (None, 4096)              0         
_________________________________________________________________
prediction (Dense)           (None, 10)                40970     
=================================================================
Total params: 33,638,218
Trainable params: 33,638,218
Non-trainable params: 0
_________________________________________________________________
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss="categorical_crossentropy", 
              optimizer=sgd,
              metrics=['accuracy'])
model.fit(x_train,
          y_train,
          epochs=8, 
          batch_size=64,
          validation_split=0.3, 
          verbose=1)

Train on 35000 samples, validate on 15000 samples
Epoch 1/8
35000/35000 [==============================] - 1801s 51ms/step - loss: 2.2628 - accuracy: 0.1228 - val_loss: 2.0258 - val_accuracy: 0.1864
Epoch 2/8
35000/35000 [==============================] - 1737s 50ms/step - loss: 1.8117 - accuracy: 0.2843 - val_loss: 1.5979 - val_accuracy: 0.3885
Epoch 3/8
35000/35000 [==============================] - 1724s 49ms/step - loss: 1.4809 - accuracy: 0.4505 - val_loss: 1.3295 - val_accuracy: 0.5201
Epoch 4/8
35000/35000 [==============================] - 4062s 116ms/step - loss: 1.1633 - accuracy: 0.5833 - val_loss: 1.0005 - val_accuracy: 0.6498
Epoch 5/8
35000/35000 [==============================] - 1772s 51ms/step - loss: 0.9778 - accuracy: 0.6570 - val_loss: 0.9629 - val_accuracy: 0.6710
Epoch 6/8
35000/35000 [==============================] - 1832s 52ms/step - loss: 0.8202 - accuracy: 0.7206 - val_loss: 0.8424 - val_accuracy: 0.7109
Epoch 7/8
35000/35000 [==============================] - 1886s 54ms/step - loss: 0.6917 - accuracy: 0.7658 - val_loss: 0.8114 - val_accuracy: 0.7312
Epoch 8/8
35000/35000 [==============================] - 1760s 50ms/step - loss: 0.5947 - accuracy: 0.7989 - val_loss: 0.7796 - val_accuracy: 0.7450





<keras.callbacks.callbacks.History at 0x17e3bef6f28>

05 VGGNet-16 from keras.applications

from keras.applications import VGG16

model = VGG16(weights="imagenet",include_top=False)

model.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, None, None, 3)     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, None, None, 128)   0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, None, None, 256)   295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, None, None, 256)   590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, None, None, 256)   0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, None, None, 512)   1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, None, None, 512)   0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, None, None, 512)   2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, None, None, 512)   0         
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________

V. The Introduction of GoogLeNet

In 2014, GoogLeNet and VGG were the two stars of that year's ImageNet challenge (ILSVRC14): GoogLeNet took first place and VGG second. What these two architectures have in common is greater depth. VGG inherits much of the framework of LeNet and AlexNet, whereas GoogLeNet makes a bolder structural attempt: although it is only 22 layers deep, it is much smaller than AlexNet and VGG. GoogLeNet has about 5 million parameters; AlexNet has 12 times as many, and VGGNet has 3 times as many as AlexNet. So when memory or compute is limited, GoogLeNet is a good choice, and in terms of results its performance is also superior.

[v1] Going Deeper with Convolutions, 6.67% test error, 2014.9

Paper: http://arxiv.org/abs/1409.4842

[v2] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, 4.8% test error, 2015.2

Paper: http://arxiv.org/abs/1502.03167

[v3] Rethinking the Inception Architecture for Computer Vision, 3.5% test error, 2015.12

Paper: http://arxiv.org/abs/1512.00567

[v4] Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning, 3.08% test error, 2016.2

Paper: http://arxiv.org/abs/1602.07261

01 The Evolution of GoogLeNet

The role of Inception is to replace the manual choice of filter types in a convolutional layer, or the choice between creating a convolutional or a pooling layer: the network learns for itself what parameters it needs.

Inception V1

Inception V1 carefully designed the Inception module to improve parameter efficiency. It also removed the model's final fully connected layer, using a global average pooling layer to reduce each feature map to 1*1; in earlier networks the fully connected layers held most of the parameters and were prone to overfitting.

Inception V2

Inception V2 borrowed from VGGNet, replacing the large 5*5 kernels with two 3*3 convolutions (reducing parameters while also reducing overfitting), and introduced the Batch Normalization method. BN is a very effective regularization method that can speed up the training of large convolutional networks many times over, while also substantially improving classification accuracy after convergence.

When BN is applied to a layer, it standardizes the activations within each mini-batch, normalizing the output toward an N(0,1) distribution and reducing Internal Covariate Shift (the shifting distribution of internal activations). The BN paper points out that in a traditional deep network the input distribution of every layer keeps changing during training, which makes training difficult and forces a very small learning rate. Applying BN to every layer effectively solves this: the learning rate can be increased many times over, the previous accuracy is reached in 1/14 of the iterations, and training time shrinks dramatically.

Of course, some adjustments are needed when using BN (a small ordering sketch follows the list):

  • Increase the learning rate and speed up learning-rate decay to suit the BN-normalized data
  • Remove Dropout and weaken L2 regularization
  • Remove LRN
  • Shuffle training samples more thoroughly
  • Reduce photometric distortions during data augmentation
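A minimal Keras sketch of the Conv -> BN -> ReLU ordering this implies (an illustration added here, not from the original post):

from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Activation

m = Sequential()
m.add(Conv2D(32, (3, 3), padding="same", use_bias=False, input_shape=(32, 32, 3)))  # bias is redundant before BN
m.add(BatchNormalization())  # normalize each mini-batch before the nonlinearity
m.add(Activation("relu"))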

Inception v3

Inception V3 makes improvements in two main areas:

  • It introduces the idea of factorization into small convolutions, splitting a larger 2-D convolution into two smaller 1-D convolutions, e.g. a 7*7 convolution into a 1*7 and a 7*1 convolution (the figure illustrates splitting 3*3 into 1*3 and 3*1). This saves a large number of parameters, speeds up computation, reduces overfitting, and adds an extra layer of nonlinearity that extends the model's expressive power (see the sketch after this list).

  • It also refines the structure of the Inception module itself: there are now three variants, for the 35×35, 17×17, and 8×8 feature-map sizes.
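A minimal Keras sketch of such a factorized convolution, replacing a 3×3 kernel with a 1×3 followed by a 3×1 (channel counts are arbitrary, for illustration only):

from keras.layers import Conv2D, Input

inp = Input(shape=(32, 32, 64))
x = Conv2D(64, (1, 3), padding="same", activation="relu")(inp)  # 1x3 kernel
x = Conv2D(64, (3, 1), padding="same", activation="relu")(x)    # 3x1 kernel; together they cover a 3x3 receptive field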

Inception v4

Compared with V3, Inception V4 mainly incorporates Microsoft's ResNet; the residual structure was found to greatly accelerate training while also improving performance.

02 Implementing an Inception Module

import keras
from keras.layers import Conv2D,MaxPooling2D,Input

input_img = Input(shape=(256,256,3))

tower_1 = Conv2D(64,(1,1),padding="same",activation="relu")(input_img)
tower_1 = Conv2D(64,(3,3),padding="same",activation="relu")(tower_1)

tower_2 = Conv2D(64,(1,1),padding="same",activation="relu")(input_img)
tower_2 = Conv2D(64,(5,5),padding="same",activation="relu")(tower_2)

tower_3 = MaxPooling2D((3,3),strides=(1,1),padding="same")(input_img)
tower_3 = Conv2D(64,(1,1),padding="same",activation="relu")(tower_3)

output = keras.layers.concatenate([tower_1,tower_2,tower_3],axis=-1)  # channel axis for channels_last data (the original's axis=1 concatenated along width)
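To turn the module above into a runnable model (a usage sketch added here, not part of the original post):

from keras.models import Model
model = Model(inputs=input_img, outputs=output)
model.summary()  # each tower keeps the 256x256 spatial size, so the outputs can be concatenated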

03 Using the Pre-trained Inception V3

from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image
from keras.models import Model
from keras.layers import Dense,GlobalAveragePooling2D
from keras import backend as K

# Build the pre-trained model without the top classifier
base_model = InceptionV3(weights="imagenet",include_top=False)

# Add a global average pooling layer
x = base_model.output
x = GlobalAveragePooling2D()(x)

# Add a fully connected layer
x = Dense(1024,activation="relu")(x)

# Add a classifier; assume 100 classes
predictions = Dense(100,activation="softmax")(x)

# Build the full model
model = Model(inputs=base_model.input,outputs=predictions)

# First, train only the top layers
# by freezing all of InceptionV3's convolutional layers
for layer in base_model.layers:
    layer.trainable = False
# Compile the model
model.compile(optimizer = "rmsprop",
              loss = "categorical_crossentropy",
              metrics = ["accuracy"])
# Train on the new dataset for a few epochs
model.fit_generator()  # (generator arguments omitted in the original)
# The top layers are now trained; start fine-tuning the InceptionV3 convolutional layers:
# freeze the bottom layers and train the remaining top ones

# Print each layer's index and name to decide how many layers to freeze
for i,layer in enumerate(base_model.layers):
    print(i,layer.name)
0 input_4
1 conv2d_105
2 batch_normalization_95
3 activation_95
4 conv2d_106
5 batch_normalization_96
6 activation_96
7 conv2d_107
8 batch_normalization_97
9 activation_97
10 max_pooling2d_7
11 conv2d_108
12 batch_normalization_98
13 activation_98
14 conv2d_109
15 batch_normalization_99
16 activation_99
17 max_pooling2d_8
18 conv2d_113
19 batch_normalization_103
20 activation_103
21 conv2d_111
22 conv2d_114
23 batch_normalization_101
24 batch_normalization_104
25 activation_101
26 activation_104
27 average_pooling2d_10
28 conv2d_110
29 conv2d_112
30 conv2d_115
31 conv2d_116
32 batch_normalization_100
33 batch_normalization_102
34 batch_normalization_105
35 batch_normalization_106
36 activation_100
37 activation_102
38 activation_105
39 activation_106
40 mixed0
41 conv2d_120
42 batch_normalization_110
43 activation_110
44 conv2d_118
45 conv2d_121
46 batch_normalization_108
47 batch_normalization_111
48 activation_108
49 activation_111
50 average_pooling2d_11
51 conv2d_117
52 conv2d_119
53 conv2d_122

80 batch_normalization_119
81 batch_normalization_120
82 activation_114
83 activation_116
84 activation_119


300 batch_normalization_180
301 activation_182
302 activation_183
303 activation_186
304 activation_187
305 batch_normalization_188
306 activation_180
307 mixed9_1
308 concatenate_5
309 activation_188
310 mixed10
# We choose to train the top two Inception blocks,
# i.e. freeze the first 249 layers and unfreeze the rest
for layer in model.layers[:249]:
    layer.trainable = False
for layer in model.layers[249:]:
    layer.trainable = True

# Recompile the model for these changes to take effect
from keras.optimizers import SGD
model.compile(optimizer=SGD(lr=0.0001,momentum=0.9),loss="categorical_crossentropy",metrics=["accuracy"])
# Continue training, this time fine-tuning the top two Inception blocks plus the two dense layers
model.fit_generator()  # (generator arguments omitted in the original)

VI. The Introduction of ResNet

The Deep Residual Network from Kaiming He's team at MSRA (Microsoft Research Asia) won ImageNet 2015. The network, ResNet for short, reaches a depth of 152 layers with a top-5 error of 3.57%.

https://arxiv.org/pdf/1512.03385.pdf

ResNet points to a phenomenon seen widely across datasets: beyond a certain depth, a deeper network means higher training error.

The error rises because the deeper the network, the more pronounced vanishing gradients become: during backpropagation the gradient cannot be propagated effectively to the front layers, whose parameters then stop updating, degrading both training and test performance.

The problem ResNet tackles is how to keep increasing network depth while effectively avoiding vanishing gradients.

01 The Residual Network

The core structure ResNet uses to counter vanishing gradients in deep networks is the residual block.

A residual block adds an identity mapping that passes the current output directly to the next layer (a 1:1 pass-through that adds no extra parameters), effectively a shortcut that skips the layer's computation; this direct connection is called a "skip connection". During backpropagation the gradient of the following layer likewise flows directly to the preceding layer, which resolves the vanishing-gradient problem in deep networks.

02 Implementing the Residual Structure

import keras
from keras.layers import Conv2D,Input

x = Input(shape=(224,224,3))
y = Conv2D(3,(3,3),padding="same")(x)  # the residual branch F(x)

z = keras.layers.add([x,y])            # skip connection: z = F(x) + x

03 Using the Pre-trained ResNet50

from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

model = ResNet50(weights="imagenet")

img_path = "elephant.jpg"
img = image.load_img(img_path, target_size=(224, 224))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
x = preprocess_input(x)

preds = model.predict(x)
# decode the results into a list of tuples (class, description, probability)
# (one such list for each sample in the batch)
print("Predicted:", decode_predictions(preds, top=3)[0])

Predicted: [('n02504458', 'African_elephant', 0.4616896), ('n01871265', 'tusker', 0.38391447), ('n02504013', 'Indian_elephant', 0.15430869)]

VII. The Introduction of DenseNet

Paper: Densely Connected Convolutional Networks

Link: https://arxiv.org/pdf/1608.06993.pdf

This paper proposes a deep convolutional network different from ResNet, called DenseNet.

To achieve good results on a dataset, two main approaches had been taken:

  • Designing wider networks, e.g. GoogLeNet, FractalNets
  • Designing deeper networks, e.g. Highway Networks, ResNet

The authors of this paper instead start from the features themselves: by exploiting features to the fullest, they achieve better results with fewer parameters. The design idea is feature reuse.

01 The Design Idea of DenseNet

The design idea of DenseNet is feature reuse.

The figure makes DenseNet's feature-reuse idea very intuitive. It shows a dense block, a sub-module of DenseNet in which the input of every layer includes the outputs of all preceding layers (combined via concatenation).

DenseNet's main advantages:

  • Mitigates vanishing gradients
  • Strengthens feature propagation
  • Uses features more effectively
  • Reduces the number of parameters to some extent

02 Implementing DenseNet

Conv_block:

The convolution operation; per the paper this is a composite function of BatchNormalization, ReLU, and a 3*3 convolution.

import keras
from keras.layers import *
from keras.models import *
from keras import backend as K
from keras.regularizers import l2
Using TensorFlow backend.
def conv_block(ip, nb_filter, bottleneck=False, dropout_rate=None, weight_decay=1e-4):
    ''' Apply BatchNorm, Relu, 3x3 Conv2D, optional bottleneck block and dropout
        Args:
            ip: Input keras tensor
            nb_filter: number of filters
            bottleneck: add bottleneck block
            dropout_rate: dropout rate
            weight_decay: weight decay factor
        Returns: keras tensor with batch_norm, relu and convolution2d added (optional bottleneck)
    '''
    concat_axis = 1 if K.image_data_format() == 'channels_first' else -1  # fixed: 'channels_first' (the original's 'channel_first' never matched)

    x = BatchNormalization(axis=concat_axis, epsilon=1.1e-5)(ip)
    x = Activation('relu')(x)

    if bottleneck:
        inter_channel = nb_filter * 4
        x = Conv2D(inter_channel, (1, 1), kernel_initializer='he_normal', padding='same', use_bias=False,
                   kernel_regularizer=l2(weight_decay))(x)
        x = BatchNormalization(axis=concat_axis, epsilon=1.1e-5)(x)
        x = Activation('relu')(x)

    x = Conv2D(nb_filter, (3, 3), kernel_initializer='he_normal', padding='same', use_bias=False)(x)

    if dropout_rate:
        x = Dropout(dropout_rate)(x)

    return x

Here concat_axis denotes the feature (channel) axis, since both concatenation and BN operate along that axis. bottleneck controls whether a bottleneck layer is used, i.e. a 1*1 convolution that compresses the number of channels of the feature maps.

Transition_block

The transition layer connects two dense blocks; the last dense block is not followed by one. Per the paper, it consists of four parts: BatchNormalization, ReLU, a 1×1 convolution, and 2×2 pooling (average pooling in the implementation below).

def transition_block(ip, nb_filter, compression=1.0, weight_decay=1e-4):
    '''Apply BatchNorm, ReLU, Conv2D, optional compression, dropout and AveragePooling2D
        Args:
            ip: keras tensor
            nb_filter: number of filters
            compression: calculated as 1 - reduction. Reduces the number of feature maps in the transition block
            dropout_rate: dropout rate
            weight_decay: weight decay factor
        Returns:
            keras tensor, after applying batch_norm, relu-conv, dropout, avgpool
    '''
    concat_axis = 1 if K.image_data_format() == 'channels_first' else -1

    x = BatchNormalization(axis=concat_axis, epsilon=1.1e-5)(ip)
    x = Activation('relu')(x)
    x = Conv2D(int(nb_filter * compression), (1, 1), kernel_initializer='he_normal', padding='same', use_bias=False,
               kernel_regularizer=l2(weight_decay))(x)
    x = AveragePooling2D((2, 2), strides=(2, 2))(x)

    return x

The Conv2D call implements the 1x1 convolution and applies the compression rate from the paper to adjust the number of channels.

Dense_block:

A loop implements the dense connectivity of the dense block.

def dense_block(x, nb_layers, nb_filter, growth_rate, bottleneck=False, dropout_rate=None, weight_decay=1e-4,
                grow_nb_filters=True, return_concat_list=False):
    '''Build a dense_block where the output of each conv_block is fed to subsequent ones
        Args:
            x: keras tensor
            nb_layers: the number of layers of conv_block to append to the model
            nb_filter: number of filters
            growth_rate: growth rate
            bottleneck: bottleneck block
            dropout_rate: dropout rate
            weight_decay: weight decay factor
            grow_nb_filters: flag to decide to allow number of filters to grow
            return_concat_list: return the list of feature maps along with the actual output
        Returns:
            keras tensor with nb_layers of conv_block appended
    '''

    concat_axis = 1 if K.image_data_format() == 'channels_first' else -1

    x_list = [x]

    for i in range(nb_layers):
        cb = conv_block(x, growth_rate, bottleneck, dropout_rate, weight_decay)
        x_list.append(cb)
        x = concatenate([x, cb], axis=concat_axis)

        if grow_nb_filters:
            nb_filter += growth_rate

    if return_concat_list:
        return x, nb_filter, x_list
    else:
        return x, nb_filter

The line x = concatenate([x, cb], axis=concat_axis) keeps x as a running global state across iterations: the first iteration takes x and outputs cb1; the second takes [x, cb1] and outputs cb2; the third takes [x, cb1, cb2] and outputs cb3, and so on. The growth rate growth_rate is simply the number of kernels used in each convolution, i.e. the number of output channels each layer contributes; for example, with 64 input channels and growth_rate=32, a 6-layer dense block ends with 64 + 6×32 = 256 channels.

Create_dense_net:

Builds the full network model:

def create_dense_net(nb_classes, img_input, include_top, depth=40, nb_dense_block=3, growth_rate=12, nb_filter=-1,
                     nb_layers_per_block=[1], bottleneck=False, reduction=0.0, dropout_rate=None, weight_decay=1e-4,
                     subsample_initial_block=False, activation='softmax'):
    ''' Build the DenseNet model
        Args:
            nb_classes: number of classes
            img_input: tuple of shape (channels, rows, columns) or (rows, columns, channels)
            include_top: flag to include the final Dense layer
            depth: number of layers
            nb_dense_block: number of dense blocks to add to end (generally = 3)
            growth_rate: number of filters to add per dense block
            nb_filter: initial number of filters. Default -1 indicates initial number of filters is 2 * growth_rate
            nb_layers_per_block: list, number of layers in each dense block
            bottleneck: add bottleneck blocks
            reduction: reduction factor of transition blocks. Note : reduction value is inverted to compute compression
            dropout_rate: dropout rate
            weight_decay: weight decay rate
            subsample_initial_block: Set to True to subsample the initial convolution and
                    add a MaxPool2D before the dense blocks are added.
            activation: Type of activation at the top layer. Can be one of 'softmax' or 'sigmoid'.
            activation: Type of activation at the top layer. Can be one of 'softmax' or 'sigmoid'.
                    Note that if sigmoid is used, classes must be 1.
        Returns: keras tensor with nb_layers of conv_block appended
    '''

    concat_axis = 1 if K.image_data_format() == 'channels_first' else -1  # fixed: 'channels_first' (the original's 'channel_first' never matched)

    if type(nb_layers_per_block) is not list:
        print('nb_layers_per_block should be a list!!!')
        return 0

    final_nb_layer = nb_layers_per_block[-1]
    nb_layers = nb_layers_per_block[:-1]

    if nb_filter <= 0:
        nb_filter = 2 * growth_rate
    compression = 1.0 - reduction
    if subsample_initial_block:
        initial_kernel = (7, 7)
        initial_strides = (2, 2)
    else:
        initial_kernel = (3, 3)
        initial_strides = (1, 1)

    x = Conv2D(nb_filter, initial_kernel, kernel_initializer='he_normal', padding='same',
               strides=initial_strides, use_bias=False, kernel_regularizer=l2(weight_decay))(img_input)
    if subsample_initial_block:
        x = BatchNormalization(axis=concat_axis, epsilon=1.1e-5)(x)
        x = Activation('relu')(x)
        x = MaxPooling2D((3, 3), strides=(2, 2), padding='same')(x)

    for block_index in range(nb_dense_block - 1):
        x, nb_filter = dense_block(x, nb_layers[block_index], nb_filter, growth_rate, bottleneck=bottleneck,
                                   dropout_rate=dropout_rate, weight_decay=weight_decay)
        x = transition_block(x, nb_filter, compression=compression, weight_decay=weight_decay)
        nb_filter = int(nb_filter * compression)

    # The last block is not followed by a transition_block
    x, nb_filter = dense_block(x, final_nb_layer, nb_filter, growth_rate, bottleneck=bottleneck,
                               dropout_rate=dropout_rate, weight_decay=weight_decay)

    x = BatchNormalization(axis=concat_axis, epsilon=1.1e-5)(x)
    x = Activation('relu')(x)
    x = GlobalAveragePooling2D()(x)

    if include_top:
        x = Dense(nb_classes, activation=activation)(x)

    return x

Generating the model:

input_shape = (224,224,3)
inputs = Input(shape=input_shape)
x = create_dense_net(nb_classes=1000, img_input=inputs, include_top=True, depth=169, nb_dense_block=4,
                     growth_rate=32, nb_filter=64, nb_layers_per_block=[6, 12, 32, 32], bottleneck=True, reduction=0.5,
                     dropout_rate=0.0, weight_decay=1e-4, subsample_initial_block=True, activation='softmax')
model = Model(inputs, x, name='densenet169')

print(model.summary())

Model: "densenet169"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None, 112, 112, 64) 9408        input_1[0][0]                    
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 112, 112, 64) 256         conv2d_1[0][0]                   
__________________________________________________________________________________________________
activation_1 (Activation)       (None, 112, 112, 64) 0           batch_normalization_1[0][0]      
__________________________________________________________________________________________________
max_pooling2d_1 (MaxPooling2D)  (None, 56, 56, 64)   0           activation_1[0][0]               
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 56, 56, 64)   256         max_pooling2d_1[0][0]            
__________________________________________________________________________________________________
activation_2 (Activation)       (None, 56, 56, 64)   0           batch_normalization_2[0][0]      
__________________________________________________________________________________________________
conv2d_2 (Conv2D)               (None, 56, 56, 128)  8192        activation_2[0][0]               

batch_normalization_169 (BatchN (None, 7, 7, 1664)   6656        concatenate_82[0][0]             
__________________________________________________________________________________________________
activation_169 (Activation)     (None, 7, 7, 1664)   0           batch_normalization_169[0][0]    
__________________________________________________________________________________________________
global_average_pooling2d_1 (Glo (None, 1664)         0           activation_169[0][0]             
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 1000)         1665000     global_average_pooling2d_1[0][0] 
==================================================================================================
Total params: 14,307,880
Trainable params: 14,149,480
Non-trainable params: 158,400
__________________________________________________________________________________________________
None

03 DenseNet via keras.applications

from keras.applications.densenet import DenseNet121
model = DenseNet121(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000)
Using TensorFlow backend.


model.summary()
Model: "densenet121"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_1 (InputLayer)            (None, 224, 224, 3)  0                                            
__________________________________________________________________________________________________
zero_padding2d_1 (ZeroPadding2D (None, 230, 230, 3)  0           input_1[0][0]                    
__________________________________________________________________________________________________
conv1/conv (Conv2D)             (None, 112, 112, 64) 9408        zero_padding2d_1[0][0]           
__________________________________________________________________________________________________
conv1/bn (BatchNormalization)   (None, 112, 112, 64) 256         conv1/conv[0][0]                 
__________________________________________________________________________________________________
conv1/relu (Activation)         (None, 112, 112, 64) 0           conv1/bn[0][0]                   
__________________________________________________________________________________________________
zero_padding2d_2 (ZeroPadding2D (None, 114, 114, 64) 0           conv1/relu[0][0]                 
__________________________________________________________________________________________________
pool1 (MaxPooling2D)            (None, 56, 56, 64)   0           zero_padding2d_2[0][0]           
__________________________________________________________________________________________________
conv2_block1_0_bn (BatchNormali (None, 56, 56, 64)   256         pool1[0][0]                      
__________________________________________________________________________________________________
conv2_block1_0_relu (Activation (None, 56, 56, 64)   0           conv2_block1_0_bn[0][0]          
__________________________________________________________________________________________________
conv2_block1_1_conv (Conv2D)    (None, 56, 56, 128)  8192        conv2_block1_0_relu[0][0]        

____________________________________________________________________________
bn (BatchNormalization)         (None, 7, 7, 1024)   4096        conv5_block16_concat[0][0]       
__________________________________________________________________________________________________
relu (Activation)               (None, 7, 7, 1024)   0           bn[0][0]                         
__________________________________________________________________________________________________
avg_pool (GlobalAveragePooling2 (None, 1024)         0           relu[0][0]                       
__________________________________________________________________________________________________
fc1000 (Dense)                  (None, 1000)         1025000     avg_pool[0][0]                   
==================================================================================================
Total params: 8,062,504
Trainable params: 7,978,856
Non-trainable params: 83,648
__________________________________________________________________________________________________
for i,layer in enumerate(model.layers):
    print(i,layer)
0 <keras.engine.input_layer.InputLayer object at 0x000001E28E156E48>
1 <keras.layers.convolutional.ZeroPadding2D object at 0x000001E28E17C8D0>
2 <keras.layers.convolutional.Conv2D object at 0x000001E295CC14A8>
3 <keras.layers.normalization.BatchNormalization object at 0x000001E295CCC860>
4 <keras.layers.core.Activation object at 0x000001E295CCCD30>

180 <keras.layers.core.Activation object at 0x000001E2995C2390>
181 <keras.layers.convolutional.Conv2D object at 0x000001E299602908>
182 <keras.layers.merge.Concatenate object at 0x000001E29963DDD8>

VIII. The Introduction of SENet

Paper: Squeeze-and-Excitation Networks

Link: https://arxiv.org/abs/1709.01507

Squeeze-and-Excitation Networks (SENet) is a network architecture proposed by Jie Hu's team at Momenta; it won the Image Classification task of ImageNet 2017, the final edition of the competition.

SENet explicitly models the interdependencies between feature channels. Rather than introducing a new spatial dimension to fuse feature channels, it adopts a brand-new "feature recalibration" strategy: the network learns the importance of each feature channel automatically, then uses that importance to boost useful features and suppress features that contribute little to the task at hand.
In other words, it introduces a channel-wise attention mechanism.
The goal of attention: emphasize the information most useful to the task while suppressing useless, distracting, or redundant information.

01 The Design Idea of SENet

The figure above shows the SE module. Given an input x with c_1 feature channels, a series of transformations such as convolutions produces a feature map with c_2 channels. Unlike a traditional CNN, SENet then recalibrates these features through three operations:

  • First, the Squeeze operation compresses the features along the spatial dimensions, turning each 2-D feature channel into a single real number. This number has, in a sense, a global receptive field, and the output dimension matches the number of input channels. It characterizes the global distribution of responses over the feature channels and lets layers close to the input obtain a global receptive field, which is very useful in many tasks.
  • Second, the Excitation operation, a mechanism similar to the gates in recurrent neural networks, generates a weight for each feature channel via parameters w, which are learned to explicitly model the correlations between feature channels.
  • Finally, the Reweight operation treats the output weights of Excitation as the importance of each channel after feature selection and rescales the original features channel by channel via multiplication, completing the recalibration of the original features along the channel dimension.

The figure above shows an example of embedding the SE module into an Inception block. The dimension information next to each box denotes that layer's output. Here global average pooling is used as the Squeeze operation, followed by two Fully Connected layers that form a bottleneck structure to model the correlations between channels and output the same number of channels as the input.

The feature dimension is first reduced to 1/16 of the input, and after a ReLU activation a second Fully Connected layer restores it to the original dimension. Compared with a single Fully Connected layer, this has two advantages: (a) more non-linearity, so the complex correlations between channels can be fitted better; (b) far fewer parameters and computations. A Sigmoid gate then yields normalized weights between 0 and 1, and finally a Scale operation multiplies the normalized weights onto each channel's features.

The SE module can also be embedded into modules with skip-connections. The upper-right figure shows an example of embedding SE into a ResNet module; the procedure is essentially the same as for SE-Inception, except that the Residual branch's features are recalibrated before the Addition. If the main branch's features were recalibrated after the Addition instead, the 0-to-1 scale operation on the trunk would tend to cause vanishing gradients near the input layers during backpropagation in deep networks, making the model hard to optimize.

Most mainstream networks today are built by repeating these two kinds of units, so the SE module can be embedded into almost any existing network structure. By embedding SE modules into the building blocks of an original architecture, we obtain different kinds of SENets, such as SE-BN-Inception, SE-ResNet, SE-ResNeXt, and SE-Inception-ResNet-v2.

02 SENet Code

# Implementation of the squeeze-and-excitation module (Keras functional API);
# x is a feature map of shape (B, H, W, out_dim), ratio is the channel reduction ratio.
from keras.layers import GlobalAveragePooling2D, Dense, Reshape, multiply

squeeze = GlobalAveragePooling2D()(x)                                  # (B, out_dim)

excitation = Dense(units=out_dim // ratio, activation="relu")(squeeze)
excitation = Dense(units=out_dim, activation="sigmoid")(excitation)
excitation = Reshape((1, 1, out_dim))(excitation)                      # (B, 1, 1, out_dim)

scale = multiply([x, excitation])                                      # channel-wise reweighting

# SE-Inception Module implementation
from keras.models import Model
from keras.layers import (Input, Lambda, Dense, Activation, Reshape,
                          GlobalAveragePooling2D, Dropout, multiply)
from keras.applications.inception_v3 import InceptionV3

def build_model(out_dims, input_shape=(224,224,3)):
    inputs_dim = Input(input_shape)
    x = Lambda(lambda t: t / 255.0)(inputs_dim)  # normalize pixel values to [0, 1]

    x = InceptionV3(include_top=False,
                    weights="imagenet",
                    input_tensor=None,
                    input_shape=(224,224,3),
                    pooling=None)(x)  # pooling=None keeps the 4-D feature map for the SE block

    # Squeeze: global information embedding
    squeeze = GlobalAveragePooling2D()(x)

    # Excitation: bottleneck with reduction ratio 16, as described above
    excitation = Dense(units=2048 // 16)(squeeze)
    excitation = Activation("relu")(excitation)
    excitation = Dense(units=2048)(excitation)
    excitation = Activation("sigmoid")(excitation)
    excitation = Reshape((1, 1, 2048))(excitation)

    # Scale: channel-wise reweighting of the backbone features
    scale = multiply([x, excitation])

    x = GlobalAveragePooling2D()(scale)
    dp_1 = Dropout(0.6)(x)
    fc2 = Dense(out_dims)(dp_1)
    fc2 = Activation("sigmoid")(fc2)
    model = Model(inputs=inputs_dim, outputs=fc2)
    return model
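
A minimal sketch of how the model above might be compiled and used, assuming a 10-class multi-label setup to match the sigmoid output layer (the class count, optimizer, and loss are illustrative assumptions, not from the original code):

# Hypothetical usage of build_model; out_dims=10 is an assumed class count.
model = build_model(10)
model.compile(optimizer="adam",
              loss="binary_crossentropy",   # pairs with the sigmoid output layer
              metrics=["accuracy"])
model.summary()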

IX. The Introduction of CBAM

Paper: CBAM: Convolutional Block Attention Module

Paper link: https://arxiv.org/abs/1807.06521

This ECCV 2018 paper's main contribution is a new network component: a simple but effective attention module called CBAM. Given an intermediate feature map, attention weights are inferred in turn along the channel and spatial dimensions, then multiplied with the original feature map to refine the features adaptively.

The SENet paper proposed generating attention over a feature map's channels and multiplying it with the original feature map. CBAM applies attention to both the channel and the spatial dimension. Like the SE module, CBAM can be embedded into most mainstream networks and improves a model's feature-extraction ability without significantly increasing computation or parameter count.

The paper also identifies the major factors that influence convolutional network performance:

  • Depth: VGG, ResNet
  • Width: GoogleNet
  • Cardinality: Xception, ResNeXt
  • Attention: channel attention, spatial attention

01 CBAM Design Philosophy

In short, CBAM introduces attention mechanisms along both the channel and the spatial dimension.

Channel attention module

The input feature map first goes through global max pooling and global average pooling over width and height, and each result passes through a shared MLP. The two MLP outputs are added element-wise and passed through a sigmoid activation to produce the final channel attention feature map. Element-wise multiplication of this map with the input feature map yields the input features needed by the spatial attention module.

Spatial attention module

The feature map output by the channel attention module is this module's input. First, channel-wise global max pooling and global average pooling are applied, and the two results are concatenated along the channel axis. A convolution then reduces them to a single channel, and a sigmoid generates the spatial attention feature. Finally, this feature is multiplied with the module's input feature to produce the final output.

02 Implementation of the CBAM Module

import tensorflow as tf

def CBAM(input, reduction):
    """
    Convolutional Block Attention Module (TF1-style tf.layers API).
    """
    _, width, height, channel = input.get_shape().as_list()  # (B,W,H,C); batch dim is None

    # channel attention: shared 1x1-conv MLP over avg- and max-pooled descriptors
    x_mean = tf.reduce_mean(input, axis=(1, 2), keepdims=True)  # (B,1,1,C)
    x_mean = tf.layers.conv2d(x_mean, channel // reduction, 1, activation=tf.nn.relu, name="CA1")  # (B,1,1,C/r)
    x_mean = tf.layers.conv2d(x_mean, channel, 1, name="CA2")  # (B,1,1,C)

    x_max = tf.reduce_max(input, axis=(1, 2), keepdims=True)  # (B,1,1,C)
    x_max = tf.layers.conv2d(x_max, channel // reduction, 1, activation=tf.nn.relu, name="CA1", reuse=True)  # (B,1,1,C/r)
    x_max = tf.layers.conv2d(x_max, channel, 1, name="CA2", reuse=True)  # (B,1,1,C); reuse=True shares the weights of the earlier layer with the same name, giving a shared MLP

    x = tf.add(x_mean, x_max)  # (B,1,1,C)
    x = tf.nn.sigmoid(x)       # (B,1,1,C)
    x = tf.multiply(input, x)  # (B,W,H,C)

    # spatial attention: concat of channel-wise mean and max, then a 7x7 conv
    y_mean = tf.reduce_mean(x, axis=3, keepdims=True)  # (B,W,H,1)
    y_max = tf.reduce_max(x, axis=3, keepdims=True)    # (B,W,H,1)
    y = tf.concat([y_mean, y_max], axis=-1)            # (B,W,H,2)
    y = tf.layers.conv2d(y, 1, 7, padding="same", activation=tf.nn.sigmoid)  # (B,W,H,1)
    y = tf.multiply(x, y)  # (B,W,H,C)

    return y
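
A quick sketch of wiring the function above into a TF1-style graph; the placeholder shape below is an illustrative assumption:

# Hypothetical usage (TF1 graph mode); the input shape is an assumption.
inputs = tf.placeholder(tf.float32, [None, 56, 56, 64])
refined = CBAM(inputs, reduction=16)   # output shape matches the input: (B,56,56,64)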
from keras.layers import GlobalAveragePooling2D, GlobalMaxPooling2D, Reshape, Dense, multiply, Permute, Concatenate, Conv2D, Add, Activation, Lambda
from keras import backend as K
from keras.activations import sigmoid

def attach_attention_module(net, attention_module):
    if attention_module == 'se_block': # SE_block
        net = se_block(net)
    elif attention_module == 'cbam_block': # CBAM_block
        net = cbam_block(net)
    else:
        raise Exception("'{}' is not supported attention module!".format(attention_module))
    
    return net

def se_block(input_feature, ratio=8):
    """Contains the implementation of Squeeze-and-Excitation(SE) block.
    As described in https://arxiv.org/abs/1709.01507.
    """

    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    channel = input_feature._keras_shape[channel_axis]

    se_feature = GlobalAveragePooling2D()(input_feature)
    se_feature = Reshape((1, 1, channel))(se_feature)
    assert se_feature._keras_shape[1:] == (1,1,channel)
    se_feature = Dense(channel // ratio,
                       activation='relu',
                       kernel_initializer='he_normal',
                       use_bias=True,
                       bias_initializer='zeros')(se_feature)
    assert se_feature._keras_shape[1:] == (1,1,channel//ratio)
    se_feature = Dense(channel,
                       activation='sigmoid',
                       kernel_initializer='he_normal',
                       use_bias=True,
                       bias_initializer='zeros')(se_feature)
    assert se_feature._keras_shape[1:] == (1,1,channel)
    if K.image_data_format() == 'channels_first':
        se_feature = Permute((3, 1, 2))(se_feature)

    se_feature = multiply([input_feature, se_feature])
    return se_feature

def cbam_block(cbam_feature, ratio=8):
    """Contains the implementation of Convolutional Block Attention Module(CBAM) block.
    As described in https://arxiv.org/abs/1807.06521.
    """

    cbam_feature = channel_attention(cbam_feature, ratio)
    cbam_feature = spatial_attention(cbam_feature)
    return cbam_feature

def channel_attention(input_feature, ratio=8):

    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    channel = input_feature._keras_shape[channel_axis]

    shared_layer_one = Dense(channel//ratio,
                             activation='relu',
                             kernel_initializer='he_normal',
                             use_bias=True,
                             bias_initializer='zeros')
    shared_layer_two = Dense(channel,
                             kernel_initializer='he_normal',
                             use_bias=True,
                             bias_initializer='zeros')

    avg_pool = GlobalAveragePooling2D()(input_feature)    
    avg_pool = Reshape((1,1,channel))(avg_pool)
    assert avg_pool._keras_shape[1:] == (1,1,channel)
    avg_pool = shared_layer_one(avg_pool)
    assert avg_pool._keras_shape[1:] == (1,1,channel//ratio)
    avg_pool = shared_layer_two(avg_pool)
    assert avg_pool._keras_shape[1:] == (1,1,channel)

    max_pool = GlobalMaxPooling2D()(input_feature)
    max_pool = Reshape((1,1,channel))(max_pool)
    assert max_pool._keras_shape[1:] == (1,1,channel)
    max_pool = shared_layer_one(max_pool)
    assert max_pool._keras_shape[1:] == (1,1,channel//ratio)
    max_pool = shared_layer_two(max_pool)
    assert max_pool._keras_shape[1:] == (1,1,channel)

    cbam_feature = Add()([avg_pool,max_pool])
    cbam_feature = Activation('sigmoid')(cbam_feature)

    if K.image_data_format() == "channels_first":
        cbam_feature = Permute((3, 1, 2))(cbam_feature)

    return multiply([input_feature, cbam_feature])

def spatial_attention(input_feature):
    kernel_size = 7

    if K.image_data_format() == "channels_first":
        channel = input_feature._keras_shape[1]
        cbam_feature = Permute((2,3,1))(input_feature)
    else:
        channel = input_feature._keras_shape[-1]
        cbam_feature = input_feature

    avg_pool = Lambda(lambda x: K.mean(x, axis=3, keepdims=True))(cbam_feature)
    assert avg_pool._keras_shape[-1] == 1
    max_pool = Lambda(lambda x: K.max(x, axis=3, keepdims=True))(cbam_feature)
    assert max_pool._keras_shape[-1] == 1
    concat = Concatenate(axis=3)([avg_pool, max_pool])
    assert concat._keras_shape[-1] == 2
    cbam_feature = Conv2D(filters = 1,
                          kernel_size=kernel_size,
                          strides=1,
                          padding='same',
                          activation='sigmoid',
                          kernel_initializer='he_normal',
                          use_bias=False)(concat)	
    assert cbam_feature._keras_shape[-1] == 1

    if K.image_data_format() == "channels_first":
        cbam_feature = Permute((3, 1, 2))(cbam_feature)
        
    return multiply([input_feature, cbam_feature])
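
As a rough usage sketch, the helpers above would be called inside a network definition, right after a block's convolutions; the toy model below is an assumption made purely for illustration:

from keras.layers import Input, Conv2D
from keras.models import Model

# Hypothetical toy model: attach CBAM after a single convolution.
inp = Input((32, 32, 3))
net = Conv2D(16, (3, 3), padding="same", activation="relu")(inp)
net = attach_attention_module(net, 'cbam_block')   # or 'se_block'
model = Model(inp, net)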

For a complete example, see: https://github.com/CodingChaozhang/CBAM-keras

Background

AutoML means tuning hyperparameters through some learning mechanism rather than setting them by hand. Such mechanisms include classic Bayesian optimization, multi-armed bandits, and evolutionary algorithms, as well as the newer reinforcement learning.

AutoML can be divided into traditional AutoML, which automatically tunes the parameters of classic machine-learning algorithms, for example a random forest's max_depth and num_trees, and a second class focused on deep learning, which we may call deep AutoML.

It differs from traditional AutoML in that present-day deep AutoML splits a neural network's hyperparameters into two groups: training-related hyperparameters, such as learning rate, regularization, and momentum; and hyperparameters that collectively describe the network structure. Automatically tuning the structural hyperparameters is called neural architecture search (NAS), while automatic tuning of the training hyperparameters, as in traditional AutoML, is called hyperparameter optimization (HPO); a minimal HPO sketch follows.
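
As a concrete illustration of the HPO side, here is a minimal random-search sketch over two training hyperparameters; train_and_evaluate is a hypothetical user-supplied function that trains a model with the given configuration and returns a validation score:

import random

# Candidate values for two training-related hyperparameters.
search_space = {
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4],
    "momentum": [0.0, 0.5, 0.9, 0.99],
}

best_score, best_config = float("-inf"), None
for _ in range(20):                           # 20 random trials
    config = {name: random.choice(values) for name, values in search_space.items()}
    score = train_and_evaluate(config)        # hypothetical: returns validation accuracy
    if score > best_score:
        best_score, best_config = score, config

print(best_config, best_score)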

X. The Introduction of EfficientNet

Paper: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

Paper link: https://arxiv.org/abs/1905.11946

Conventional practice scales a neural model by arbitrarily increasing the CNN's depth or width, or by training and evaluating on higher-resolution input images. Although these approaches do improve accuracy, they usually require lengthy manual tuning and still yield suboptimal performance. This paper instead proposes a new model-scaling method that uses a set of scaling coefficients to scale the network's dimensions jointly. Using this scaling method together with AutoML techniques, the authors obtain a family of models called EfficientNets that are up to 10× more efficient (smaller and faster).

01 EfficientNet Design Philosophy

The paper proposes a compound, multi-dimensional model-scaling method, seeking a scaling strategy that balances speed and accuracy. To this end, the authors revisit the scaling dimensions explored in earlier work: network depth, network width, and image resolution. Most prior work enlarges a single dimension to reach higher accuracy; for example, the step from ResNet-18 to ResNet-152 gains accuracy by increasing network depth.

This paper moves beyond that understanding of model scaling and examines these dimensions from a higher vantage point: the authors argue that the three dimensions influence one another, search out the best combination of the three, and on that basis propose the new network, EfficientNet.

The core question: how to balance the network's depth, width, and resolution to improve model accuracy.

Generally speaking, increasing depth, width, and resolution enlarges the model and thereby improves its generalization ability.

EfficientNet adjusts depth, width, and resolution jointly: the authors propose a "compound coefficient" to scale these three parameters dynamically.

AutoML, in the form of a grid search, is used to find the scaling constants. Here α, β, and γ are grid-searched constants that specify how to scale the network's depth, width, and resolution, while φ is a user-specified coefficient that controls how far the model is scaled up, as sketched below.
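
The relationship can be sketched as follows; α = 1.2, β = 1.1, γ = 1.15 are the constants the paper reports for the grid-searched baseline (under the constraint α·β²·γ² ≈ 2), while the base network dimensions in the usage line are illustrative assumptions:

# Compound scaling sketch: depth, width, and resolution all grow with phi.
alpha, beta, gamma = 1.2, 1.1, 1.15   # grid-searched constants reported in the paper

def compound_scale(phi, base_depth, base_width, base_resolution):
    depth = base_depth * alpha ** phi              # number of layers
    width = base_width * beta ** phi               # number of channels
    resolution = base_resolution * gamma ** phi    # input image size
    return depth, width, resolution

# Roughly doubling the compute budget corresponds to phi = 1:
print(compound_scale(1, 18, 64, 224))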

EfficientNet comes in eight variants, b0 through b7, where b0 is the baseline and b1-b7 adjust the depth, width, and resolution on top of b0. The official source code lists the following parameters for each variant: the width coefficient, the depth coefficient, the input image resolution, and the dropout rate. These parameters were obtained through the AutoML search just described.
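
For reference, the per-variant coefficients listed in the official repository are, in the order (width_coefficient, depth_coefficient, resolution, dropout_rate):

# Coefficients as listed in the official TensorFlow EfficientNet repository.
efficientnet_params = {
    "efficientnet-b0": (1.0, 1.0, 224, 0.2),
    "efficientnet-b1": (1.0, 1.1, 240, 0.2),
    "efficientnet-b2": (1.1, 1.2, 260, 0.3),
    "efficientnet-b3": (1.2, 1.4, 300, 0.3),
    "efficientnet-b4": (1.4, 1.8, 380, 0.4),
    "efficientnet-b5": (1.6, 2.2, 456, 0.4),
    "efficientnet-b6": (1.8, 2.6, 528, 0.5),
    "efficientnet-b7": (2.0, 3.1, 600, 0.5),
}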

In short, compound model scaling improves a network's metrics more than scaling along any single dimension.

02 Implementation of EfficientNet

Official TensorFlow code: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet

Keras code: https://github.com/qubvel/efficientnet

Installation (a minimal usage sketch follows the steps):

  • 1. conda create -n efficient python=3.7
  • 2. conda install scikit-image
  • 3. conda install keras-gpu
  • 4. pip install -U efficientnet
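
After installation, loading a pretrained model takes one line; a minimal sketch, assuming a package version where the Keras models are exposed under efficientnet.keras:

# Hypothetical quick check of the installed package.
from efficientnet.keras import EfficientNetB5

model = EfficientNetB5(weights="imagenet")   # downloads ImageNet weights on first use
model.summary()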

Examples:

  • https://www.kaggle.com/carlolepelaars/efficientnetb5-with-keras-aptos-2019#data
  • https://www.kaggle.com/raimonds1993/aptos19-efficientnet-keras-regression-lb-0-75
  • https://www.kaggle.com/ratan123/aptos-keras-efficientnet-with-attention-baseline