動手學深度學習 (Dive into Deep Learning, TensorFlow) --- Study Notes (Part 7: Convolutional Neural Networks)

Much of the material here, including the formulas and basic theory, is taken from 《動手學深度學習》 (TF2.0 edition).

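So far every concise implementation has used Sequential. Concise implementations are simple to write, but their internal details are hard to control, while implementing everything from scratch is too tedious. tf.keras.Model offers a balance between the two. (The details are not covered here.)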

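Now for the core concepts of convolutional neural networks.

One thing first: in the earlier chapters every image was flattened into a one-dimensional vector before being fed to the network. That is not ideal, because flattening destroys the vertical relationships in the data: pixels that are adjacent within a column end up far apart in the vector.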
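The effect of flattening can be illustrated with a small plain-Python sketch. The 3x4 image and the helper name `flat_index` are made up for illustration and are not part of the original notes.

```python
# Hypothetical 3x4 "image"; each pixel stores its own (row, col) position.
height, width = 3, 4
image = [[(r, c) for c in range(width)] for r in range(height)]

# Row-major flattening, as done before feeding a fully connected layer.
flat = [pixel for row in image for pixel in row]

def flat_index(r, c):
    """Position of pixel (r, c) in the flattened vector."""
    return r * width + c

# Horizontal neighbours stay adjacent in the flattened vector ...
horizontal_gap = flat_index(1, 2) - flat_index(1, 1)
# ... but vertical neighbours end up `width` positions apart.
vertical_gap = flat_index(2, 1) - flat_index(1, 1)
print(horizontal_gap, vertical_gap)
```

A fully connected layer sees no difference between a gap of 1 and a gap of `width`, which is why the column structure is lost.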

The two-dimensional cross-correlation operation can be verified with the following program:

import tensorflow as tf
import numpy as np
print(tf.__version__)
def corr2d(X, K):
    h, w = K.shape
    Y = tf.Variable(tf.zeros((X.shape[0] - h + 1, X.shape[1] - w +1)))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i,j].assign(tf.cast(tf.reduce_sum(X[i:i+h, j:j+w] * K), dtype=tf.float32))
    return Y
X = tf.constant([[0,1,2], [3,4,5], [6,7,8]])
K = tf.constant([[0,1], [2,3]])
print(corr2d(X, K))

Detecting object edges in an image

As shown in the figure below:

The result is shown in the figure below:

Implementation:

import tensorflow as tf
import numpy as np
print(tf.__version__)
# Cross-correlation function
def corr2d(X, K):
    h, w = K.shape
    Y = tf.Variable(tf.zeros((X.shape[0] - h + 1, X.shape[1] - w +1)))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i,j].assign(tf.cast(tf.reduce_sum(X[i:i+h, j:j+w] * K), dtype=tf.float32))
    return Y

# Image matrix used for edge detection
X = tf.Variable(tf.ones((6,8)))
X[:, 2:6].assign(tf.zeros(X[:,2:6].shape))
print(X)
# Kernel
K = tf.constant([[1,-1]], dtype = tf.float32)
# Result of the cross-correlation
Y = corr2d(X, K)
print(Y)

The output shows that this kernel detects the edges in the matrix (this is only a toy example; edges in real images are far more complicated).
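The same edge detection can be traced by hand without TensorFlow. The sketch below is a plain-Python stand-in for `corr2d`; the name `corr2d_py` is hypothetical and it works on nested lists rather than tensors.

```python
def corr2d_py(X, K):
    """Plain-Python 2D cross-correlation on nested lists."""
    h, w = len(K), len(K[0])
    out_h, out_w = len(X) - h + 1, len(X[0]) - w + 1
    Y = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            Y[i][j] = sum(X[i + a][j + b] * K[a][b]
                          for a in range(h) for b in range(w))
    return Y

# 6x8 image: ones everywhere except columns 2..5, which are zero.
row = [1, 1, 0, 0, 0, 0, 1, 1]
X = [row[:] for _ in range(6)]
K = [[1, -1]]  # responds to horizontal changes between neighbouring pixels

Y = corr2d_py(X, K)
print(Y[0])  # 1 marks the 1-to-0 edge, -1 marks the 0-to-1 edge
```

Every output entry is the difference between a pixel and its right-hand neighbour, so it is non-zero only where the value changes.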

Updating the weights of a convolutional layer

The implementation is as follows (the idea is the same as before):

import tensorflow as tf
import numpy as np
print(tf.__version__)
# Cross-correlation function
def corr2d(X, K):
    h, w = K.shape
    Y = tf.Variable(tf.zeros((X.shape[0] - h + 1, X.shape[1] - w +1)))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i,j].assign(tf.cast(tf.reduce_sum(X[i:i+h, j:j+w] * K), dtype=tf.float32))
    return Y
# Image matrix used for edge detection
X = tf.Variable(tf.ones((6,8)))
X[:, 2:6].assign(tf.zeros(X[:,2:6].shape))
print(X)
# Kernel
K = tf.constant([[1,-1]], dtype = tf.float32)
# Result of the cross-correlation
Y = corr2d(X, K)
print(Y)
X = tf.reshape(X, (1,6,8,1))
Y = tf.reshape(Y, (1,6,7,1))
print(Y)
conv2d = tf.keras.layers.Conv2D(1, (1,2))
print(Y.shape)
# Initial prediction (this call also builds the layer's weights)
Y_hat = conv2d(X)
for i in range(10):
    with tf.GradientTape(watch_accessed_variables=False) as g:
        g.watch(conv2d.weights[0])
        Y_hat = conv2d(X)
        # Squared-error loss
        l = (abs(Y_hat - Y)) ** 2
    # Compute the gradient once the forward pass has been recorded
    dl = g.gradient(l, conv2d.weights[0])
    # Learning rate
    lr = 3e-2
    update = tf.multiply(lr, dl)
    # Update the kernel weights
    updated_weights = conv2d.get_weights()
    updated_weights[0] = conv2d.weights[0] - update
    conv2d.set_weights(updated_weights)
    if (i + 1)% 2 == 0:
        print('batch %d, loss %.3f' % (i + 1, tf.reduce_sum(l)))
print(tf.reshape(conv2d.get_weights()[0],(1,2)))

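Cross-correlation and convolution

Key point: the kernels are learned. Because the model learns the kernel values during training, it does not matter whether a layer computes cross-correlation or true convolution: convolution is just cross-correlation with the kernel flipped, so either operation ends up learning an equivalent kernel. If this is unclear, treat the network as a black box in which the kernels are whatever training produces.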
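The relationship between the two operations can be checked numerically. This is a minimal plain-Python sketch; the helper names `corr2d_py`, `conv2d_py` and `flip` are hypothetical, and the input and kernel are the small matrices used in the first `corr2d` example.

```python
def corr2d_py(X, K):
    """2D cross-correlation on nested lists."""
    h, w = len(K), len(K[0])
    return [[sum(X[i + a][j + b] * K[a][b] for a in range(h) for b in range(w))
             for j in range(len(X[0]) - w + 1)]
            for i in range(len(X) - h + 1)]

def conv2d_py(X, K):
    """True 2D convolution: the kernel is indexed in reverse."""
    h, w = len(K), len(K[0])
    return [[sum(X[i + a][j + b] * K[h - 1 - a][w - 1 - b]
                 for a in range(h) for b in range(w))
             for j in range(len(X[0]) - w + 1)]
            for i in range(len(X) - h + 1)]

def flip(K):
    """Flip a kernel both vertically and horizontally."""
    return [row[::-1] for row in K[::-1]]

X = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
K = [[0, 1], [2, 3]]

corr = corr2d_py(X, K)
conv = conv2d_py(X, K)
print(corr, conv)
# Convolution with K equals cross-correlation with the flipped K.
print(conv == corr2d_py(X, flip(K)))
```

Since a learned kernel can simply come out flipped, a layer built on either operation can represent the same function.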

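Feature maps and receptive fields

Padding and stride

These are two very important hyperparameters of a convolutional layer.

Padding:

Verification code: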

import tensorflow as tf
import numpy as np
print(tf.__version__)
def comp_conv2d(conv2d, X):
    X = tf.reshape(X,(1,) + X.shape + (1,))
    Y = conv2d(X)
    #input_shape = (samples, rows, cols, channels)
    return tf.reshape(Y,Y.shape[1:3])

conv2d = tf.keras.layers.Conv2D(1, kernel_size=3, padding='same')
X = tf.random.uniform(shape=(8,8))
print(comp_conv2d(conv2d,X).shape)

Stride:

Simulation code (with a stride of 2 along both height and width):

import tensorflow as tf
import numpy as np
print(tf.__version__)
# Simulate a stride of 2
conv2d = tf.keras.layers.Conv2D(1, kernel_size=3, padding='same',strides=2)
print(comp_conv2d(conv2d, X).shape)

A slightly more complicated example:

import tensorflow as tf
import numpy as np
print(tf.__version__)
# Simulate a stride of (3,4)
conv2d = tf.keras.layers.Conv2D(1, kernel_size=(3,5), padding='valid', strides=(3,4))
print(comp_conv2d(conv2d, X).shape)
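The shapes printed by the three examples above can be predicted with a short calculation. The sketch below assumes the usual Keras conventions for 'same' and 'valid' padding; the function name `conv_output_size` is hypothetical.

```python
import math

def conv_output_size(n, k, stride=1, padding='valid'):
    """Output length along one axis for input length n and kernel length k."""
    if padding == 'same':
        # Input is padded so that only the stride shrinks the output.
        return math.ceil(n / stride)
    if padding == 'valid':
        # No padding: windows that do not fit are dropped.
        return (n - k) // stride + 1
    raise ValueError("padding must be 'same' or 'valid'")

# 8x8 input, 3x3 kernel, padding='same', stride 1
print(conv_output_size(8, 3, 1, 'same'), conv_output_size(8, 3, 1, 'same'))
# 8x8 input, 3x3 kernel, padding='same', stride 2
print(conv_output_size(8, 3, 2, 'same'), conv_output_size(8, 3, 2, 'same'))
# 8x8 input, 3x5 kernel, padding='valid', stride (3,4)
print(conv_output_size(8, 3, 3, 'valid'), conv_output_size(8, 5, 4, 'valid'))
```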

Multiple input and output channels

Code:

import tensorflow as tf
import numpy as np
print(tf.__version__)
# Multiple input channels
# Cross-correlation function
def corr2d(X, K):
    h, w = K.shape
    if len(X.shape) <= 1:
        X = tf.reshape(X, (X.shape[0],1))
    Y = tf.Variable(tf.zeros((X.shape[0] - h + 1, X.shape[1] - w +1)))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i,j].assign(tf.cast(tf.reduce_sum(X[i:i+h, j:j+w] * K), dtype=tf.float32))
    return Y
# Cross-correlation with multiple input channels: cross-correlate each channel separately, then sum the results.
def corr2d_multi_in(X, K):
    return tf.reduce_sum([corr2d(X[i], K[i]) for i in range(X.shape[0])],axis=0)

X = tf.constant([[[0,1,2],[3,4,5],[6,7,8]],
                 [[1,2,3],[4,5,6],[7,8,9]]])
K = tf.constant([[[0,1],[2,3]],
                 [[1,2],[3,4]]])
print(corr2d_multi_in(X, K))

# Multiple output channels
def corr2d_multi_in_out(X, K):
    return tf.stack([corr2d_multi_in(X, k) for k in K],axis=0)
print("K:",K)
print("K+1:",K+1)
print("K+2:",K+2)
K = tf.stack([K, K+1, K+2],axis=0)
print(K.shape)
print(corr2d_multi_in_out(X, K))
# Equivalent to the following; K is now stacked, so index the kernel of each output channel
print(corr2d_multi_in(X, K[0]))
print(corr2d_multi_in(X, K[1]))
print(corr2d_multi_in(X, K[2]))

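Here the input is 3*3*3 (three input channels, each a 3*3 matrix). Because the input has three channels, a single kernel consists of three 1*1 kernels (in the figure, for example, light blue is one 1*1 kernel and dark blue is another). Applying a single kernel produces three 3*3 matrices, one per input channel, but these are summed, so the result is a single 3*3 matrix. With two kernels, the output is two 3*3 matrices.

The implementation is as follows: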

import tensorflow as tf
import numpy as np
print(tf.__version__)
# Multiple input channels
# Cross-correlation function
def corr2d(X, K):
    h, w = K.shape
    if len(X.shape) <= 1:
        X = tf.reshape(X, (X.shape[0],1))
    Y = tf.Variable(tf.zeros((X.shape[0] - h + 1, X.shape[1] - w +1)))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i,j].assign(tf.cast(tf.reduce_sum(X[i:i+h, j:j+w] * K), dtype=tf.float32))
    return Y
# Cross-correlation with multiple input channels: cross-correlate each channel separately, then sum the results.
def corr2d_multi_in(X, K):
    return tf.reduce_sum([corr2d(X[i], K[i]) for i in range(X.shape[0])],axis=0)

X = tf.constant([[[0,1,2],[3,4,5],[6,7,8]],
                 [[1,2,3],[4,5,6],[7,8,9]]])
K = tf.constant([[[0,1],[2,3]],
                 [[1,2],[3,4]]])
# Multiple output channels
def corr2d_multi_in_out(X, K):
    return tf.stack([corr2d_multi_in(X, k) for k in K],axis=0)
# 1*1 kernel, implemented as a matrix multiplication
def corr2d_multi_in_out_1x1(X, K):
    c_i, h, w = X.shape
    c_o = K.shape[0]
    X = tf.reshape(X,(c_i, h * w))
    K = tf.reshape(K,(c_o, c_i))
    Y = tf.matmul(K, X)
    return tf.reshape(Y, (c_o, h, w))
X = tf.random.uniform((3,3,3))
K = tf.random.uniform((2,3,1,1))

Y1 = corr2d_multi_in_out_1x1(X, K)
Y2 = corr2d_multi_in_out(X, K)

print(tf.norm(Y1-Y2) < 1e-6)

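Summary:

A 1*1 convolutional layer can change the number of channels; in the example above it turns a 3-channel 3*3 input into a 2-channel 3*3 output. If the channel dimension is treated as the feature dimension and each position along height and width as a sample, a 1*1 convolutional layer is equivalent to a fully connected layer.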
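The equivalence with a fully connected layer can be seen pixel by pixel. This is a small plain-Python sketch with made-up numbers; the names `conv1x1` and `dense` are hypothetical, and the layout is channels-first like the `corr2d_multi_in_out_1x1` example.

```python
def conv1x1(X, W):
    """X: [c_in][h][w], W: [c_out][c_in]  ->  [c_out][h][w]."""
    c_in, h, w = len(X), len(X[0]), len(X[0][0])
    return [[[sum(W[o][c] * X[c][i][j] for c in range(c_in))
              for j in range(w)]
             for i in range(h)]
            for o in range(len(W))]

def dense(x, W):
    """Fully connected layer without bias: x has c_in features."""
    return [sum(W[o][c] * x[c] for c in range(len(x))) for o in range(len(W))]

# 3 input channels of size 2x2, 2 output channels.
X = [[[1, 2], [3, 4]],
     [[5, 6], [7, 8]],
     [[9, 10], [11, 12]]]
W = [[1, 0, -1],
     [1, 1, 1]]

Y = conv1x1(X, W)
# At every pixel, the output channels equal a dense layer applied to
# that pixel's vector of input channels.
pixel = [X[c][0][0] for c in range(3)]
print([Y[o][0][0] for o in range(2)], dense(pixel, W))
```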

Pooling

Verification code:

import tensorflow as tf
import numpy as np
print(tf.__version__)
def pool2d(X, pool_size, mode='max'):
    p_h, p_w = pool_size
    Y = tf.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w +1))
    Y = tf.Variable(Y)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Max pooling
            if mode == 'max':
                Y[i,j].assign(tf.reduce_max(X[i:i+p_h, j:j+p_w]))
            # Average pooling
            elif mode =='avg':
                Y[i,j].assign(tf.reduce_mean(X[i:i+p_h, j:j+p_w]))
    return Y
# Check against the figure
X = tf.constant([[0,1,2],[3,4,5],[6,7,8]],dtype=tf.float32)
print(pool2d(X, (2,2)))
# Object edge detection
X = tf.Variable(tf.ones((6,8)))
X[:, 2:6].assign(tf.zeros(X[:,2:6].shape))
print(pool2d(X, (2,2)))

Result:

Padding and stride:

TensorFlow's default data format is 'channels_last', so the input here has shape (1,4,4,1) rather than (1,1,4,4).

Verification code:

import tensorflow as tf
import numpy as np
print(tf.__version__)
def pool2d(X, pool_size, mode='max'):
    p_h, p_w = pool_size
    Y = tf.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w +1))
    Y = tf.Variable(Y)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Max pooling
            if mode == 'max':
                Y[i,j].assign(tf.reduce_max(X[i:i+p_h, j:j+p_w]))
            # Average pooling
            elif mode =='avg':
                Y[i,j].assign(tf.reduce_mean(X[i:i+p_h, j:j+p_w]))
    return Y
#tensorflow default data_format == 'channels_last'
#so (1,4,4,1) instead of (1,1,4,4)
X = tf.reshape(tf.constant(range(16)), (1,4,4,1))
print(X)
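# With the default padding='valid' nothing is padded; windows that do not fit are dropped
# By default, the stride of a MaxPool2D instance equals the pooling window shape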
pool2d = tf.keras.layers.MaxPool2D(pool_size=[3,3])
print(pool2d(X))

# Stride of 2 with 'same' padding
pool2d = tf.keras.layers.MaxPool2D(pool_size=[3,3],padding='same',strides=2)
print(pool2d(X))

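Multiple input and output channels

The key point is that multi-channel pooling pools each channel separately; unlike convolution, there is no summation across channels.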

When a 1*4*4*2 input (one sample, two channels) is pooled, only the 4*4 spatial dimensions change, becoming n*n; the number of channels stays at 2.

Verification code:

import tensorflow as tf
import numpy as np
print(tf.__version__)
def pool2d(X, pool_size, mode='max'):
    p_h, p_w = pool_size
    Y = tf.zeros((X.shape[0] - p_h + 1, X.shape[1] - p_w +1))
    Y = tf.Variable(Y)
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            # Max pooling
            if mode == 'max':
                Y[i,j].assign(tf.reduce_max(X[i:i+p_h, j:j+p_w]))
            # Average pooling
            elif mode =='avg':
                Y[i,j].assign(tf.reduce_mean(X[i:i+p_h, j:j+p_w]))
    return Y
X = tf.reshape(tf.constant(range(16)), (1,4,4,1))
# Two channels: with channels_last, concatenate along the last axis to get (1,4,4,2)
X = tf.concat([X, X+1], axis=3)
print(X.shape)
pool2d = tf.keras.layers.MaxPool2D(3, padding='same', strides=2)
print(pool2d(X))

In this verification program the shape after pooling is (1, 2, 2, 2).

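Summary:

  • Max pooling and average pooling output the maximum and the mean, respectively, of the input elements in the pooling window.
  • A major role of the pooling layer is to reduce the convolutional layer's excessive sensitivity to position.
  • The padding and stride of a pooling layer can be specified.
  • The number of output channels of a pooling layer equals the number of input channels.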
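The point about position sensitivity can be sketched with a tiny example. The helper `max_pool2d_py` is a hypothetical plain-Python counterpart of the `pool2d` function above (stride 1, no padding), and the two inputs are made-up edge responses that differ by a one-pixel shift.

```python
def max_pool2d_py(X, p_h, p_w):
    """Max pooling on nested lists with stride 1 and no padding."""
    out_h, out_w = len(X) - p_h + 1, len(X[0]) - p_w + 1
    return [[max(X[i + a][j + b] for a in range(p_h) for b in range(p_w))
             for j in range(out_w)]
            for i in range(out_h)]

edge = [[0, 1, 0, 0],
        [0, 1, 0, 0]]      # edge response in column 1
shifted = [[0, 0, 1, 0],
           [0, 0, 1, 0]]   # the same edge, shifted right by one pixel

pooled = max_pool2d_py(edge, 2, 2)
pooled_shifted = max_pool2d_py(shifted, 2, 2)
print(pooled, pooled_shifted)
# Output column 1 still reports the edge in both cases.
print(pooled[0][1] == pooled_shifted[0][1])
```

Before pooling, the two inputs disagree at the edge position; after pooling, the middle output is the same in both cases.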

That covers the basics of convolutional neural networks. Next come specific convolutional network architectures.

Convolutional neural network (LeNet)

The LeNet model

The implementation is as follows:

import tensorflow as tf
import numpy as np
print(tf.__version__)
# Define the model
net = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=6,kernel_size=5,activation='sigmoid',input_shape=(28,28,1)),
    tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
    tf.keras.layers.Conv2D(filters=16,kernel_size=5,activation='sigmoid'),
    tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120,activation='sigmoid'),
    tf.keras.layers.Dense(84,activation='sigmoid'),
    tf.keras.layers.Dense(10,activation='sigmoid')
])
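# Construct a single-channel 28*28 sample
X = tf.random.uniform((1,28,28,1))
# Run the forward pass layer by layer to inspect each layer's output shape
for layer in net.layers:
    X = layer(X)
    print(layer.name, 'output shape\t', X.shape)
# In the convolutional block, the input height and width shrink layer by layer.
# Each convolutional layer uses a 5*5 kernel and so reduces height and width by 4, each pooling layer halves them, and the number of channels grows from 1 to 16.
# Finally the data is flattened, and the fully connected layers reduce the number of outputs step by step down to the 10 image classes.

# Load the dataset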
fashion_mnist = tf.keras.datasets.fashion_mnist
(train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()
# Inspect the dataset shapes
print(train_images.shape)
#print(train_labels.shape)
print(test_images.shape)
# Reshape the dataset, i.e. add a channel dimension
train_images = tf.reshape(train_images, (train_images.shape[0],train_images.shape[1],train_images.shape[2], 1))
print(train_images.shape)
test_images = tf.reshape(test_images, (test_images.shape[0],test_images.shape[1],test_images.shape[2], 1))
print(test_images.shape)
# Define the loss function, optimizer and metrics
optimizer = tf.keras.optimizers.SGD(learning_rate=0.9, momentum=0.0, nesterov=False)
net.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
# Train the model
net.fit(train_images, train_labels, epochs=5, validation_split=0.1)

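Summary:

  • A convolutional neural network is a network that contains convolutional layers.
  • LeNet alternates convolutional layers and max pooling layers, followed by fully connected layers, to classify images.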
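The output shapes printed by the layer-by-layer loop can be worked out by hand. The sketch below assumes the 28*28 single-channel input and the unpadded layers of the LeNet model above; the helper name `valid_out` is hypothetical.

```python
def valid_out(n, k, stride=1):
    """Output size along one axis with no padding."""
    return (n - k) // stride + 1

# (layer kind, window size, stride, output channels), as in the model above
layers = [('conv', 5, 1, 6), ('pool', 2, 2, 6),
          ('conv', 5, 1, 16), ('pool', 2, 2, 16)]

size, trace = 28, []
for kind, k, stride, channels in layers:
    size = valid_out(size, k, stride)
    trace.append((kind, size, size, channels))
    print(kind, (size, size, channels))

# Number of features reaching the first fully connected layer
flattened = size * size * channels
print('flattened features:', flattened)
```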

Deep convolutional neural network (AlexNet)

Learning feature representations

Below we implement a slightly simplified AlexNet:

import tensorflow as tf
import numpy as np
print(tf.__version__)
# Use the GPU
# for gpu in tf.config.experimental.list_physical_devices('GPU'):
#     tf.config.experimental.set_memory_growth(gpu, True)
# Define the model
net = tf.keras.models.Sequential([
    tf.keras.layers.Conv2D(filters=96,kernel_size=11,strides=4,activation='relu'),
    tf.keras.layers.MaxPool2D(pool_size=3, strides=2),
    tf.keras.layers.Conv2D(filters=256,kernel_size=5,padding='same',activation='relu'),
    tf.keras.layers.MaxPool2D(pool_size=3, strides=2),
    tf.keras.layers.Conv2D(filters=384,kernel_size=3,padding='same',activation='relu'),
    tf.keras.layers.Conv2D(filters=384,kernel_size=3,padding='same',activation='relu'),
    tf.keras.layers.Conv2D(filters=256,kernel_size=3,padding='same',activation='relu'),
    tf.keras.layers.MaxPool2D(pool_size=3, strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4096,activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(4096,activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10,activation='sigmoid')
])
# Define a random input tensor
X = tf.random.uniform((1,224,224,1))
# Inspect the output shape of each layer
for layer in net.layers:
    X = layer(X)
    print(layer.name, 'output shape\t', X.shape)
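# Load the dataset
# The AlexNet paper uses ImageNet, but since training on ImageNet takes a long time, we again use the Fashion-MNIST dataset to demonstrate AlexNet.
# When reading the data we add one extra step: enlarging the image height and width to the 224 that AlexNet uses. This can be done with tf.image.resize_with_pad.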
class DataLoader():
    def __init__(self):
        fashion_mnist = tf.keras.datasets.fashion_mnist
        (self.train_images, self.train_labels), (self.test_images, self.test_labels) = fashion_mnist.load_data()
        self.train_images = np.expand_dims(self.train_images.astype(np.float32)/255.0,axis=-1)
        self.test_images = np.expand_dims(self.test_images.astype(np.float32)/255.0,axis=-1)
        self.train_labels = self.train_labels.astype(np.int32)
        self.test_labels = self.test_labels.astype(np.int32)
        self.num_train, self.num_test = self.train_images.shape[0], self.test_images.shape[0]

    def get_batch_train(self, batch_size):
        index = np.random.randint(0, np.shape(self.train_images)[0], batch_size)
        #need to resize images to (224,224)
        resized_images = tf.image.resize_with_pad(self.train_images[index],224,224,)
        return resized_images.numpy(), self.train_labels[index]

    def get_batch_test(self, batch_size):
        index = np.random.randint(0, np.shape(self.test_images)[0], batch_size)
        #need to resize images to (224,224)
        resized_images = tf.image.resize_with_pad(self.test_images[index],224,224,)
        return resized_images.numpy(), self.test_labels[index]

batch_size = 128
dataLoader = DataLoader()
x_batch, y_batch = dataLoader.get_batch_train(batch_size)
print("x_batch shape:",x_batch.shape,"y_batch shape:", y_batch.shape)
# Training
def train_alexnet():
    epoch = 5
    num_iter = dataLoader.num_train//batch_size
    for e in range(epoch):

        for n in range(num_iter):
            print("epoch:", e, "  iteration", n, "/", num_iter)
            # Randomly sample 128 examples for each training step
            x_batch, y_batch = dataLoader.get_batch_train(batch_size)
            net.fit(x_batch, y_batch)
            if n%20 == 0:
                net.save_weights("AlexNet.h5")

optimizer = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.0, nesterov=False)

net.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

x_batch, y_batch = dataLoader.get_batch_train(batch_size)
# Train on a single batch
#net.fit(x_batch, y_batch)
print("---------------------")
# Train and save the weights along the way
train_alexnet()
net.load_weights("AlexNet.h5")
x_test, y_test = dataLoader.get_batch_test(2000)
net.evaluate(x_test, y_test, verbose=2)

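Summary:

  • AlexNet is structurally similar to LeNet, but uses more convolutional layers and a larger parameter space to fit the large-scale ImageNet dataset. It marks the boundary between shallow and deep neural networks.
  • Although AlexNet's implementation looks only a few lines longer than LeNet's, the conceptual shift and the truly excellent experimental results took the academic community many years to achieve.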
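To put a number on "a larger parameter space", the parameters of the two models defined in these notes can be counted by hand. This sketch assumes single-channel inputs of 28*28 for LeNet and 224*224 for the simplified AlexNet, so that the flattened sizes are 4*4*16 and 5*5*256; the helper names are hypothetical.

```python
def conv_params(k, c_in, c_out):
    """Weights plus biases of a k x k convolutional layer."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

lenet_total = (conv_params(5, 1, 6) + conv_params(5, 6, 16)
               + dense_params(4 * 4 * 16, 120)
               + dense_params(120, 84) + dense_params(84, 10))

alexnet_total = (conv_params(11, 1, 96) + conv_params(5, 96, 256)
                 + conv_params(3, 256, 384) + conv_params(3, 384, 384)
                 + conv_params(3, 384, 256)
                 + dense_params(5 * 5 * 256, 4096)
                 + dense_params(4096, 4096) + dense_params(4096, 10))

print('LeNet parameters:  ', lenet_total)
print('AlexNet parameters:', alexnet_total)
print('ratio: %.0fx' % (alexnet_total / lenet_total))
```

Most of the AlexNet parameters sit in the two 4096-unit fully connected layers, which is why dropout is applied there.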

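Networks using repeated elements (VGG)

This part covers the VGG block and how the VGG network is assembled.

VGG block

VGG network

(Besides VGG-11 there are many other VGG variants, which are worth exploring if you are interested.)

The implementation is as follows: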

import tensorflow as tf
print(tf.__version__)

for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
# VGG block
def vgg_block(num_convs, num_channels):
    blk = tf.keras.models.Sequential()
    for _ in range(num_convs):
        blk.add(tf.keras.layers.Conv2D(num_channels,kernel_size=3,
                                    padding='same',activation='relu'))

    blk.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
    return blk
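# Architecture to build: (number of convolutional layers, number of output channels) per block
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
# Implement the VGG-11 network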
def vgg(conv_arch):
    net = tf.keras.models.Sequential()
    for (num_convs, num_channels) in conv_arch:
        net.add(vgg_block(num_convs,num_channels))
    net.add(tf.keras.models.Sequential([tf.keras.layers.Flatten(),
             tf.keras.layers.Dense(4096,activation='relu'),
             tf.keras.layers.Dropout(0.5),
             tf.keras.layers.Dense(4096,activation='relu'),
             tf.keras.layers.Dropout(0.5),
             tf.keras.layers.Dense(10,activation='sigmoid')]))
    return net
net = vgg(conv_arch)

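# VGG-11 is computationally heavier than AlexNet, so for testing we build a network with fewer channels (a narrower network) to train on Fashion-MNIST.
ratio = 4
# Reduce the channel counts by a factor of four
small_conv_arch = [(pair[0], pair[1] // ratio) for pair in conv_arch]
print("small_conv_arch:",small_conv_arch)
# Reduced to [(1, 16), (1, 32), (2, 64), (2, 128), (2, 128)]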
net = vgg(small_conv_arch)
import numpy as np
# Load the data
class DataLoader():
    def __init__(self):
        fashion_mnist = tf.keras.datasets.fashion_mnist
        (self.train_images, self.train_labels), (self.test_images, self.test_labels) = fashion_mnist.load_data()
        self.train_images = np.expand_dims(self.train_images.astype(np.float32)/255.0,axis=-1)
        self.test_images = np.expand_dims(self.test_images.astype(np.float32)/255.0,axis=-1)
        self.train_labels = self.train_labels.astype(np.int32)
        self.test_labels = self.test_labels.astype(np.int32)
        self.num_train, self.num_test = self.train_images.shape[0], self.test_images.shape[0]

    def get_batch_train(self, batch_size):
        index = np.random.randint(0, np.shape(self.train_images)[0], batch_size)
        #need to resize images to (224,224)
        resized_images = tf.image.resize_with_pad(self.train_images[index],224,224,)
        return resized_images.numpy(), self.train_labels[index]

    def get_batch_test(self, batch_size):
        index = np.random.randint(0, np.shape(self.test_images)[0], batch_size)
        #need to resize images to (224,224)
        resized_images = tf.image.resize_with_pad(self.test_images[index],224,224,)
        return resized_images.numpy(), self.test_labels[index]

batch_size = 128
dataLoader = DataLoader()
x_batch, y_batch = dataLoader.get_batch_train(batch_size)
print("x_batch shape:",x_batch.shape,"y_batch shape:", y_batch.shape)
def train_vgg():
    epoch = 5
    num_iter = dataLoader.num_train//batch_size
    for e in range(epoch):
        for n in range(num_iter):
            print("epoch:", e, "  iteration", n, "/", num_iter)
            x_batch, y_batch = dataLoader.get_batch_train(batch_size)
            net.fit(x_batch, y_batch)
            if n%20 == 0:
                net.save_weights("VGG.h5")

optimizer = tf.keras.optimizers.SGD(learning_rate=0.05, momentum=0.0, nesterov=False)

net.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

x_batch, y_batch = dataLoader.get_batch_train(batch_size)
#net.fit(x_batch, y_batch)
train_vgg()
# Load the saved weights and evaluate
net.load_weights("VGG.h5")

x_test, y_test = dataLoader.get_batch_test(2000)
net.evaluate(x_test, y_test, verbose=2)

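Summary:

  • VGG-11 builds the network from 5 reusable convolutional blocks. Different VGG models can be defined by varying the number of convolutional layers and output channels in each block.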
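A short calculation shows where the "11" in VGG-11 comes from and what the blocks do to the input size. The sketch assumes a 224*224 input, as produced by the DataLoader above, and reuses the `conv_arch` tuple from the notes; the other variable names are hypothetical.

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

# 8 convolutional layers in the blocks plus 3 fully connected layers
num_conv_layers = sum(num_convs for num_convs, _ in conv_arch)
num_weight_layers = num_conv_layers + 3
print('layers with weights:', num_weight_layers)

# Each block keeps height and width ('same' padding) and then halves
# them with its 2x2, stride-2 max pooling layer.
spatial = 224 // 2 ** len(conv_arch)
flattened = spatial * spatial * conv_arch[-1][1]
print('final feature map:', (spatial, spatial, conv_arch[-1][1]))
print('flattened features:', flattened)

# The narrower test network divides every channel count by 4.
ratio = 4
small_conv_arch = [(n, c // ratio) for n, c in conv_arch]
small_flattened = spatial * spatial * small_conv_arch[-1][1]
print(small_conv_arch, small_flattened)
```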

Network in network (NiN)

This also consists of two parts: the NiN block and the NiN network.

NiN block

NiN network

The implementation is as follows:

import tensorflow as tf
print(tf.__version__)

for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
# NiN block
def nin_block(num_channels, kernel_size, strides, padding):
    blk = tf.keras.models.Sequential()
    blk.add(tf.keras.layers.Conv2D(num_channels, kernel_size,
                                   strides=strides, padding=padding, activation='relu'))
    blk.add(tf.keras.layers.Conv2D(num_channels, kernel_size=1,activation='relu'))
    blk.add(tf.keras.layers.Conv2D(num_channels, kernel_size=1,activation='relu'))
    return blk
# NiN model
def NiN():
    net = tf.keras.models.Sequential()
    net.add(nin_block(96, kernel_size=11, strides=4, padding='valid'))
    net.add(tf.keras.layers.MaxPool2D(pool_size=3, strides=2))
    net.add(nin_block(256, kernel_size=5, strides=1, padding='same'))
    net.add(tf.keras.layers.MaxPool2D(pool_size=3, strides=2))
    net.add(nin_block(384, kernel_size=3, strides=1, padding='same'))
    net.add(tf.keras.layers.MaxPool2D(pool_size=3, strides=2))
    net.add(tf.keras.layers.Dropout(0.5))
    net.add(nin_block(10, kernel_size=3, strides=1, padding='same'))
    net.add(tf.keras.layers.GlobalAveragePooling2D())
    net.add(tf.keras.layers.Flatten())
    return net
net=NiN()
# Construct a single-channel sample with height and width 224 to inspect each layer's output shape
X = tf.random.uniform((1,224,224,1))
for blk in net.layers:
    X = blk(X)
    print(blk.name, 'output shape:\t', X.shape)
# Load the data
import numpy as np

class DataLoader():
    def __init__(self):
        fashion_mnist = tf.keras.datasets.fashion_mnist
        (self.train_images, self.train_labels), (self.test_images, self.test_labels) = fashion_mnist.load_data()
        self.train_images = np.expand_dims(self.train_images.astype(np.float32)/255.0,axis=-1)
        self.test_images = np.expand_dims(self.test_images.astype(np.float32)/255.0,axis=-1)
        self.train_labels = self.train_labels.astype(np.int32)
        self.test_labels = self.test_labels.astype(np.int32)
        self.num_train, self.num_test = self.train_images.shape[0], self.test_images.shape[0]

    def get_batch_train(self, batch_size):
        index = np.random.randint(0, np.shape(self.train_images)[0], batch_size)
        #need to resize images to (224,224)
        resized_images = tf.image.resize_with_pad(self.train_images[index],224,224,)
        return resized_images.numpy(), self.train_labels[index]

    def get_batch_test(self, batch_size):
        index = np.random.randint(0, np.shape(self.test_images)[0], batch_size)
        #need to resize images to (224,224)
        resized_images = tf.image.resize_with_pad(self.test_images[index],224,224,)
        return resized_images.numpy(), self.test_labels[index]

batch_size = 128
dataLoader = DataLoader()
x_batch, y_batch = dataLoader.get_batch_train(batch_size)
print("x_batch shape:",x_batch.shape,"y_batch shape:", y_batch.shape)
# Training
def train_nin():
    #net.load_weights("NiN.h5")
    epoch = 5
    num_iter = dataLoader.num_train//batch_size
    for e in range(epoch):
        for n in range(num_iter):
            print("epoch:", e, "  iteration", n, "/", num_iter)
            x_batch, y_batch = dataLoader.get_batch_train(batch_size)
            net.fit(x_batch, y_batch)
            if n%20 == 0:
                net.save_weights("NiN.h5")

# optimizer = tf.keras.optimizers.SGD(learning_rate=0.06, momentum=0.3, nesterov=False)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-7)
net.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

x_batch, y_batch = dataLoader.get_batch_train(batch_size)
#net.fit(x_batch, y_batch)
train_nin()
# Load the saved weights and evaluate
net.load_weights("NiN.h5")

x_test, y_test = dataLoader.get_batch_test(2000)
net.evaluate(x_test, y_test, verbose=2)

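Networks with parallel concatenations (GoogLeNet)

It also consists of a block and a network built from it, although the naming differs from the previous models.

Inception block

GoogLeNet model

The GoogLeNet model is computationally complex, and its channel counts are not as easy to modify as VGG's.

The final code is as follows: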

import tensorflow as tf
print(tf.__version__)

for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
# Define the Inception block
class Inception(tf.keras.layers.Layer):
    def __init__(self,c1, c2, c3, c4):
        super().__init__()
        # Path 1: a single 1 x 1 convolutional layer
        self.p1_1 = tf.keras.layers.Conv2D(c1, kernel_size=1, activation='relu', padding='same')
        # Path 2: a 1 x 1 convolutional layer followed by a 3 x 3 convolutional layer
        self.p2_1 = tf.keras.layers.Conv2D(c2[0], kernel_size=1, padding='same', activation='relu')
        self.p2_2 = tf.keras.layers.Conv2D(c2[1], kernel_size=3, padding='same',
                              activation='relu')
        # Path 3: a 1 x 1 convolutional layer followed by a 5 x 5 convolutional layer
        self.p3_1 = tf.keras.layers.Conv2D(c3[0], kernel_size=1, padding='same', activation='relu')
        self.p3_2 = tf.keras.layers.Conv2D(c3[1], kernel_size=5, padding='same',
                              activation='relu')
        # Path 4: a 3 x 3 max pooling layer followed by a 1 x 1 convolutional layer
        self.p4_1 = tf.keras.layers.MaxPool2D(pool_size=3, padding='same', strides=1)
        self.p4_2 = tf.keras.layers.Conv2D(c4, kernel_size=1, padding='same', activation='relu')

    def call(self, x):
        p1 = self.p1_1(x)
        p2 = self.p2_2(self.p2_1(x))
        p3 = self.p3_2(self.p3_1(x))
        p4 = self.p4_2(self.p4_1(x))
        return tf.concat([p1, p2, p3, p4], axis=-1)  # Concatenate the outputs along the channel dimension
# Instantiate a block as a quick check
Inception(64, (96, 128), (16, 32), 32)
# Block 1
b1 = tf.keras.models.Sequential()
b1.add(tf.keras.layers.Conv2D(64, kernel_size=7, strides=2, padding='same', activation='relu'))
b1.add(tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same'))
# Block 2
b2 = tf.keras.models.Sequential()
b2.add(tf.keras.layers.Conv2D(64, kernel_size=1, padding='same', activation='relu'))
b2.add(tf.keras.layers.Conv2D(192, kernel_size=3, padding='same', activation='relu'))
b2.add(tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same'))
# Block 3
b3 = tf.keras.models.Sequential()
b3.add(Inception(64, (96, 128), (16, 32), 32))
b3.add(Inception(128, (128, 192), (32, 96), 64))
b3.add(tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same'))
# Block 4
b4 = tf.keras.models.Sequential()
b4.add(Inception(192, (96, 208), (16, 48), 64))
b4.add(Inception(160, (112, 224), (24, 64), 64))
b4.add(Inception(128, (128, 256), (24, 64), 64))
b4.add(Inception(112, (144, 288), (32, 64), 64))
b4.add(Inception(256, (160, 320), (32, 128), 128))
b4.add(tf.keras.layers.MaxPool2D(pool_size=3, strides=2, padding='same'))
# Block 5
b5 = tf.keras.models.Sequential()
b5.add(Inception(256, (160, 320), (32, 128), 128))
b5.add(Inception(384, (192, 384), (48, 128), 128))
b5.add(tf.keras.layers.GlobalAvgPool2D())
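# Combine the blocks; softmax turns the outputs into the probabilities that sparse_categorical_crossentropy expects
net = tf.keras.models.Sequential([b1, b2, b3, b4, b5, tf.keras.layers.Dense(10, activation='softmax')])
# Dummy input to inspect the structure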
X = tf.random.uniform(shape=(1, 96, 96, 1))
for layer in net.layers:
    X = layer(X)
    print(layer.name, 'output shape:\t', X.shape)
# Load the data
import numpy as np

class DataLoader():
    def __init__(self):
        fashion_mnist = tf.keras.datasets.fashion_mnist
        (self.train_images, self.train_labels), (self.test_images, self.test_labels) = fashion_mnist.load_data()
        self.train_images = np.expand_dims(self.train_images.astype(np.float32)/255.0,axis=-1)
        self.test_images = np.expand_dims(self.test_images.astype(np.float32)/255.0,axis=-1)
        self.train_labels = self.train_labels.astype(np.int32)
        self.test_labels = self.test_labels.astype(np.int32)
        self.num_train, self.num_test = self.train_images.shape[0], self.test_images.shape[0]

    def get_batch_train(self, batch_size):
        index = np.random.randint(0, np.shape(self.train_images)[0], batch_size)
        #need to resize images to (224,224)
        resized_images = tf.image.resize_with_pad(self.train_images[index],224,224,)
        return resized_images.numpy(), self.train_labels[index]

    def get_batch_test(self, batch_size):
        index = np.random.randint(0, np.shape(self.test_images)[0], batch_size)
        #need to resize images to (224,224)
        resized_images = tf.image.resize_with_pad(self.test_images[index],224,224,)
        return resized_images.numpy(), self.test_labels[index]

batch_size = 128
dataLoader = DataLoader()
x_batch, y_batch = dataLoader.get_batch_train(batch_size)
print("x_batch shape:",x_batch.shape,"y_batch shape:", y_batch.shape)
# Train the model
def train_googlenet():
    #net.load_weights("GoogLeNet.h5")
    epoch = 5
    num_iter = dataLoader.num_train//batch_size
    for e in range(epoch):
        for n in range(num_iter):
            print("epoch:", e, "  iteration", n, "/", num_iter)
            x_batch, y_batch = dataLoader.get_batch_train(batch_size)
            net.fit(x_batch, y_batch)
            if n%20 == 0:
                net.save_weights("GoogLeNet.h5")

# optimizer = tf.keras.optimizers.SGD(learning_rate=0.05, momentum=0.0, nesterov=False)
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-7)

net.compile(optimizer=optimizer,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

x_batch, y_batch = dataLoader.get_batch_train(batch_size)
#net.fit(x_batch, y_batch)
train_googlenet()
# Load the saved weights and evaluate
net.load_weights("GoogLeNet.h5")

x_test, y_test = dataLoader.get_batch_test(2000)
net.evaluate(x_test, y_test, verbose=2)

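Summary:

  • An Inception block is effectively a sub-network with 4 paths. It extracts information in parallel through convolutional layers with different window shapes and a max pooling layer, and uses 1×1 convolutional layers to reduce the number of channels and so lower model complexity.
  • GoogLeNet connects several carefully designed Inception blocks with other layers in series. The ratio of channels assigned within each Inception block was obtained through extensive experiments on the ImageNet dataset.
  • GoogLeNet and its successors were for a time among the most efficient models on ImageNet: at similar test accuracy, their computational complexity is often lower.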
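Because the four paths are concatenated along the channel dimension, the number of output channels of each Inception block is just the sum of the last layer of every path. The sketch below applies this to the configurations used in the code above; the function name `inception_out_channels` is hypothetical.

```python
def inception_out_channels(c1, c2, c3, c4):
    """Channels after concatenating the four paths of an Inception block."""
    # Only the last layer of each path contributes to the output.
    return c1 + c2[1] + c3[1] + c4

blocks = {
    'b3': [(64, (96, 128), (16, 32), 32),
           (128, (128, 192), (32, 96), 64)],
    'b4': [(192, (96, 208), (16, 48), 64),
           (160, (112, 224), (24, 64), 64),
           (128, (128, 256), (24, 64), 64),
           (112, (144, 288), (32, 64), 64),
           (256, (160, 320), (32, 128), 128)],
    'b5': [(256, (160, 320), (32, 128), 128),
           (384, (192, 384), (48, 128), 128)],
}

channels = {name: [inception_out_channels(*cfg) for cfg in cfgs]
            for name, cfgs in blocks.items()}
for name, outs in channels.items():
    print(name, outs)
```

The last value, 1024, is the feature size that global average pooling passes to the final Dense layer.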

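Batch normalization

Batch normalization is applied slightly differently to fully connected layers and convolutional layers. The two cases are covered separately below.

Batch normalization for fully connected layers

Batch normalization for convolutional layers

Batch normalization at prediction time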
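The difference between training and prediction can be sketched for a single feature of a fully connected layer. This is a minimal plain-Python illustration, not the implementation used below; the function name `batch_norm_1d` and the sample values are made up, while `eps` and `momentum` play the same roles as in the from-scratch code that follows.

```python
def batch_norm_1d(xs, gamma, beta, moving_mean, moving_var,
                  training, eps=1e-5, momentum=0.9):
    """Batch normalization of one feature over a mini-batch xs."""
    if training:
        # Training: normalize with the statistics of the current batch ...
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
        # ... and update the moving averages for later use.
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean
        moving_var = momentum * moving_var + (1.0 - momentum) * var
    else:
        # Prediction: use the moving averages collected during training.
        mean, var = moving_mean, moving_var
    ys = [gamma * (x - mean) / (var + eps) ** 0.5 + beta for x in xs]
    return ys, moving_mean, moving_var

batch = [1.0, 2.0, 3.0, 4.0]
ys_train, mm, mv = batch_norm_1d(batch, 1.0, 0.0, 0.0, 1.0, training=True)
ys_pred, mm2, mv2 = batch_norm_1d(batch, 1.0, 0.0, mm, mv, training=False)
print(ys_train, mm, mv)
print(ys_pred)
```

In prediction mode the output for a sample no longer depends on the other samples in the batch, and the moving averages are left unchanged.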

Implementing batch normalization from scratch (LeNet)

import tensorflow as tf
import numpy as np
def batch_norm(is_training,X, gamma, beta, moving_mean, moving_var, eps, momentum):
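    # Check whether the current mode is training or prediction
    if not is_training:
        # In prediction mode, use the moving-average mean and variance passed in
        X_hat = (X - moving_mean) / np.sqrt(moving_var + eps)
    else:
        assert len(X.shape) in (2, 4)
        if len(X.shape) == 2:
            # Fully connected layer: compute the mean and variance of each feature over the batch
            mean = X.mean(axis=0)
            var = ((X - mean) ** 2).mean(axis=0)
        else:
            # 2D convolutional layer: compute the per-channel mean and variance, with the channel
            # on axis=1 (a channels-first layout). Keep X's shape so that broadcasting works later
            mean = X.mean(axis=(0, 2, 3), keepdims=True)
            var = ((X - mean) ** 2).mean(axis=(0, 2, 3), keepdims=True)
        # In training mode, standardize with the current batch mean and variance
        X_hat = (X - mean) / np.sqrt(var + eps)
        # Update the moving averages of the mean and variance
        moving_mean = momentum * moving_mean + (1.0 - momentum) * mean
        moving_var = momentum * moving_var + (1.0 - momentum) * var
    Y = gamma * X_hat + beta  # Scale and shift
    return Y, moving_mean, moving_var
# Define a custom BatchNormalization layer. It holds the scale parameter gamma and the shift parameter beta, which take part in gradient computation and updates, and also maintains the moving averages of the mean and variance for use at prediction time.
# The number of features is inferred from input_shape[-1] in build: the number of outputs for a fully connected layer, and the number of output channels for a convolutional layer.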
class BatchNormalization(tf.keras.layers.Layer):
    def __init__(self, decay=0.9, epsilon=1e-5, **kwargs):
        self.decay = decay
        self.epsilon = epsilon
        super(BatchNormalization, self).__init__(**kwargs)

    def build(self, input_shape):
        self.gamma = self.add_weight(name='gamma',
                                     shape=[input_shape[-1], ],
                                     initializer=tf.initializers.ones,
                                     trainable=True)
        self.beta = self.add_weight(name='beta',
                                    shape=[input_shape[-1], ],
                                    initializer=tf.initializers.zeros,
                                    trainable=True)
        self.moving_mean = self.add_weight(name='moving_mean',
                                           shape=[input_shape[-1], ],
                                           initializer=tf.initializers.zeros,
                                           trainable=False)
        self.moving_variance = self.add_weight(name='moving_variance',
                                               shape=[input_shape[-1], ],
                                               initializer=tf.initializers.ones,
                                               trainable=False)
        super(BatchNormalization, self).build(input_shape)

    def assign_moving_average(self, variable, value):
        """
        variable = variable * decay + value * (1 - decay)
        """
        delta = variable * self.decay + value * (1 - self.decay)
        return variable.assign(delta)

    @tf.function
    def call(self, inputs, training):
        if training:
            batch_mean, batch_variance = tf.nn.moments(inputs, list(range(len(inputs.shape) - 1)))
            mean_update = self.assign_moving_average(self.moving_mean, batch_mean)
            variance_update = self.assign_moving_average(self.moving_variance, batch_variance)
            self.add_update(mean_update)
            self.add_update(variance_update)
            mean, variance = batch_mean, batch_variance
        else:
            mean, variance = self.moving_mean, self.moving_variance
        output = tf.nn.batch_normalization(inputs,
                                           mean=mean,
                                           variance=variance,
                                           offset=self.beta,
                                           scale=self.gamma,
                                           variance_epsilon=self.epsilon)
        return output

    def compute_output_shape(self, input_shape):
        return input_shape
# Define the network: LeNet with a batch normalization layer before each activation
net = tf.keras.models.Sequential(
    [tf.keras.layers.Conv2D(filters=6,kernel_size=5),
    BatchNormalization(),
    tf.keras.layers.Activation('sigmoid'),
    tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
    tf.keras.layers.Conv2D(filters=16,kernel_size=5),
    BatchNormalization(),
    tf.keras.layers.Activation('sigmoid'),
    tf.keras.layers.MaxPool2D(pool_size=2, strides=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120),
    BatchNormalization(),
    tf.keras.layers.Activation('sigmoid'),
    tf.keras.layers.Dense(84),
    BatchNormalization(),
    tf.keras.layers.Activation('sigmoid'),
    # softmax output, since sparse_categorical_crossentropy expects class probabilities
    tf.keras.layers.Dense(10,activation='softmax')]
)
# Load the data and train
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape((60000, 28, 28, 1)).astype('float32') / 255
x_test = x_test.reshape((10000, 28, 28, 1)).astype('float32') / 255

net.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.RMSprop(),
              metrics=['accuracy'])
history = net.fit(x_train, y_train,
                    batch_size=64,
                    epochs=5,
                    validation_split=0.2)

test_scores = net.evaluate(x_test, y_test, verbose=2)
print('Test loss:', test_scores[0])
print('Test accuracy:', test_scores[1])
# Inspect the scale parameter gamma and the shift parameter beta learned by the first batch normalization layer
print(net.get_layer(index=1).gamma,net.get_layer(index=1).beta)

Concise implementation of batch normalization

import tensorflow as tf
import numpy as np
# Define the model
net = tf.keras.models.Sequential()
net.add(tf.keras.layers.Conv2D(filters=6,kernel_size=5))
net.add(tf.keras.layers.BatchNormalization())
net.add(tf.keras.layers.Activation('sigmoid'))
net.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
net.add(tf.keras.layers.Conv2D(filters=16,kernel_size=5))
net.add(tf.keras.layers.BatchNormalization())
net.add(tf.keras.layers.Activation('sigmoid'))
net.add(tf.keras.layers.MaxPool2D(pool_size=2, strides=2))
net.add(tf.keras.layers.Flatten())
net.add(tf.keras.layers.Dense(120))
net.add(tf.keras.layers.BatchNormalization())
net.add(tf.keras.layers.Activation('sigmoid'))
net.add(tf.keras.layers.Dense(84))
net.add(tf.keras.layers.BatchNormalization())
net.add(tf.keras.layers.Activation('sigmoid'))
# softmax output, since sparse_categorical_crossentropy expects class probabilities
net.add(tf.keras.layers.Dense(10,activation='softmax'))
# Load the data and train
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape((60000, 28, 28, 1)).astype('float32') / 255
x_test = x_test.reshape((10000, 28, 28, 1)).astype('float32') / 255

net.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.RMSprop(),
              metrics=['accuracy'])
history = net.fit(x_train, y_train,
                    batch_size=64,
                    epochs=5,
                    validation_split=0.2)
test_scores = net.evaluate(x_test, y_test, verbose=2)
print('Test loss:', test_scores[0])
print('Test accuracy:', test_scores[1])

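Summary:

  • During training, batch normalization uses the mean and standard deviation of each mini-batch to keep adjusting the intermediate outputs of the network, so that the values of the intermediate outputs in every layer stay more stable.
  • Batch normalization is applied slightly differently to fully connected layers and to convolutional layers.
  • Like a dropout layer, a batch normalization layer computes different results in training mode and in prediction mode.
  • The BatchNormalization layer provided by Keras is simple and convenient to use. (Prefer the concise implementation in practice.)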
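
The points above can be sketched in plain NumPy. This is only an illustration and not the Keras implementation: the function name `batch_norm` and the toy inputs are my own assumptions. It normalizes over every axis except the last one, so a dense input is reduced over the batch axis while an NHWC convolutional input is reduced over batch, height and width, and it switches between batch statistics (training) and moving averages (prediction).

```python
import numpy as np

def batch_norm(x, gamma, beta, moving_mean, moving_var,
               training, decay=0.9, eps=1e-5):
    """Normalize over every axis except the last (feature/channel) axis."""
    # (0,) for a dense input, (0, 1, 2) for an NHWC convolutional input
    axes = tuple(range(x.ndim - 1))
    if training:
        mean = x.mean(axis=axes)
        var = x.var(axis=axes)
        # Exponential moving averages, used later in prediction mode
        moving_mean = decay * moving_mean + (1 - decay) * mean
        moving_var = decay * moving_var + (1 - decay) * var
    else:
        mean, var = moving_mean, moving_var
    y = gamma * (x - mean) / np.sqrt(var + eps) + beta
    return y, moving_mean, moving_var

rng = np.random.default_rng(0)
dense_x = rng.normal(loc=3.0, scale=2.0, size=(64, 5))       # (batch, features)
conv_x = rng.normal(loc=3.0, scale=2.0, size=(64, 4, 4, 6))  # (batch, height, width, channels)

for x in (dense_x, conv_x):
    c = x.shape[-1]
    gamma, beta = np.ones(c), np.zeros(c)
    mm, mv = np.zeros(c), np.ones(c)
    y_train, mm, mv = batch_norm(x, gamma, beta, mm, mv, training=True)
    y_pred, _, _ = batch_norm(x, gamma, beta, mm, mv, training=False)
    reduce_axes = tuple(range(x.ndim - 1))
    print(x.shape, 'per-channel mean in training mode:',
          np.round(y_train.mean(axis=reduce_axes), 3))
    print(x.shape, 'training and prediction outputs equal:',
          np.allclose(y_train, y_pred))
```

After a single update the moving averages are still far from the batch statistics, which is why the two modes disagree on the same input here; over many batches they would move closer together.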

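Residual networks (ResNet)

A residual network is likewise built from two parts: the residual block and the overall ResNet architecture.

Residual block

ResNet architecture

Implementation: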

import tensorflow as tf
from tensorflow.keras import layers,activations
# Define the residual block
class Residual(tf.keras.Model):
    def __init__(self, num_channels, use_1x1conv=False, strides=1, **kwargs):
        super(Residual, self).__init__(**kwargs)
        self.conv1 = layers.Conv2D(num_channels,
                                   padding='same',
                                   kernel_size=3,
                                   strides=strides)
        self.conv2 = layers.Conv2D(num_channels, kernel_size=3,padding='same')
        if use_1x1conv:
            self.conv3 = layers.Conv2D(num_channels,
                                       kernel_size=1,
                                       strides=strides)
        else:
            self.conv3 = None
        self.bn1 = layers.BatchNormalization()
        self.bn2 = layers.BatchNormalization()

    def call(self, X):
        Y = activations.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        return activations.relu(Y + X)
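# Case where the input and output shapes are identical
blk = Residual(3)
# TensorFlow input shape: (n_images, x_shape, y_shape, channels)
# mxnet.gluon.nn.conv_layers: (batch_size, in_channels, height, width)
X = tf.random.uniform((4, 6, 6 , 3))
print(blk(X).shape)  # TensorShape([4, 6, 6, 3])
# We can also halve the output height and width while increasing the number of output channels
blk = Residual(6, use_1x1conv=True, strides=2)
print(blk(X).shape)  # TensorShape([4, 3, 3, 6])
# The ResNet model
# The first two layers of ResNet are the same as in GoogLeNet, introduced earlier: a 7×7 convolutional layer with 64 output channels and stride 2, followed by a 3×3 max pooling layer with stride 2.
# The difference is that ResNet adds a batch normalization layer after each convolutional layer.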
net = tf.keras.models.Sequential(
    [layers.Conv2D(64, kernel_size=7, strides=2, padding='same'),
    layers.BatchNormalization(), layers.Activation('relu'),
    layers.MaxPool2D(pool_size=3, strides=2, padding='same')])
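# The first module keeps the number of channels equal to its input channels. Since a max pooling layer with stride 2 has already been applied, there is no need to reduce the height and width.
# In each later module, the first residual block doubles the number of channels of the previous module and halves the height and width.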
class ResnetBlock(tf.keras.layers.Layer):
    def __init__(self,num_channels, num_residuals, first_block=False,**kwargs):
        super(ResnetBlock, self).__init__(**kwargs)
        self.listLayers=[]
        for i in range(num_residuals):
            if i == 0 and not first_block:
                self.listLayers.append(Residual(num_channels, use_1x1conv=True, strides=2))
            else:
                self.listLayers.append(Residual(num_channels))

    def call(self, X):
        # Iterate over the list itself; a plain Python list has no .layers attribute
        for layer in self.listLayers:
            X = layer(X)
        return X
# Add all the residual blocks to ResNet. Each module here uses two residual blocks.
class ResNet(tf.keras.Model):
    def __init__(self,num_blocks,**kwargs):
        super(ResNet, self).__init__(**kwargs)
        self.conv=layers.Conv2D(64, kernel_size=7, strides=2, padding='same')
        self.bn=layers.BatchNormalization()
        self.relu=layers.Activation('relu')
        self.mp=layers.MaxPool2D(pool_size=3, strides=2, padding='same')
        self.resnet_block1=ResnetBlock(64,num_blocks[0], first_block=True)
        self.resnet_block2=ResnetBlock(128,num_blocks[1])
        self.resnet_block3=ResnetBlock(256,num_blocks[2])
        self.resnet_block4=ResnetBlock(512,num_blocks[3])
        self.gap=layers.GlobalAvgPool2D()
        self.fc=layers.Dense(units=10,activation=tf.keras.activations.softmax)

    def call(self, x):
        x=self.conv(x)
        x=self.bn(x)
        x=self.relu(x)
        x=self.mp(x)
        x=self.resnet_block1(x)
        x=self.resnet_block2(x)
        x=self.resnet_block3(x)
        x=self.resnet_block4(x)
        x=self.gap(x)
        x=self.fc(x)
        return x

mynet=ResNet([2,2,2,2])

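# Each module here contains 4 convolutional layers (not counting the 1×1 convolutional layers). Together with the first convolutional layer and the final fully connected layer, that makes 18 layers in total.
# This model is therefore usually called ResNet-18.
# Configuring different channel counts and different numbers of residual blocks per module gives other ResNet models, for example the deeper 152-layer ResNet-152.
# Although the main architecture of ResNet resembles that of GoogLeNet, ResNet is structurally simpler and easier to modify.
# These factors led to ResNet being adopted quickly and widely.
# Before training ResNet, let's look at how the input shape changes across the different modules of ResNet.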
X = tf.random.uniform(shape=(1,  224, 224 , 1))
for layer in mynet.layers:
    X = layer(X)
    print(layer.name, 'output shape:\t', X.shape)
# Load the dataset and train ResNet on Fashion-MNIST
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train.reshape((60000, 28, 28, 1)).astype('float32') / 255
x_test = x_test.reshape((10000, 28, 28, 1)).astype('float32') / 255

mynet.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

history = mynet.fit(x_train, y_train,
                    batch_size=64,
                    epochs=5,
                    validation_split=0.2)
test_scores = mynet.evaluate(x_test, y_test, verbose=2)

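Summary:

  • Residual blocks pass data across layers through a shortcut path, which makes it possible to train effective deep neural networks.
  • ResNet profoundly influenced the design of later deep neural networks.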
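
To double-check the shape changes and the "18 layers" count without running TensorFlow, here is a rough sketch in pure Python. The helper names `same_out` and `resnet18_shapes` are made up for this note. It assumes that every stride-2 layer in the network above maps a spatial size n to ceil(n / 2), which holds for padding='same' and also for the stride-2 1×1 shortcut convolution.

```python
import math

def same_out(size, stride):
    """Spatial size after a layer with padding='same': ceil(size / stride)."""
    return math.ceil(size / stride)

def resnet18_shapes(height, width, num_blocks=(2, 2, 2, 2)):
    """Return (stage name, height, width, channels) for each stage of the ResNet above."""
    shapes = []
    h, w = same_out(height, 2), same_out(width, 2)   # 7x7 conv, stride 2
    shapes.append(('conv', h, w, 64))
    h, w = same_out(h, 2), same_out(w, 2)            # 3x3 max pool, stride 2
    shapes.append(('max_pool', h, w, 64))
    channels = 64
    for i in range(len(num_blocks)):
        if i > 0:  # every module except the first halves H/W and doubles the channels
            h, w, channels = same_out(h, 2), same_out(w, 2), channels * 2
        shapes.append((f'resnet_block{i + 1}', h, w, channels))
    return shapes

for size in (224, 28):
    print(f'input {size} x {size}')
    for name, h, w, c in resnet18_shapes(size, size):
        print(f'  {name:14s} {h:4d} x {w:4d} x {c}')

# Layer count: 4 modules, each with 2 residual blocks of 2 conv layers,
# plus the first conv layer and the final fully connected layer.
num_layers = sum(2 * n for n in (2, 2, 2, 2)) + 2
print('layers with weights (ignoring 1x1 shortcuts):', num_layers)
```

The 28 × 28 case matters because Fashion-MNIST images are fed in without resizing, so the feature map has already shrunk to 1 × 1 by the last module.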

Densely connected networks (DenseNet)

Dense block

Transition layer

DenseNet architecture

The implementation is as follows:

import tensorflow as tf

class BottleNeck(tf.keras.layers.Layer):
    def __init__(self, growth_rate, drop_rate):
        super(BottleNeck, self).__init__()
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.conv1 = tf.keras.layers.Conv2D(filters=4 * growth_rate,
                                            kernel_size=(1, 1),
                                            strides=1,
                                            padding="same")
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.conv2 = tf.keras.layers.Conv2D(filters=growth_rate,
                                            kernel_size=(3, 3),
                                            strides=1,
                                            padding="same")
        self.dropout = tf.keras.layers.Dropout(rate=drop_rate)

        self.listLayers = [self.bn1,
                           tf.keras.layers.Activation("relu"),
                           self.conv1,
                           self.bn2,
                           tf.keras.layers.Activation("relu"),
                           self.conv2,
                           self.dropout]

    def call(self, x):
        y = x
        # Iterate over the list itself; a plain Python list has no .layers attribute
        for layer in self.listLayers:
            y = layer(y)
        y = tf.keras.layers.concatenate([x,y], axis=-1)
        return y
class DenseBlock(tf.keras.layers.Layer):
    def __init__(self, num_layers, growth_rate, drop_rate=0.5):
        super(DenseBlock, self).__init__()
        self.num_layers = num_layers
        self.growth_rate = growth_rate
        self.drop_rate = drop_rate
        self.listLayers = []
        for _ in range(num_layers):
            self.listLayers.append(BottleNeck(growth_rate=self.growth_rate, drop_rate=self.drop_rate))

    def call(self, x):
        # Iterate over the list itself; a plain Python list has no .layers attribute
        for layer in self.listLayers:
            x = layer(x)
        return x
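# Define a dense block with 2 convolutional blocks of 10 output channels each. With a 3-channel input, the output has 3 + 2×10 = 23 channels.
# The channel count of the convolutional block controls how much the output channel count grows relative to the input, so it is also called the growth rate.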
blk = DenseBlock(2, 10)
X = tf.random.uniform((4, 8, 8,3))
Y = blk(X)
print(Y.shape)
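# Transition layer
# Every dense block increases the number of channels, so using too many of them gives an overly complex model. The transition layer keeps model complexity under control.
# It reduces the number of channels with a 1×1 convolutional layer and halves the height and width with a stride-2 pooling layer, which lowers model complexity further.
# Note: the description in the book uses average pooling here, while the code below uses max pooling.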
class TransitionLayer(tf.keras.layers.Layer):
    def __init__(self, out_channels):
        super(TransitionLayer, self).__init__()
        self.bn = tf.keras.layers.BatchNormalization()
        self.conv = tf.keras.layers.Conv2D(filters=out_channels,
                                           kernel_size=(1, 1),
                                           strides=1,
                                           padding="same")
        self.pool = tf.keras.layers.MaxPool2D(pool_size=(2, 2),
                                              strides=2,
                                              padding="same")

    def call(self, inputs):
        x = self.bn(inputs)
        x = tf.keras.activations.relu(x)
        x = self.conv(x)
        x = self.pool(x)
        return x
# Reduce the number of channels to 10
blk = TransitionLayer(10)
print(blk(Y).shape)
# Result: TensorShape([4, 4, 4, 10])

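# DenseNet uses 4 dense blocks, and we can choose how many convolutional layers each dense block uses.
# Here we set it to 4, in line with ResNet-18 from the previous section. The channel count of the convolutional layers inside a dense block (the growth rate) is set to 32, so each dense block adds 128 channels.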
class DenseNet(tf.keras.Model):
    def __init__(self, num_init_features, growth_rate, block_layers, compression_rate, drop_rate):
        super(DenseNet, self).__init__()
        self.conv = tf.keras.layers.Conv2D(filters=num_init_features,
                                           kernel_size=(7, 7),
                                           strides=2,
                                           padding="same")
        self.bn = tf.keras.layers.BatchNormalization()
        self.pool = tf.keras.layers.MaxPool2D(pool_size=(3, 3),
                                              strides=2,
                                              padding="same")
        self.num_channels = num_init_features
        self.dense_block_1 = DenseBlock(num_layers=block_layers[0], growth_rate=growth_rate, drop_rate=drop_rate)
        self.num_channels += growth_rate * block_layers[0]
        self.num_channels = compression_rate * self.num_channels
        self.transition_1 = TransitionLayer(out_channels=int(self.num_channels))
        self.dense_block_2 = DenseBlock(num_layers=block_layers[1], growth_rate=growth_rate, drop_rate=drop_rate)
        self.num_channels += growth_rate * block_layers[1]
        self.num_channels = compression_rate * self.num_channels
        self.transition_2 = TransitionLayer(out_channels=int(self.num_channels))
        self.dense_block_3 = DenseBlock(num_layers=block_layers[2], growth_rate=growth_rate, drop_rate=drop_rate)
        self.num_channels += growth_rate * block_layers[2]
        self.num_channels = compression_rate * self.num_channels
        self.transition_3 = TransitionLayer(out_channels=int(self.num_channels))
        self.dense_block_4 = DenseBlock(num_layers=block_layers[3], growth_rate=growth_rate, drop_rate=drop_rate)

        self.avgpool = tf.keras.layers.GlobalAveragePooling2D()
        self.fc = tf.keras.layers.Dense(units=10,
                                        activation=tf.keras.activations.softmax)

    def call(self, inputs):
        x = self.conv(inputs)
        x = self.bn(x)
        x = tf.keras.activations.relu(x)
        x = self.pool(x)

        x = self.dense_block_1(x)
        x = self.transition_1(x)
        x = self.dense_block_2(x)
        x = self.transition_2(x)
        x = self.dense_block_3(x)
        x = self.transition_3(x)
        x = self.dense_block_4(x)

        x = self.avgpool(x)
        x = self.fc(x)

        return x
def densenet():
    return DenseNet(num_init_features=64, growth_rate=32, block_layers=[4,4,4,4], compression_rate=0.5, drop_rate=0.5)
mynet=densenet()

X = tf.random.uniform(shape=(1,  96, 96 , 1))
for layer in mynet.layers:
    X = layer(X)
    print(layer.name, 'output shape:\t', X.shape)
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train.reshape((60000, 28, 28, 1)).astype('float32') / 255
x_test = x_test.reshape((10000, 28, 28, 1)).astype('float32') / 255

mynet.compile(loss='sparse_categorical_crossentropy',
              optimizer=tf.keras.optimizers.Adam(),
              metrics=['accuracy'])

history = mynet.fit(x_train, y_train,
                    batch_size=64,
                    epochs=5,
                    validation_split=0.2)
test_scores = mynet.evaluate(x_test, y_test, verbose=2)
mynet.save_weights("DenseNet.h5")

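Summary:

  • For cross-layer connections, ResNet adds the input to the output, whereas DenseNet concatenates the input and the output along the channel dimension.
  • The main building blocks of DenseNet are the dense block and the transition layer.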
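
Both bullets can be checked with a few lines of NumPy and plain Python. This is just a sketch under my own assumptions: the arrays are stand-ins for real layer outputs, and `densenet_channels` is a hypothetical helper that mirrors the channel bookkeeping in the DenseNet constructor above, with a transition layer after every dense block except the last.

```python
import numpy as np

# Skip connections: ResNet adds, DenseNet concatenates along the channel axis.
x = np.ones((4, 8, 8, 3))        # (batch, height, width, channels)
f_x = np.zeros((4, 8, 8, 3))     # stand-in for the output of a residual branch
print('ResNet-style add:     ', (x + f_x).shape)
g_x = np.zeros((4, 8, 8, 10))    # stand-in for a conv block with growth rate 10
print('DenseNet-style concat:', np.concatenate([x, g_x], axis=-1).shape)

def densenet_channels(num_init_features=64, growth_rate=32,
                      block_layers=(4, 4, 4, 4), compression_rate=0.5):
    """Channel count after each dense block and each transition layer."""
    trace = []
    channels = num_init_features
    for i, num_layers in enumerate(block_layers):
        # Each conv block in a dense block appends growth_rate channels
        channels += growth_rate * num_layers
        trace.append((f'dense_block_{i + 1}', channels))
        if i != len(block_layers) - 1:
            # The transition layer compresses the channel count
            channels = int(compression_rate * channels)
            trace.append((f'transition_{i + 1}', channels))
    return trace

for name, channels in densenet_channels():
    print(f'{name:14s} -> {channels} channels')
```

Calling `densenet_channels(3, 10, (2,))` reproduces the 3 + 2×10 = 23 channels of the small dense block example in the code above.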

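Wrap-up: that was tiring... Typing out the tutorial code yourself will probably make all of this easier to understand. As Mu Li (沫神) puts it, these architectures were found by trial and error: the winner takes all, and whatever reaches high accuracy has its reasons for working.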
