TensorFlow 2.0 Notes 16: Introduction to ResNet and Hands-on ResNet-18 in TensorFlow 2.0!

Introduction to ResNet and Hands-on ResNet-18 in TensorFlow 2.0!

1. Introduction to ResNet

1.1 The basic structure of ResNet-34

  • The leftmost network is VGG-19, i.e. 19 layers. The way this figure is drawn is quite deliberate: for example, blank space is intentionally left in the left and middle columns, which means that once shortcut connections are added, the 34-layer network can in the worst case degenerate to plain direct connections, i.e. do no worse than VGG-19.
  • The most commonly used depths are, for example, 34, 50, and 152 layers.

1.2 Why is it called a "residual" network?

  • As the figure below shows, the data flows along two routes: the regular route, and a shortcut that directly implements an identity mapping, somewhat like a "short circuit" in an electronic circuit. Experiments show that this shortcut structure handles the degradation problem well. If we write the input-output relation of one network module as y = H(x), then fitting H(x) directly with gradient methods runs into the degradation problem mentioned above. With the shortcut, the trainable part no longer has to fit H(x): letting F(x) denote the part to be optimized, H(x) = F(x) + x, i.e. F(x) = H(x) - x. Under the identity-mapping assumption, y = x plays the role of the observation, so F(x) corresponds to the residual, hence the name residual network. Why do this? Because the authors argue that learning the residual F(x) is easier than learning H(x) directly: we only need to learn the difference between output and input, turning an absolute quantity into a relative one (H(x) - x is how much the output changes relative to the input), which is much easier to optimize.
  • Since the dimensions of x and F(x) may not match, dimension matching is required. The paper uses two methods for this (there is actually a third, but experiments showed it degrades performance sharply, so it is not used):
  1. zero-padding: pad the identity branch with zeros to fill out the missing dimensions. This adds no extra parameters.
  2. projection: use 1x1 convolutions on the identity branch to increase the dimensions. This adds extra parameters.
  • The figure below shows two forms of residual modules. On the left is the regular residual module, made of two 3x3 convolutions; as the network gets deeper, however, this structure is not very effective in practice. The "bottleneck residual block" on the right works better for deep networks: it stacks 1x1, 3x3 and 1x1 convolutions in turn, where the 1x1 convolutions reduce or restore the dimensionality so that the 3x3 convolution operates on a relatively low-dimensional input, improving computational efficiency. A minimal sketch follows this list.
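
A minimal Keras sketch of the bottleneck block described above, using a 1x1 projection shortcut for the dimension-matching case (the class name, filter counts, and the expansion factor of 4 follow common practice but are assumptions here, not code from this article):

import tensorflow as tf
from tensorflow.keras import layers

class BottleneckBlock(layers.Layer):
    def __init__(self, filters, stride=1):
        super(BottleneckBlock, self).__init__()
        # 1x1 reduce -> 3x3 -> 1x1 expand
        self.conv1 = layers.Conv2D(filters, (1, 1), strides=1, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.conv2 = layers.Conv2D(filters, (3, 3), strides=stride, padding='same')
        self.bn2 = layers.BatchNormalization()
        self.conv3 = layers.Conv2D(filters * 4, (1, 1), strides=1, padding='same')
        self.bn3 = layers.BatchNormalization()
        # projection shortcut: a 1x1 conv matches the channel count and stride of the main branch
        self.proj = layers.Conv2D(filters * 4, (1, 1), strides=stride)

    def call(self, inputs, training=None):
        x = tf.nn.relu(self.bn1(self.conv1(inputs), training=training))
        x = tf.nn.relu(self.bn2(self.conv2(x), training=training))
        x = self.bn3(self.conv3(x), training=training)
        return tf.nn.relu(x + self.proj(inputs))  # H(x) = F(x) + (projected) x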

1.3 Implementing the basic residual block in TensorFlow

  • The above only introduced a single Basic Block. In ResNet, the basic unit is not one Basic Block by itself: several Basic Blocks are stacked together, and the whole stack is called a Res Block.
  • Creating a Res Block
  • How is ResNet-18 formed?

2. ResNet in Practice

2.1 Reviewing the Basic Block

2.2 Implementing the Basic Block

import tensorflow as tf
from tensorflow.keras import layers, Sequential
import tensorflow.keras as keras
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

class BasicBlock(layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()

        self.conv1 = layers.Conv2D(filter_num, kernel_size=[3, 3], strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')

        # If the previous conv used stride > 1 it already down-sampled; this conv keeps the size unchanged, so its stride is fixed at 1.
        self.conv2 = layers.Conv2D(filter_num, kernel_size=[3, 3], strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()

        if stride != 1:
            self.downsample = Sequential()
            self.downsample.add(layers.Conv2D(filter_num, kernel_size=[1, 1], strides=stride))  # use the same stride so the shortcut is down-sampled identically
        else:
            self.downsample = lambda x:x

    def call(self, inputs, training=None):

        # [b, h, w, c]
        out = self.conv1(inputs)  # calling the layer invokes __call__(), which in turn calls call()
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        identity = self.downsample(inputs)

        output = layers.add([out, identity])  # layers.add sums the two branches element-wise
        output = tf.nn.relu(output)
        return output
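
A quick smoke test of the block (a minimal sketch; the dummy tensor shape and stride are assumptions chosen only for illustration):

block = BasicBlock(filter_num=64, stride=2)   # stride=2 halves the spatial size
x = tf.random.normal([4, 32, 32, 64])         # dummy input [b, h, w, c]
y = block(x)
print(y.shape)                                # expected: (4, 16, 16, 64)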

2.3 Implementing the Res Block (a stack of multiple Basic Blocks)

  • Having implemented the Basic Block, note that the Basic Block is not ResNet's basic building unit; that unit is the Res Block, which is formed by stacking several Basic Blocks together.
  • The implementation is as follows:
# Res Block module. It can inherit from either keras.Model or layers.Layer.
class ResNet(keras.Model):

    # First argument layer_dims: e.g. [2, 2, 2, 2] -> 4 Res Blocks, each containing 2 Basic Blocks
    # Second argument num_classes: size of the fully connected output, i.e. the number of classes.
    def __init__(self, layer_dims, num_classes):
        super(ResNet, self).__init__()

        # Stem (preprocessing) layers; this part is flexible: the MaxPool2D can be included or left out.
        self.stem = Sequential([layers.Conv2D(64, (3,3), strides=(1, 1)),
                                layers.BatchNormalization(),
                                layers.Activation('relu'),
                                layers.MaxPool2D(pool_size=(2, 2), strides=(1, 1), padding='same')])

        # Create the 4 Res Blocks; note the channel counts do not have to double each stage, these are just empirical values.
        self.layer1 = self.build_resblock(64, layer_dims[0])
        self.layer2 = self.build_resblock(128, layer_dims[1], stride=2)
        self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)

        # Output of the residual stages: [b, h, w, 512]; the exact h and w depend on the input size.
        # GlobalAveragePooling2D adapts to any spatial size: for each channel it averages all h*w values.
        # E.g. for 512 feature maps of size 3x3, it averages the 9 values of each map, giving a
        # 512-dim vector ([b, 512]) that can then be fed to the fully connected layer for classification.
        self.avgpool = layers.GlobalAveragePooling2D()
        # fully connected layer for classification
        self.fc = layers.Dense(num_classes)


    def call(self, inputs, training=None):
        # __init__ set everything up; the forward pass happens here.
        x = self.stem(inputs)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        # Global average pooling: only the channel dimension remains, so no reshape is needed.
        # Resulting shape: [batchsize, channels]
        x = self.avgpool(x)
        # [b, 100]
        x = self.fc(x)
        return x


    # Build one Res Block (a stack of Basic Blocks)
    def build_resblock(self, filter_num, blocks, stride=1):
        res_blocks = Sequential()
        # may down-sample: within one Res Block, only the first Basic Block is allowed to down-sample.
        res_blocks.add(BasicBlock(filter_num, stride))

        for _ in range(1, blocks):
            res_blocks.add(BasicBlock(filter_num, stride=1))  # stride 1 here, so only the first Basic Block down-samples

        return res_blocks

2.4 Where the "18" in ResNet-18 comes from, and the final results
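
Counting only the layers with trainable weights (convolutions and the fully connected layer), and assuming the [2, 2, 2, 2] configuration used in the code below, a rough count gives the 18 in the name (the 1x1 convolutions on the shortcut paths are conventionally not counted):

1 stem conv + 4 Res Blocks x 2 Basic Blocks x 2 convs + 1 Dense = 1 + 16 + 1 = 18 layers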

Supplement: the following is needed below.

  • A word on global average pooling, a concept introduced in Network in Network. The difference between global average pooling and (local) average pooling lies in the word "global"; both "global" and "local" describe the pooling window. "Local" takes a sub-region of the feature map, averages it, and then slides the window; "global" simply averages over the entire feature map.
  • It is mainly used to replace the fully connected layer: each feature map of the last layer is average-pooled over the whole map into a single value, and these values form the final feature vector that is fed into the softmax for classification.
  • For example, if the last layer outputs 10 feature maps of size 6x6, global average pooling computes the mean of all pixels of each feature map and outputs one value per map. The 10 feature maps thus yield 10 values, which form a 1x10 feature vector that can be sent to the softmax classifier. (The original figure compared a fully connected layer against global average pooling.) A minimal numeric sketch follows.
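
A minimal numeric sketch of the example above (the random tensor is just a stand-in for real feature maps):

import tensorflow as tf
from tensorflow.keras import layers

x = tf.random.normal([1, 6, 6, 10])          # 10 feature maps of size 6x6, shape [b, h, w, c]
y = layers.GlobalAveragePooling2D()(x)       # one mean value per channel -> shape (1, 10)
print(y.shape)
# Equivalent to averaging over the spatial dimensions by hand:
print(tf.reduce_mean(x, axis=[1, 2]).shape)  # (1, 10)
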
  • Code module 1: the resnet.py file:
import tensorflow as tf
from tensorflow.keras import layers, Sequential
import tensorflow.keras as keras


# Basic Block module.
class BasicBlock(layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()

        self.conv1 = layers.Conv2D(filter_num, (3, 3), strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')

        # If the previous conv used stride > 1 it already down-sampled; this conv keeps the size unchanged, so its stride is fixed at 1.
        self.conv2 = layers.Conv2D(filter_num, (3, 3), strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()

        if stride != 1:
            self.downsample = Sequential()
            self.downsample.add(layers.Conv2D(filter_num, (1, 1), strides=stride))
        else:
            self.downsample = lambda x:x

    def call(self, inputs, training=None):

        # [b, h, w, c]
        out = self.conv1(inputs)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        identity = self.downsample(inputs)

        output = layers.add([out, identity])  # layers.add sums the two branches element-wise
        output = tf.nn.relu(output)
        return output


# Res Block module. It can inherit from either keras.Model or layers.Layer.
class ResNet(keras.Model):

    # First argument layer_dims: e.g. [2, 2, 2, 2] -> 4 Res Blocks, each containing 2 Basic Blocks
    # Second argument num_classes: size of the fully connected output, i.e. the number of classes.
    def __init__(self, layer_dims, num_classes=100):
        super(ResNet, self).__init__()

        # Stem (preprocessing) layers; this part is flexible: the MaxPool2D can be included or left out.
        self.stem = Sequential([layers.Conv2D(64, (3, 3), strides=(1, 1)),
                                layers.BatchNormalization(),
                                layers.Activation('relu'),
                                layers.MaxPool2D(pool_size=(2, 2), strides=(1, 1), padding='same')
                                ])

        # Create the 4 Res Blocks; note the channel counts do not have to double each stage, these are just empirical values.
        self.layer1 = self.build_resblock(64, layer_dims[0])
        self.layer2 = self.build_resblock(128, layer_dims[1], stride=2)
        self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)

        self.avgpool = layers.GlobalAveragePooling2D()
        self.fc = layers.Dense(num_classes)


    def call(self, inputs, training=None):
        # __init__ set everything up; the forward pass happens here.
        x = self.stem(inputs)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        # Global average pooling: only the channel dimension remains, so no reshape is needed.
        # Resulting shape: [batchsize, channels]
        x = self.avgpool(x)
        # [b, 100]
        x = self.fc(x)


        return x


    # Build one Res Block (a stack of Basic Blocks)
    def build_resblock(self, filter_num, blocks, stride=1):

        res_blocks = Sequential()
        # may down-sample: within one Res Block, only the first Basic Block is allowed to down-sample.
        res_blocks.add(BasicBlock(filter_num, stride))

        for _ in range(1, blocks):
            res_blocks.add(BasicBlock(filter_num, stride=1))  # stride 1 here, so only the first Basic Block down-samples

        return res_blocks


def resnet18():

    return ResNet([2, 2, 2, 2])


# For ResNet-34, only this configuration needs to change. For deeper variants (e.g. 50, 101, 152 layers), look up the corresponding configurations.
def resnet34():

    return ResNet([3, 4, 6, 3])  # 4 Res Blocks containing 3, 4, 6 and 3 Basic Blocks respectively
    
  • Code module 2: the resnet18_train.py file:
import tensorflow as tf
from tensorflow.keras import layers, optimizers, datasets, Sequential
from resnet import resnet18

import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
tf.random.set_seed(2345)


# Data preprocessing: just type casting, plus scaling the images to [-0.5, 0.5].
def preprocess(x, y):
    x = tf.cast(x, dtype=tf.float32) / 255. - 0.5
    y = tf.cast(y, dtype=tf.int32)
    return x, y


# Load the dataset
(x, y), (x_test, y_test) = datasets.cifar100.load_data()
y = tf.squeeze(y)  # or tf.squeeze(y, axis=1): squeeze out the size-1 dimension
y_test = tf.squeeze(y_test)
print(x.shape, y.shape, x_test.shape, y_test.shape)

train_db = tf.data.Dataset.from_tensor_slices((x, y))
train_db = train_db.shuffle(1000).map(preprocess).batch(512)

test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_db = test_db.map(preprocess).batch(512)

# Check the shape of one sample batch.
sample = next(iter(train_db))
print('sample:', sample[0].shape, sample[1].shape,
      tf.reduce_min(sample[0]), tf.reduce_max(sample[0]))  # value range is [-0.5, 0.5]


def main():
    # input: [b, 32, 32, 3]
    model = resnet18()
    model.build(input_shape=(None, 32, 32, 3))
    model.summary()
    optimizer = optimizers.Adam(lr=1e-3)


    for epoch in range(500):

        for step, (x, y) in enumerate(train_db):
            with tf.GradientTape() as tape:
                # [b, 32, 32, 3] => [b, 100]
                logits = model(x)
                # [b] => [b, 100]
                y_onehot = tf.one_hot(y, depth=100)
                # compute loss; the result has shape [b]
                loss = tf.losses.categorical_crossentropy(y_onehot, logits, from_logits=True)
                loss = tf.reduce_mean(loss)

            # compute the gradients
            grads = tape.gradient(loss, model.trainable_variables)
            # apply the gradient update
            optimizer.apply_gradients(zip(grads, model.trainable_variables))

            if step % 50 == 0:
                print(epoch, step, 'loss:', float(loss))

        # evaluate on the test set
        total_num = 0
        total_correct = 0
        for x, y in test_db:

            logits = model(x)
            # predicted probabilities
            prob = tf.nn.softmax(logits, axis=1)
            pred = tf.argmax(prob, axis=1)  # remember: pred is int64, so it needs a cast
            pred = tf.cast(pred, dtype=tf.int32)

            # compare the predictions pred with the ground truth
            correct = tf.cast(tf.equal(pred, y), dtype=tf.int32)
            correct = tf.reduce_sum(correct)

            total_num += x.shape[0]
            total_correct += int(correct)  # convert to a Python number

        acc = total_correct / total_num
        print(epoch, 'acc:', acc)


if __name__ == '__main__':
    main()

  • Output of the run:
(50000, 32, 32, 3) (50000,) (10000, 32, 32, 3) (10000,)
sample: (512, 32, 32, 3) (512,) tf.Tensor(-0.5, shape=(), dtype=float32) tf.Tensor(0.5, shape=(), dtype=float32)
Model: "res_net"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
sequential (Sequential)      multiple                  2048      
_________________________________________________________________
sequential_1 (Sequential)    multiple                  148736    
_________________________________________________________________
sequential_2 (Sequential)    multiple                  526976    
_________________________________________________________________
sequential_4 (Sequential)    multiple                  2102528   
_________________________________________________________________
sequential_6 (Sequential)    multiple                  8399360   
_________________________________________________________________
global_average_pooling2d (Gl multiple                  0         
_________________________________________________________________
dense (Dense)                multiple                  51300     
=================================================================
Total params: 11,230,948
Trainable params: 11,223,140
Non-trainable params: 7,808
_________________________________________________________________
0 0 loss: 4.606736183166504
0 50 loss: 4.416693687438965
0 acc: 0.0604
1 0 loss: 4.007843971252441
1 50 loss: 3.776982307434082
1 acc: 0.1154
2 0 loss: 3.6374154090881348
2 50 loss: 3.4091014862060547
2 acc: 0.1768
3 0 loss: 3.335117816925049
3 50 loss: 3.0340826511383057
3 acc: 0.2271
4 0 loss: 3.0630342960357666
4 50 loss: 2.767192840576172
4 acc: 0.2788
5 0 loss: 2.801095485687256
5 50 loss: 2.5093324184417725
5 acc: 0.2863
6 0 loss: 2.652071237564087
6 50 loss: 2.3743672370910645
6 acc: 0.3103
7 0 loss: 2.3989481925964355
7 50 loss: 2.2451577186584473
7 acc: 0.317
8 0 loss: 2.3536462783813477
8 50 loss: 2.095005989074707
8 acc: 0.3192
9 0 loss: 2.145143985748291
9 50 loss: 1.9432967901229858
9 acc: 0.3216
10 0 loss: 2.055953025817871
10 50 loss: 1.8490103483200073
10 acc: 0.3235
11 0 loss: 1.845646858215332
11 50 loss: 1.5962769985198975
11 acc: 0.342
12 0 loss: 1.6497595310211182
12 50 loss: 1.5217297077178955
12 acc: 0.332
13 0 loss: 1.470338225364685
13 50 loss: 1.4912822246551514
13 acc: 0.3124
14 0 loss: 1.3743737936019897
14 50 loss: 1.2206969261169434
14 acc: 0.3074
15 0 loss: 1.3610031604766846
15 50 loss: 0.9420070052146912
15 acc: 0.3254
16 0 loss: 1.078605055809021
16 50 loss: 1.003871202468872
16 acc: 0.3174
17 0 loss: 1.0461890697479248
17 50 loss: 0.8586055040359497
17 acc: 0.3215
18 0 loss: 0.8623021841049194
18 50 loss: 0.6324957609176636
18 acc: 0.3169
19 0 loss: 0.9003666639328003
19 50 loss: 0.6545089483261108
19 acc: 0.3014
20 0 loss: 0.7230895757675171
20 50 loss: 0.41668233275413513
20 acc: 0.3162
21 0 loss: 0.4999226927757263
21 50 loss: 0.4038138687610626
21 acc: 0.3192
22 0 loss: 0.5035152435302734
22 50 loss: 0.36830756068229675
22 acc: 0.3115
23 0 loss: 0.5791099071502686
23 50 loss: 0.4304996728897095
23 acc: 0.3208
24 0 loss: 0.38201427459716797
24 50 loss: 0.23830433189868927
24 acc: 0.3356
25 0 loss: 0.21569305658340454
25 50 loss: 0.2295464128255844
25 acc: 0.3327
26 0 loss: 0.1231858879327774
26 50 loss: 0.20612354576587677
26 acc: 0.3323
27 0 loss: 0.1556326150894165
27 50 loss: 0.15461283922195435
27 acc: 0.3345
28 0 loss: 0.09280207753181458
28 50 loss: 0.05414274334907532
28 acc: 0.334
29 0 loss: 0.05890154093503952
29 50 loss: 0.08330313116312027
29 acc: 0.3374
30 0 loss: 0.06374034285545349
30 50 loss: 0.0645279586315155
30 acc: 0.3507
31 0 loss: 0.06771121919155121
31 50 loss: 0.03828241676092148
31 acc: 0.3435
32 0 loss: 0.05325049161911011
32 50 loss: 0.06898440420627594
32 acc: 0.3472
33 0 loss: 0.052143510431051254
33 50 loss: 0.07428835332393646
33 acc: 0.3515
34 0 loss: 0.05063686892390251
34 50 loss: 0.041026901453733444
34 acc: 0.3461
35 0 loss: 0.09660334885120392
35 50 loss: 0.10083606839179993
35 acc: 0.3467
36 0 loss: 0.0585043728351593
36 50 loss: 0.04725605621933937
36 acc: 0.3479
37 0 loss: 0.05428542569279671
37 50 loss: 0.0645551085472107
37 acc: 0.3429
38 0 loss: 0.04979332536458969
38 50 loss: 0.028766361996531487
38 acc: 0.3448
39 0 loss: 0.06059214845299721
39 50 loss: 0.03867074102163315
39 acc: 0.352
40 0 loss: 0.04751269519329071
40 50 loss: 0.05410218983888626
40 acc: 0.3406
41 0 loss: 0.07864020764827728
41 50 loss: 0.06852877885103226
41 acc: 0.3527
42 0 loss: 0.04342082887887955
42 50 loss: 0.0316157229244709
42 acc: 0.3542
43 0 loss: 0.08915773034095764
43 50 loss: 0.061082299798727036
43 acc: 0.3551
44 0 loss: 0.06201590225100517
44 50 loss: 0.07863974571228027
44 acc: 0.3527
45 0 loss: 0.06855347752571106
45 50 loss: 0.06905807554721832
45 acc: 0.3551
46 0 loss: 0.046435438096523285
46 50 loss: 0.06059195101261139
46 acc: 0.3474
47 0 loss: 0.03513294830918312
47 50 loss: 0.048817235976457596
47 acc: 0.3509
48 0 loss: 0.04353480041027069
48 50 loss: 0.03148560971021652
48 acc: 0.3473
49 0 loss: 0.05442756786942482
49 50 loss: 0.03871474415063858
49 acc: 0.3467

  • The test results after 500 epochs are shown below:

Note: for the ResNet-152 parameter configuration, see: ResNet-152

2.5 Dealing with out-of-memory errors
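
If the GPU runs out of memory with the settings above, the usual remedies are to reduce the batch size (e.g. from 512 down to 128 or 64) and to let TensorFlow allocate GPU memory on demand instead of reserving it all at start-up. A minimal sketch, to be placed near the top of resnet18_train.py (the concrete batch size is just an assumption):

import tensorflow as tf

# Allocate GPU memory on demand rather than all at once.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)

# And reduce the batch size when building the datasets, e.g.:
# train_db = train_db.shuffle(1000).map(preprocess).batch(128)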
