ResNet Explained, with a Hands-on ResNet-18 in TensorFlow 2.0!
Table of Contents
1. Introduction to ResNet
1.1 The basic structure of ResNet-34
- On the far left is VGG-19, i.e. 19 layers. The diagram is drawn very deliberately: a blank space is intentionally left in the middle-left column, meaning that after adding shortcut connections, our 34-layer network can, at the very worst, degenerate back to a stack of direct connections, i.e. VGG-19.
- The most commonly used variants have 34, 56, or 152 layers.
1.2 Why is it called a "residual"?
- As the figure below shows, the data flows along two routes: the regular route, and a shortcut that implements an identity mapping via a direct connection, somewhat like a "short circuit" in an electrical circuit. Experiments show that this shortcut structure copes well with the degradation problem. Write the input-output relation of a module as y = H(x). Optimizing H(x) directly by gradient methods runs into the degradation problem mentioned above, but with the shortcut structure, the trainable part no longer targets H(x): if F(x) denotes the part to be optimized, then H(x) = F(x) + x, i.e. F(x) = H(x) - x. Since under the identity-mapping assumption y = x plays the role of the observed value, F(x) corresponds to the residual, hence the name "residual network". Why do it this way? Because the authors argue that learning the residual F(x) is easier than learning H(x) directly: the network only has to learn the difference between input and output, turning an absolute quantity into a relative one (H(x) - x is how much the output changes relative to the input), which is much easier to optimize.
- Since the dimensions of x and F(x) may not match, dimension matching is needed. The paper gives two solutions (actually three, but experiments showed the third made performance drop sharply, so it is not used):
- zero-padding: pad the identity branch with zeros to fill out the missing dimensions. This adds no extra parameters.
- projection: apply a 1×1 convolution on the identity branch to increase the dimensions. This adds extra parameters.
- The figure below shows two forms of residual module. The left one is the regular residual module, composed of two 3×3 convolutions; as the network grows deeper, this structure turns out not to be very effective in practice. To address this, the "bottleneck residual block" on the right works better: it stacks 1×1, 3×3, and 1×1 convolutions in sequence. The 1×1 convolutions reduce or restore the dimensionality, so that the 3×3 convolution operates on a relatively low-dimensional input, improving computational efficiency.
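The bottleneck module just described can be sketched in a few lines. The function below is only a minimal illustration of the 1×1 → 3×3 → 1×1 arrangement (the channel numbers and the stride-free identity shortcut are simplifying assumptions, not the implementation used later in this post):

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, channels):
    # 1x1 conv reduces the dimensionality so the 3x3 conv is cheap
    bottleneck = channels // 4
    out = layers.Conv2D(bottleneck, 1, padding='same', activation='relu')(x)
    out = layers.Conv2D(bottleneck, 3, padding='same', activation='relu')(out)
    # 1x1 conv restores the dimensionality so the shortcut can be added
    out = layers.Conv2D(channels, 1, padding='same')(out)
    return tf.nn.relu(out + x)

x = tf.random.normal([2, 8, 8, 64])
y = bottleneck_block(x, 64)
print(y.shape)  # (2, 8, 8, 64)
```

Because the 3×3 convolution only sees 16 channels instead of 64 here, it needs roughly 16× fewer multiply-adds than a 3×3 operating on the full width.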
1.3 How to implement the basic residual block in TensorFlow
- What we described above is only a Basic Block. In ResNet, the basic unit is not a single Basic Block: several Basic Blocks are stacked together, and one such stack is called a Res Block.
- Building a Res Block
- How is ResNet-18 put together?
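As a quick sanity check on the name, the "18" counts only the weighted layers (batch norm, pooling, and the 1×1 shortcut convolutions are not counted). With the standard [2, 2, 2, 2] configuration and two 3×3 convolutions per Basic Block, the arithmetic can be sketched as:

```python
# Counting the weighted layers of ResNet-18, assuming the standard
# configuration [2, 2, 2, 2] with two 3x3 conv layers per Basic Block.
stem_conv = 1                     # initial convolution
blocks_per_stage = [2, 2, 2, 2]   # 4 Res Blocks
convs_per_basic_block = 2         # two 3x3 convolutions each
fc = 1                            # final fully connected layer

weighted_layers = stem_conv + sum(blocks_per_stage) * convs_per_basic_block + fc
print(weighted_layers)  # 18
```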
2. ResNet in Practice
2.1 Basic Block recap
2.2 Implementing the Basic Block
import tensorflow as tf
from tensorflow.keras import layers, Sequential
import tensorflow.keras as keras
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

class BasicBlock(layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = layers.Conv2D(filter_num, kernel_size=[3, 3], strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')
        # If the conv above used a stride, it already downsampled; this conv
        # does no further downsampling and keeps the size unchanged, so its
        # stride is fixed to 1.
        self.conv2 = layers.Conv2D(filter_num, kernel_size=[3, 3], strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()
        if stride != 1:
            self.downsample = Sequential()
            self.downsample.add(layers.Conv2D(filter_num, kernel_size=[1, 1], strides=stride))  # use the same stride
        else:
            self.downsample = lambda x: x

    def call(self, inputs, training=None):
        # [b, h, w, c]
        out = self.conv1(inputs)  # calling the layer invokes __call__() => call()
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        identity = self.downsample(inputs)
        output = layers.add([out, identity])  # layers.add sums the two branches element-wise
        output = tf.nn.relu(output)
        return output
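The shortcut's 1×1 projection and the main path must produce identical shapes before they can be added. A standalone sanity check of that shape matching (the shapes and channel counts here are chosen for illustration only):

```python
import tensorflow as tf
from tensorflow.keras import layers

# When the main path downsamples with a stride-2 3x3 conv and widens the
# channels, the shortcut needs its own stride-2 1x1 conv so that both
# branches end up with the same shape and can be added.
x = tf.random.normal([4, 32, 32, 64])
main = layers.Conv2D(128, 3, strides=2, padding='same')(x)
shortcut = layers.Conv2D(128, 1, strides=2)(x)
print(main.shape, shortcut.shape)  # both (4, 16, 16, 128)
out = tf.nn.relu(main + shortcut)
```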
2.3 Implementing the Res Block (several Basic Blocks stacked together)
- We have now implemented the Basic Block, but the Basic Block is not ResNet's basic unit. That unit is called the Res Block: a Res Block is formed by stacking several Basic Blocks together.
- The implementation is as follows:
# Res Block module. Either keras.Model or keras.layers.Layer works as the base class.
class ResNet(keras.Model):
    # First argument layer_dims: e.g. [2, 2, 2, 2] means 4 Res Blocks, each containing 2 Basic Blocks
    # Second argument num_classes: the size of the fully connected output, i.e. how many classes there are.
    def __init__(self, layer_dims, num_classes):
        super(ResNet, self).__init__()
        # Preprocessing (stem) layer; this is quite flexible -- the MaxPool2D can be included or left out.
        self.stem = Sequential([layers.Conv2D(64, (3, 3), strides=(1, 1)),
                                layers.BatchNormalization(),
                                layers.Activation('relu'),
                                layers.MaxPool2D(pool_size=(2, 2), strides=(1, 1), padding='same')])
        # Create the 4 Res Blocks; note the channel counts need not double each time -- these are empirical values.
        self.layer1 = self.build_resblock(64, layer_dims[0])
        self.layer2 = self.build_resblock(128, layer_dims[1], stride=2)
        self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)
        # The residual stages output [b, h, w, 512], where h and w would need to be
        # worked out by hand. GlobalAveragePooling2D adapts automatically: whatever
        # the spatial size, it averages all the pixel values within each channel.
        # E.g. given 512 feature maps of 3x3 (9 pixels each), averaging each map
        # yields one mean value, so the result is a 512-dim vector (strictly
        # [512, 1, 1]) that can be fed into the fully connected layer for classification.
        self.avgpool = layers.GlobalAveragePooling2D()
        # Fully connected layer, used for classification
        self.fc = layers.Dense(num_classes)

    def call(self, inputs, training=None):
        # Setup is done in __init__; here is the forward pass.
        x = self.stem(inputs)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        # Global average pooling collapses the spatial dimensions, so no reshape
        # is needed afterwards; the shape becomes [batchsize, channel]
        x = self.avgpool(x)
        # [b, 100]
        x = self.fc(x)
        return x

    # Build one Res Block
    def build_resblock(self, filter_num, blocks, stride=1):
        res_blocks = Sequential()
        # may down sample: each Res Block is allowed to downsample only once,
        # in its first Basic Block.
        res_blocks.add(BasicBlock(filter_num, stride))
        for _ in range(1, blocks):
            res_blocks.add(BasicBlock(filter_num, stride=1))  # stride 1 here, so only the first Basic Block downsamples
        return res_blocks
2.4 Where the "18" in ResNet-18 comes from, and the final results
Supplement: background needed below.
- A word on global average pooling, a concept that comes from Network in Network. The difference between global average pooling and (local) average pooling lies entirely in the word "global": both "global" and "local" describe the pooling window. "Local" means taking a sub-region of the feature map, averaging it, and sliding that window along; "global" obviously means averaging over the entire feature map.
- It mainly addresses the cost of fully connected layers: each feature map of the last layer is average-pooled over the whole map into a single feature value, and these values form the final feature vector that goes into the softmax for classification.
- For example: if the last layer produces 10 feature maps of 6×6, global average pooling computes the mean over all pixels of each map and outputs one value per map. The 10 feature maps thus yield 10 values; arranged as a 1×10 vector, they form a feature vector that can be fed straight into the softmax classifier. The figure compares a fully connected layer against global average pooling.
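The 10-maps-of-6×6 example above can be checked numerically. This small NumPy sketch mirrors what a global average pooling layer does per channel:

```python
import numpy as np

# 10 feature maps of 6x6, filled with 0..359 for a deterministic check.
feature_maps = np.arange(10 * 6 * 6, dtype=np.float32).reshape(10, 6, 6)

# Global average pooling: collapse each map to the mean of its 36 pixels,
# giving a length-10 feature vector.
gap = feature_maps.mean(axis=(1, 2))
print(gap.shape)  # (10,)
print(gap[0])     # mean of 0..35 = 17.5
```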
- Code module 1: the resnet.py file:
import tensorflow as tf
from tensorflow.keras import layers, Sequential
import tensorflow.keras as keras

# Basic Block module.
class BasicBlock(layers.Layer):
    def __init__(self, filter_num, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = layers.Conv2D(filter_num, (3, 3), strides=stride, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.Activation('relu')
        # If the conv above used a stride, it already downsampled; this conv
        # keeps the size unchanged, so its stride is fixed to 1.
        self.conv2 = layers.Conv2D(filter_num, (3, 3), strides=1, padding='same')
        self.bn2 = layers.BatchNormalization()
        if stride != 1:
            self.downsample = Sequential()
            self.downsample.add(layers.Conv2D(filter_num, (1, 1), strides=stride))
        else:
            self.downsample = lambda x: x

    def call(self, inputs, training=None):
        # [b, h, w, c]
        out = self.conv1(inputs)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        identity = self.downsample(inputs)
        output = layers.add([out, identity])  # layers.add sums the two branches element-wise
        output = tf.nn.relu(output)
        return output

# Res Block module. Either keras.Model or keras.layers.Layer works as the base class.
class ResNet(keras.Model):
    # First argument layer_dims: e.g. [2, 2, 2, 2] means 4 Res Blocks, each containing 2 Basic Blocks
    # Second argument num_classes: the size of the fully connected output, i.e. how many classes there are.
    def __init__(self, layer_dims, num_classes=100):
        super(ResNet, self).__init__()
        # Preprocessing (stem) layer; quite flexible -- the MaxPool2D is optional.
        self.stem = Sequential([layers.Conv2D(64, (3, 3), strides=(1, 1)),
                                layers.BatchNormalization(),
                                layers.Activation('relu'),
                                layers.MaxPool2D(pool_size=(2, 2), strides=(1, 1), padding='same')
                                ])
        # Create the 4 Res Blocks; note the channel counts need not double each time -- these are empirical values.
        self.layer1 = self.build_resblock(64, layer_dims[0])
        self.layer2 = self.build_resblock(128, layer_dims[1], stride=2)
        self.layer3 = self.build_resblock(256, layer_dims[2], stride=2)
        self.layer4 = self.build_resblock(512, layer_dims[3], stride=2)
        self.avgpool = layers.GlobalAveragePooling2D()
        self.fc = layers.Dense(num_classes)

    def call(self, inputs, training=None):
        # Setup is done in __init__; here is the forward pass.
        x = self.stem(inputs)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        # Global average pooling collapses the spatial dimensions, so no reshape
        # is needed afterwards; the shape becomes [batchsize, channel]
        x = self.avgpool(x)
        # [b, 100]
        x = self.fc(x)
        return x

    # Build one Res Block
    def build_resblock(self, filter_num, blocks, stride=1):
        res_blocks = Sequential()
        # may down sample: each Res Block is allowed to downsample only once,
        # in its first Basic Block.
        res_blocks.add(BasicBlock(filter_num, stride))
        for _ in range(1, blocks):
            res_blocks.add(BasicBlock(filter_num, stride=1))  # stride 1 here, so only the first Basic Block downsamples
        return res_blocks

def resnet18():
    return ResNet([2, 2, 2, 2])

# For ResNet-34, only this configuration needs to change. For 56 or 152, look up the corresponding configuration.
def resnet34():
    return ResNet([3, 4, 6, 3])  # 4 Res Blocks with 3, 4, 6, and 3 Basic Blocks respectively
- Code module 2: the resnet18_train.py file:
import tensorflow as tf
from tensorflow.keras import layers, optimizers, datasets, Sequential
from resnet import resnet18
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

tf.random.set_seed(2345)

# Data preprocessing: just type conversion and scaling to [-0.5, 0.5]
def preprocess(x, y):
    x = tf.cast(x, dtype=tf.float32) / 255. - 0.5
    y = tf.cast(y, dtype=tf.int32)
    return x, y

# Load the dataset
(x, y), (x_test, y_test) = datasets.cifar100.load_data()
y = tf.squeeze(y)  # or tf.squeeze(y, axis=1): squeeze out the size-1 dimension
y_test = tf.squeeze(y_test)  # likewise
print(x.shape, y.shape, x_test.shape, y_test.shape)

train_db = tf.data.Dataset.from_tensor_slices((x, y))
train_db = train_db.shuffle(1000).map(preprocess).batch(512)
test_db = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_db = test_db.map(preprocess).batch(512)

# Check the shape of one sample batch.
sample = next(iter(train_db))
print('sample:', sample[0].shape, sample[1].shape,
      tf.reduce_min(sample[0]), tf.reduce_max(sample[0]))  # values lie in [-0.5, 0.5]

def main():
    # input: [b, 32, 32, 3]
    model = resnet18()
    model.build(input_shape=(None, 32, 32, 3))
    model.summary()
    optimizer = optimizers.Adam(lr=1e-3)

    for epoch in range(500):
        for step, (x, y) in enumerate(train_db):
            with tf.GradientTape() as tape:
                # [b, 32, 32, 3] => [b, 100]
                logits = model(x)
                # [b] => [b, 100]
                y_onehot = tf.one_hot(y, depth=100)
                # compute loss; the result has shape [b]
                loss = tf.losses.categorical_crossentropy(y_onehot, logits, from_logits=True)
                loss = tf.reduce_mean(loss)
            # compute the gradients
            grads = tape.gradient(loss, model.trainable_variables)
            # apply the gradient update
            optimizer.apply_gradients(zip(grads, model.trainable_variables))
            if step % 50 == 0:
                print(epoch, step, 'loss:', float(loss))

        # evaluation
        total_num = 0
        total_correct = 0
        for x, y in test_db:
            logits = model(x)
            # predicted probabilities
            prob = tf.nn.softmax(logits, axis=1)
            pred = tf.argmax(prob, axis=1)  # remember: pred is int64 and needs a cast
            pred = tf.cast(pred, dtype=tf.int32)
            # compare the predictions against the ground truth
            correct = tf.cast(tf.equal(pred, y), dtype=tf.int32)
            correct = tf.reduce_sum(correct)
            total_num += x.shape[0]
            total_correct += int(correct)  # convert to a Python number
        acc = total_correct / total_num
        print(epoch, 'acc:', acc)

if __name__ == '__main__':
    main()
- The run produces the following output:
(50000, 32, 32, 3) (50000,) (10000, 32, 32, 3) (10000,)
sample: (512, 32, 32, 3) (512,) tf.Tensor(-0.5, shape=(), dtype=float32) tf.Tensor(0.5, shape=(), dtype=float32)
Model: "res_net"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential (Sequential) multiple 2048
_________________________________________________________________
sequential_1 (Sequential) multiple 148736
_________________________________________________________________
sequential_2 (Sequential) multiple 526976
_________________________________________________________________
sequential_4 (Sequential) multiple 2102528
_________________________________________________________________
sequential_6 (Sequential) multiple 8399360
_________________________________________________________________
global_average_pooling2d (Gl multiple 0
_________________________________________________________________
dense (Dense) multiple 51300
=================================================================
Total params: 11,230,948
Trainable params: 11,223,140
Non-trainable params: 7,808
_________________________________________________________________
0 0 loss: 4.606736183166504
0 50 loss: 4.416693687438965
0 acc: 0.0604
1 0 loss: 4.007843971252441
1 50 loss: 3.776982307434082
1 acc: 0.1154
2 0 loss: 3.6374154090881348
2 50 loss: 3.4091014862060547
2 acc: 0.1768
3 0 loss: 3.335117816925049
3 50 loss: 3.0340826511383057
3 acc: 0.2271
4 0 loss: 3.0630342960357666
4 50 loss: 2.767192840576172
4 acc: 0.2788
5 0 loss: 2.801095485687256
5 50 loss: 2.5093324184417725
5 acc: 0.2863
6 0 loss: 2.652071237564087
6 50 loss: 2.3743672370910645
6 acc: 0.3103
7 0 loss: 2.3989481925964355
7 50 loss: 2.2451577186584473
7 acc: 0.317
8 0 loss: 2.3536462783813477
8 50 loss: 2.095005989074707
8 acc: 0.3192
9 0 loss: 2.145143985748291
9 50 loss: 1.9432967901229858
9 acc: 0.3216
10 0 loss: 2.055953025817871
10 50 loss: 1.8490103483200073
10 acc: 0.3235
11 0 loss: 1.845646858215332
11 50 loss: 1.5962769985198975
11 acc: 0.342
12 0 loss: 1.6497595310211182
12 50 loss: 1.5217297077178955
12 acc: 0.332
13 0 loss: 1.470338225364685
13 50 loss: 1.4912822246551514
13 acc: 0.3124
14 0 loss: 1.3743737936019897
14 50 loss: 1.2206969261169434
14 acc: 0.3074
15 0 loss: 1.3610031604766846
15 50 loss: 0.9420070052146912
15 acc: 0.3254
16 0 loss: 1.078605055809021
16 50 loss: 1.003871202468872
16 acc: 0.3174
17 0 loss: 1.0461890697479248
17 50 loss: 0.8586055040359497
17 acc: 0.3215
18 0 loss: 0.8623021841049194
18 50 loss: 0.6324957609176636
18 acc: 0.3169
19 0 loss: 0.9003666639328003
19 50 loss: 0.6545089483261108
19 acc: 0.3014
20 0 loss: 0.7230895757675171
20 50 loss: 0.41668233275413513
20 acc: 0.3162
21 0 loss: 0.4999226927757263
21 50 loss: 0.4038138687610626
21 acc: 0.3192
22 0 loss: 0.5035152435302734
22 50 loss: 0.36830756068229675
22 acc: 0.3115
23 0 loss: 0.5791099071502686
23 50 loss: 0.4304996728897095
23 acc: 0.3208
24 0 loss: 0.38201427459716797
24 50 loss: 0.23830433189868927
24 acc: 0.3356
25 0 loss: 0.21569305658340454
25 50 loss: 0.2295464128255844
25 acc: 0.3327
26 0 loss: 0.1231858879327774
26 50 loss: 0.20612354576587677
26 acc: 0.3323
27 0 loss: 0.1556326150894165
27 50 loss: 0.15461283922195435
27 acc: 0.3345
28 0 loss: 0.09280207753181458
28 50 loss: 0.05414274334907532
28 acc: 0.334
29 0 loss: 0.05890154093503952
29 50 loss: 0.08330313116312027
29 acc: 0.3374
30 0 loss: 0.06374034285545349
30 50 loss: 0.0645279586315155
30 acc: 0.3507
31 0 loss: 0.06771121919155121
31 50 loss: 0.03828241676092148
31 acc: 0.3435
32 0 loss: 0.05325049161911011
32 50 loss: 0.06898440420627594
32 acc: 0.3472
33 0 loss: 0.052143510431051254
33 50 loss: 0.07428835332393646
33 acc: 0.3515
34 0 loss: 0.05063686892390251
34 50 loss: 0.041026901453733444
34 acc: 0.3461
35 0 loss: 0.09660334885120392
35 50 loss: 0.10083606839179993
35 acc: 0.3467
36 0 loss: 0.0585043728351593
36 50 loss: 0.04725605621933937
36 acc: 0.3479
37 0 loss: 0.05428542569279671
37 50 loss: 0.0645551085472107
37 acc: 0.3429
38 0 loss: 0.04979332536458969
38 50 loss: 0.028766361996531487
38 acc: 0.3448
39 0 loss: 0.06059214845299721
39 50 loss: 0.03867074102163315
39 acc: 0.352
40 0 loss: 0.04751269519329071
40 50 loss: 0.05410218983888626
40 acc: 0.3406
41 0 loss: 0.07864020764827728
41 50 loss: 0.06852877885103226
41 acc: 0.3527
42 0 loss: 0.04342082887887955
42 50 loss: 0.0316157229244709
42 acc: 0.3542
43 0 loss: 0.08915773034095764
43 50 loss: 0.061082299798727036
43 acc: 0.3551
44 0 loss: 0.06201590225100517
44 50 loss: 0.07863974571228027
44 acc: 0.3527
45 0 loss: 0.06855347752571106
45 50 loss: 0.06905807554721832
45 acc: 0.3551
46 0 loss: 0.046435438096523285
46 50 loss: 0.06059195101261139
46 acc: 0.3474
47 0 loss: 0.03513294830918312
47 50 loss: 0.048817235976457596
47 acc: 0.3509
48 0 loss: 0.04353480041027069
48 50 loss: 0.03148560971021652
48 acc: 0.3473
49 0 loss: 0.05442756786942482
49 50 loss: 0.03871474415063858
49 acc: 0.3467
- The test results for the full 500 epochs are as follows:
Note: for the ResNet-152 parameter configuration, refer to: ResNet-152