Implementing ResNet_V2 with TensorFlow
Introduction:
ResNet was proposed by Kaiming He and three other researchers. Using Residual Units, they successfully trained a 152-layer deep neural network, won the ILSVRC 2015 competition with a top-5 error rate of 3.57%, and did so with fewer parameters than VGGNet. Much later work builds directly on ResNet: detection, segmentation, and recognition methods have all adopted it. Shortly after ResNet came out, Google borrowed its core idea to propose Inception V4 and Inception-ResNet-V2, and by ensembling the two models achieved a remarkable 3.08% error rate on the ILSVRC dataset. ResNet is clearly a very effective architecture.
Related reading:
VGGNet and its TensorFlow implementation
AlexNet and its TensorFlow implementation
GoogleInceptionNet_V3 and its TensorFlow implementation
Where the idea for ResNet came from:
It comes from the degradation problem that has long troubled deep learning: as a neural network is made deeper, accuracy first rises and then saturates, and increasing the depth further causes accuracy to drop. This is not an overfitting problem, because the error grows not only on the test set but also on the training set itself.
ResNet structure:
In the architecture table from the ResNet paper, each column corresponds to a ResNet of a different depth, and every bracketed entry is a group of residual learning blocks whose internal structure follows the paper's residual unit diagram.
Note: a small tip here. That diagram shows two different unit structures; if the first (plain, two-layer) structure is used, the number of parameters works out roughly as follows (assuming both the input and the output are 256-d):
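The following is a rough reconstruction of the comparison made in the ResNet paper (assumptions: 256-d input and output, counting weights only, biases ignored):
# Rough parameter count for one residual unit with 256-d input/output (weights only)
plain_params = 2 * (3 * 3 * 256 * 256)                    # two 3x3, 256-channel convs: 1,179,648
bottleneck_params = 1*1*256*64 + 3*3*64*64 + 1*1*64*256   # 1x1 + 3x3 + 1x1 bottleneck: 69,632
So the bottleneck design keeps the 256-d input and output but cuts the unit's parameter count by roughly a factor of 17, which is why the deeper ResNets (50/101/152) are built from bottleneck units.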
Improvements in ResNet_V2:
- By studying the propagation formulas of the residual learning unit, the authors found that forward and backward signals can propagate directly when the connection is an identity mapping, so the nonlinear activation at the end of each unit (e.g. ReLU) is replaced by an identity mapping, which makes training easier.
- Batch Normalization is used in every layer as pre-activation; with this change, the new residual unit is easier to train and generalizes better than before (see the schematic right after this list).
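The sketch below is only schematic, to make the pre-activation change concrete; the actual implementation is the bottleneck() function later in this article:
# ResNet V1 (original) residual unit:
#   branch = ReLU(BN(conv(x))); ...; branch = BN(conv(branch))
#   out    = ReLU(branch + shortcut)    # a nonlinearity is applied AFTER the addition
#
# ResNet V2 (pre-activation) residual unit:
#   branch = conv(ReLU(BN(x))); ...; branch = conv(ReLU(BN(branch)))
#   out    = branch + shortcut          # the addition itself stays an identity mapping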
Modular approach:
This article implements ResNet_V2 in a modular, commented way, which should help beginners quickly understand its structure and how it is built:
- Import packages and design the Block module group
- Define helper functions
- Define the main function that builds ResNet V2
- Define ResNet networks of different depths
- Define a timing function (for testing)
Import packages and design the Block module group:
import collections
import tensorflow as tf
from datetime import datetime
import math
import time
slim = tf.contrib.slim
'''
Use collections.namedtuple to design ResNet's basic Block module group.
Example:
MyTupleClass = collections.namedtuple('MyTupleClass',['name', 'age', 'job'])
obj = MyTupleClass("Tomsom",12,'Cooker')
print(obj.name)
print(obj.age)
print(obj.job)
Output:
Tomsom
12
Cooker
'''
Block = collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])
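As a quick illustration (it uses the bottleneck function and the block configuration that are defined later in this article), the three fields of a Block are accessed just like in the namedtuple example above:
# Example Block instance (the first stage of resnet_v2_50, defined later in this article)
example_block = Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)])
print(example_block.scope)    # 'block1'
print(example_block.unit_fn)  # the bottleneck function
print(example_block.args)     # [(256, 64, 1), (256, 64, 1), (256, 64, 2)]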
Define helper functions:
# Perform subsampling: if factor is not 1, downsample with stride = factor (via 1x1 max pooling)
def subsample(inputs, factor, scope = None):
if factor == 1:
return inputs
else:
return slim.max_pool2d(inputs, [1, 1], stride = factor, scope = scope)
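A quick shape check (a sketch with an input shape chosen only for illustration): a factor of 2 halves the spatial dimensions through the 1x1 max pooling while the channel count stays the same, and a factor of 1 returns the input untouched.
# Hypothetical shape check for subsample (input shape assumed for illustration)
x = tf.random_uniform((32, 56, 56, 256))
print(subsample(x, factor = 1).get_shape())  # (32, 56, 56, 256): returned unchanged
print(subsample(x, factor = 2).get_shape())  # (32, 28, 28, 256): spatial size halved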
# Choose the convolution strategy according to the stride
def conv2d_same(inputs, num_outputs, kernel_size, stride, scope=None):
    if stride == 1:  # if stride == 1, convolve directly with 'SAME' padding
        return slim.conv2d(inputs, num_outputs, kernel_size, stride = 1,
                           padding = 'SAME', scope = scope)
    else:  # if stride != 1, pad explicitly and convolve with 'VALID' padding
        pad_total = kernel_size - 1
        pad_beg = pad_total // 2  # rows/columns of zeros added on the top and left
        pad_end = pad_total - pad_beg  # rows/columns of zeros added on the bottom and right
        inputs = tf.pad(inputs, [[0,0], [pad_beg, pad_end], [pad_beg, pad_end], [0,0]])  # zero padding
        return slim.conv2d(inputs, num_outputs, kernel_size, stride = stride, padding = 'VALID', scope = scope)
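Why pad explicitly? With stride > 1, TensorFlow's 'SAME' padding depends on the input size (and can be asymmetric), so conv2d_same pads by a fixed, input-independent amount and then uses 'VALID'. A small worked example with the kernel and stride used by the root block below:
# For the 7x7, stride-2 'conv1' created in resnet_v2 below:
# pad_total = 7 - 1 = 6, pad_beg = 3, pad_end = 3,
# so a 224x224 input becomes 230x230 after padding, and the 'VALID' convolution
# yields (230 - 7) // 2 + 1 = 112 spatial positions, i.e. the expected /2 downsampling.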
# The decorator below is equivalent to calling slim.add_arg_scope(stack_blocks_dense)
@slim.add_arg_scope
def stack_blocks_dense(net, blocks, outputs_collections = None):
'''
    Function that stacks Blocks.
    net: input tensor
    blocks: list of the Block namedtuples defined above
    outputs_collections: collection used to gather every end_point
'''
for block in blocks:
with tf.variable_scope(block.scope, 'block', [net]) as sc:
for i, unit in enumerate(block.args):
with tf.variable_scope('unit_%d' % (i+1), values = [net]):
unit_depth, unit_depth_bottleneck, unit_stride = unit
                    # unit_fn: the residual unit's generating function; it creates and connects all residual units in order
net = block.unit_fn(net, depth = unit_depth,
depth_bottleneck = unit_depth_bottleneck,
stride = unit_stride)
            # collect_named_outputs: add the output net to the collection
net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)
return net
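To see what the loop builds, take the first block of resnet_v2_50 (defined later in this article) as an example:
# For Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
# the loop creates three bottleneck units under these variable scopes:
#   block1/unit_1 -> bottleneck(net, depth=256, depth_bottleneck=64, stride=1)
#   block1/unit_2 -> bottleneck(net, depth=256, depth_bottleneck=64, stride=1)
#   block1/unit_3 -> bottleneck(net, depth=256, depth_bottleneck=64, stride=2)
# and then registers the stage's output in the collection under the scope name 'block1'.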
# Create the arg_scope shared by every ResNet model
def resnet_arg_scope(is_training = True,
weight_decay = 0.0001,
batch_norm_decay = 0.997,
batch_norm_epsilon = 1e-5,
batch_norm_scale = True):
'''
    weight_decay: weight decay rate, i.e. the weight of the L2 regularization below
    batch_norm_decay: decay rate of BN's moving averages
    batch_norm_epsilon: BN's epsilon
    batch_norm_scale: BN's scale flag, True by default, i.e. multiply by the gamma in the BN formula
'''
batch_norm_paras = {
'is_training': is_training,
'decay': batch_norm_decay,
'epsilon': batch_norm_epsilon,
'scale': batch_norm_scale,
'updates_collections': tf.GraphKeys.UPDATE_OPS
}
    # Set default parameter values for slim.conv2d()
with slim.arg_scope([slim.conv2d],
weights_regularizer = slim.l2_regularizer(weight_decay),
weights_initializer = slim.variance_scaling_initializer(),
activation_fn = tf.nn.relu,
normalizer_fn = slim.batch_norm,
normalizer_params = batch_norm_paras):
        # Set default parameter values for slim.batch_norm; **batch_norm_paras unpacks the dict and assigns each value to the corresponding BN argument
with slim.arg_scope([slim.batch_norm], **batch_norm_paras):
            # Set the default parameters for max pooling
with slim.arg_scope([slim.max_pool2d], padding = 'SAME') as arg_sc:
return arg_sc
# Define the core bottleneck residual unit (a variant of the full pre-activation residual unit from the ResNet V2 paper)
@slim.add_arg_scope
def bottleneck(inputs, depth, depth_bottleneck, stride,
outputs_collections = None, scope = None):
'''
    inputs: input tensor
    depth, depth_bottleneck, stride: the args from the Block namedtuple
    outputs_collections: collection used to gather end_points
    scope: name of the current unit
'''
with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc:
        # Get the channel count (the last dimension) of the input
        depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank = 4)
        # Pre-activation: apply BN + ReLU to the input
        preact = slim.batch_norm(inputs, activation_fn = tf.nn.relu, scope = 'preact')
        if depth == depth_in:  # if the unit's input channel count depth_in equals its output channel count depth
            shortcut = subsample(inputs, stride, 'shortcut')
        else:  # otherwise use a 1x1 convolution on preact to change the channel count (and apply the stride)
shortcut = slim.conv2d(preact, depth, [1, 1], stride = stride,
normalizer_fn = None,
activation_fn = None,
scope = 'shortcut')
        # 1x1 convolution with depth_bottleneck output channels
        residual = slim.conv2d(preact, depth_bottleneck, [1, 1],
                               stride = 1, scope = 'conv1')
        # 3x3 convolution with depth_bottleneck output channels (the 3 is the kernel size)
        residual = conv2d_same(residual, depth_bottleneck, 3,
                               stride, scope = 'conv2')
        # 1x1 convolution with depth output channels, without BN or activation
        residual = slim.conv2d(residual, depth, [1, 1], stride = 1,
                               normalizer_fn = None, activation_fn = None,
                               scope = 'conv3')
        # Residual output: add the shortcut to the residual branch
        output = residual + shortcut
        # Add the result to the collection and return output as the function's result
return slim.utils.collect_named_outputs(outputs_collections, sc.name, output)
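To follow the tensor shapes, here is a walkthrough with assumed values (depth = 256, depth_bottleneck = 64, stride = 2, and a hypothetical input of shape [32, 56, 56, 256]):
# Hypothetical shape walkthrough for one bottleneck unit
# inputs:   [32, 56, 56, 256]; depth_in = 256 == depth, so shortcut = subsample(inputs, 2)
# shortcut: [32, 28, 28, 256]
# conv1:    1x1, 64 channels,  stride 1 -> [32, 56, 56, 64]
# conv2:    3x3, 64 channels,  stride 2 -> [32, 28, 28, 64]   (conv2d_same handles the padding)
# conv3:    1x1, 256 channels, stride 1 -> [32, 28, 28, 256]  (no BN / activation)
# output = residual + shortcut          -> [32, 28, 28, 256]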
Define the main function that builds ResNet V2:
def resnet_v2(inputs, blocks, num_classes = None,
global_pool = True,
include_root_block = True,
reuse = None,
scope = None):
'''
    inputs: input tensor
    blocks: list of the Block namedtuples defined above
    num_classes: number of output classes
    global_pool: whether to add a global average pooling layer at the end
    include_root_block: whether to add the 7x7 convolution and max pooling that ResNet usually uses at the very front
    reuse: whether to reuse variables
    scope: name of the whole network
'''
with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse = reuse) as sc:
        end_points_collection = sc.original_name_scope + '_end_point'
        # Set the default outputs_collections argument to end_points_collection
with slim.arg_scope([slim.conv2d, bottleneck, stack_blocks_dense],
outputs_collections = end_points_collection):
net = inputs
            if include_root_block:
                # Set default parameters for slim.conv2d
                with slim.arg_scope([slim.conv2d], activation_fn = None,
                                    normalizer_fn = None):
                    # The 7x7 convolution with 64 output channels and stride 2 at the very front of ResNet
                    net = conv2d_same(net, 64, 7, stride = 2, scope = 'conv1')
                # Followed by a 3x3 max pooling with stride 2
                net = slim.max_pool2d(net, [3, 3], stride = 2, scope = 'pool1')  # after this, the spatial size has shrunk to 1/4 of the input
            # Build all the residual learning blocks with stack_blocks_dense
net = stack_blocks_dense(net, blocks)
net = slim.batch_norm(net, activation_fn = tf.nn.relu, scope = 'postnorm')
            if global_pool:  # add a global average pooling layer
                net = tf.reduce_mean(net, [1, 2], name = 'pool5', keep_dims = True)
            if num_classes is not None:  # replace the fully connected layer with a 1x1 convolution that has num_classes output channels
                net = slim.conv2d(net, num_classes, [1, 1], activation_fn = None, normalizer_fn = None, scope = 'logits')
            # Convert the collection into a dict
end_points = slim.utils.convert_collection_to_dict(end_points_collection)
if num_classes is not None:
end_points['predictions'] = slim.softmax(net, scope = 'prediction')
return net, end_points
Define ResNet networks of different depths:
# Configuration for the 50-layer ResNet
def resnet_v2_50(inputs, num_classes = None,
global_pool = True,
reuse = None,
scope = 'resnet_v2_50'):
'''
    Take Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]) as an example:
    block1: the name (scope) of this Block
    bottleneck: the residual learning unit defined above (it has three layers)
    [(256, 64, 1)] * 2 + [(256, 64, 2)]: a list in which every element corresponds to one
    bottleneck residual unit; the first two elements are (256, 64, 1) and the last one is
    (256, 64, 2). Each element is a 3-tuple (depth, depth_bottleneck, stride), meaning that
    in the bottleneck unit it builds, the third layer has 256 output channels (depth), the
    first two layers have 64 output channels (depth_bottleneck), and the middle layer uses a stride of 1 (stride)
'''
blocks = [
Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
Block('block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
Block('block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
return resnet_v2(inputs, blocks, num_classes, global_pool,
include_root_block = True, reuse = reuse,
scope = scope)
# Configuration for the 101-layer ResNet
def resnet_v2_101(inputs, num_classes = None,
global_pool = True,
reuse = None,
scope = 'resnet_v2_101'):
blocks = [
Block('block1', bottleneck, [(256, 64, 1)] * 2+ [(256, 64, 2)]),
Block('block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
Block('block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]),
Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
return resnet_v2(inputs, blocks, num_classes, global_pool,
include_root_block = True, reuse = reuse,
scope = scope)
# Configuration for the 152-layer ResNet
def resnet_v2_152(inputs, num_classes = None,
global_pool = True,
reuse = None,
scope = 'resnet_v2_152'):
blocks = [
Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
Block('block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
Block('block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
return resnet_v2(inputs, blocks, num_classes, global_pool,
include_root_block = True, reuse = reuse,
scope = scope)
# Configuration for the 200-layer ResNet
def resnet_v2_200(inputs, num_classes = None,
global_pool = True,
reuse = None,
scope = 'resnet_v2_200'):
blocks = [
Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
Block('block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]),
Block('block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
return resnet_v2(inputs, blocks, num_classes, global_pool,
include_root_block = True, reuse = reuse,
scope = scope)
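As a quick sanity check on the names, each configuration's depth can be recovered from its block definitions: every bottleneck unit contributes 3 convolutional layers, and the initial 7x7 conv1 plus the final 1x1 logits layer add 2 more (the usual counting convention):
# Layer-count check: (units per block, summed) * 3 convs per unit + conv1 + logits
print((3 + 4 + 6 + 3) * 3 + 2)    # 50  -> resnet_v2_50
print((3 + 4 + 23 + 3) * 3 + 2)   # 101 -> resnet_v2_101
print((3 + 8 + 36 + 3) * 3 + 2)   # 152 -> resnet_v2_152
print((3 + 24 + 36 + 3) * 3 + 2)  # 200 -> resnet_v2_200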
Define the timing function (for testing):
def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10  # number of warm-up iterations, excluded from the statistics
    total_duration = 0.0  # total time over the counted iterations
    total_duration_squared = 0.0  # total squared time over the counted iterations
    for i in range(num_batches + num_steps_burn_in):  # num_batches is a global set in the test section below
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time  # time taken by this iteration
        if i >= num_steps_burn_in:
            if not i % 10:
                print("%s : step %d, duration = %.3f" % (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / num_batches  # mean
    vr = total_duration_squared / num_batches - mn * mn  # variance
    sd = math.sqrt(vr)  # standard deviation
    print("%s : %s across %d steps, %.3f +/- %.3f sec/batch" % (datetime.now(), info_string, num_batches, mn, sd))
Testing:
Note: training a CNN is quite time-consuming, so here we only time the forward pass on a batch of randomly generated images.
batch_size = 32
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope(resnet_arg_scope(is_training = False)):
net, end_points = resnet_v2_152(inputs, 1000)
with tf.Session() as sess:
init = tf.global_variables_initializer()
sess.run(init)
num_batches = 100
time_tensorflow_run(sess, net, "Forward")
Output:
WARNING:tensorflow:From <ipython-input-6-adddb457e8ef>:34: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2019-01-26 08:10:51.879413 : step 0, duration = 0.486
2019-01-26 08:10:56.748640 : step 10, duration = 0.487
2019-01-26 08:11:01.628659 : step 20, duration = 0.489
2019-01-26 08:11:06.511324 : step 30, duration = 0.489
2019-01-26 08:11:11.410210 : step 40, duration = 0.490
2019-01-26 08:11:16.311633 : step 50, duration = 0.491
2019-01-26 08:11:21.219118 : step 60, duration = 0.493
2019-01-26 08:11:26.133231 : step 70, duration = 0.492
2019-01-26 08:11:31.054586 : step 80, duration = 0.493
2019-01-26 08:11:35.984226 : step 90, duration = 0.494
2019-01-26 08:11:40.435636 : Forward across 100 steps, 0.490 +/- 0.002 sec/batch
[1] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
[2] He K, Zhang X, Ren S, et al. Identity mappings in deep residual networks[C]//European conference on computer vision. Springer, Cham, 2016: 630-645.
[3] 黃文堅, 唐源. TensorFlow實戰.
If you feel I explained anything poorly or got something wrong, please leave me a comment. Thanks for reading (a like would make me really happy)~