ResNet (V2) Architecture and a TensorFlow Implementation

Implementing ResNet_V2 in TensorFlow

Introduction:
ResNet was proposed by Kaiming He and three colleagues. Using Residual Units, they successfully trained a 152-layer deep neural network that won the ILSVRC 2015 competition with a top-5 error rate of 3.57%, while using fewer parameters than VGGNet. Many later methods were built on top of ResNet; detection, segmentation, and recognition systems all adopted it. Soon after ResNet was released, Google borrowed its core idea for Inception V4 and Inception-ResNet-V2, and by ensembling these two models achieved an impressive 3.08% error rate on the ILSVRC dataset. ResNet has clearly proven its worth.


  • Related reading:
    VGGNet and a TensorFlow implementation
    AlexNet and a TensorFlow implementation
    Google Inception Net V3 and a TensorFlow implementation

  • Where the idea of ResNet came from:
    It stems from the degradation problem that has long troubled deep learning: as a network is made deeper, accuracy first rises, then saturates, and adding still more depth makes accuracy drop. This is not overfitting, because the error grows not only on the test set but also on the training set itself.

  • ResNet architecture:
    [Figure: table of ResNet configurations at different depths]
    Each column corresponds to a ResNet of a different depth. Every bracketed entry in the table is a group of residual learning blocks, whose internal structure resembles the following:
    [Figure: residual learning units — left: two 3×3 convolutions; right: 1×1/3×3/1×1 bottleneck]

This structure is very similar to Highway Networks. It not only alleviates vanishing and exploding gradients in very deep networks, but also helps preserve the integrity of the information, since traditional convolutional or fully connected layers inevitably lose or distort some information as it passes through. A minimal sketch of the idea follows.
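As a minimal sketch of this idea (residual_fn here is a hypothetical stand-in for the learned branch, e.g. a small stack of convolutions; the actual slim implementation follows below):

# A residual unit learns only the residual F(x); the shortcut carries x through unchanged.
def residual_unit(x, residual_fn):
    return residual_fn(x) + x   # y = F(x) + x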

Note: a small tip here. The figure above shows two different unit structures. Assuming the input and output are both 256-d, their parameter counts are roughly:

Left: 3x3x256x64 + 3x3x64x256 = 294912
Right: 1x1x256x64 + 3x3x64x64 + 1x1x64x256 = 69632
That is a difference of more than four times: the bottleneck structure on the right keeps the parameter count manageable even in very deep networks (verified in the snippet below).
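These counts are easy to verify with the rule kernel_height × kernel_width × in_channels × out_channels per convolution (weights only; biases and BN parameters are ignored):

# Parameter counts of the two units, assuming 256-d input and output
plain = 3*3*256*64 + 3*3*64*256                    # two 3x3 convolutions
bottleneck = 1*1*256*64 + 3*3*64*64 + 1*1*64*256   # 1x1 -> 3x3 -> 1x1
print(plain, bottleneck, plain / bottleneck)       # 294912 69632 ~4.2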
  • What ResNet_V2 improves:
  1. By studying the propagation formulation of the residual unit, the authors found that the forward and backward signals can propagate directly between units when the connection after the addition is an identity mapping y = x. The final nonlinear activation (e.g. ReLU) after the addition is therefore replaced by this identity mapping, which makes training easier.
  2. Batch normalization (followed by ReLU) is applied in every layer as a pre-activation, i.e. before each weight layer rather than after the addition. The new residual unit is easier to train and generalizes better than before.
  • Modular approach:
    This article implements ResNet_V2 as commented modules, which should help beginners quickly understand both its structure and its implementation:
  1. Import packages and define the Block module
  2. Define helper functions
  3. Define the main function that builds ResNet V2
  4. Define ResNet architectures of different depths
  5. Define a timing helper (for testing)

Import packages and define the Block module:

import collections
import tensorflow as tf
from datetime import datetime
import math
import time
slim = tf.contrib.slim

'''
Use collections.namedtuple to define the basic Block module of ResNet.
Example:
    MyTupleClass = collections.namedtuple('MyTupleClass',['name', 'age', 'job'])
    obj = MyTupleClass("Tomsom",12,'Cooker')
    print(obj.name)
    print(obj.age)
    print(obj.job)
Output:
    Tomsom
    12
    Cooker
'''
Block = collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])
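Note that everything here targets the TensorFlow 1.x API (tf.contrib.slim no longer exists in TensorFlow 2.x). Once the bottleneck function is defined further below, a group of three residual units can be described with a single Block, for example:

# Example Block: three bottleneck units, each configured by one
# (depth, depth_bottleneck, stride) tuple (see resnet_v2_50 for details)
example_block = Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)])
print(example_block.scope)     # block1
print(example_block.args[-1])  # (256, 64, 2)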

Define helper functions:

# Downsample: if factor is not 1, downsample with a 1x1 max pool of stride factor
def subsample(inputs, factor, scope = None):
    if factor == 1:
        return inputs
    else:
        return slim.max_pool2d(inputs, [1, 1], stride = factor, scope = scope)

# Choose a convolution strategy according to the stride
def conv2d_same(inputs, num_outputs, kernel_size, stride, scope=None):
    if stride == 1:  # with stride 1, a plain 'SAME' convolution is enough
        return slim.conv2d(inputs, num_outputs, kernel_size, stride = 1,
                           padding = 'SAME', scope = scope)
    else:     # with stride != 1, pad explicitly and convolve with 'VALID'
        pad_total = kernel_size - 1
        pad_beg = pad_total // 2     # rows/columns of zeros added on the top/left
        pad_end = pad_total - pad_beg    # rows/columns of zeros added on the bottom/right
        inputs = tf.pad(inputs, [[0,0], [pad_beg, pad_end], [pad_beg, pad_end], [0,0]]) # zero-pad the input
        return slim.conv2d(inputs, num_outputs, kernel_size, stride = stride, padding = 'VALID', scope = scope)
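The explicit padding makes the output size depend only on the input size and the stride, rather than on how 'SAME' padding happens to split the zeros at a given input size. A quick worked example for the 7x7/stride-2 root convolution used later (the arithmetic mirrors the code above):

# conv2d_same with kernel_size=7, stride=2 on a 224x224 input
kernel_size, stride, in_size = 7, 2, 224
pad_total = kernel_size - 1                                    # 6
pad_beg, pad_end = pad_total // 2, pad_total - pad_total // 2  # 3, 3
out_size = (in_size + pad_total - kernel_size) // stride + 1   # 112
print(pad_beg, pad_end, out_size)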
        
# The decorator below is equivalent to calling slim.add_arg_scope(stack_blocks_dense)
@slim.add_arg_scope
def stack_blocks_dense(net, blocks, outputs_collections = None):
    '''
    Stack the Blocks into the body of the network.
    net: input tensor
    blocks: list of the Block namedtuples defined above
    outputs_collections: collection used to gather the end_points
    '''
    for block in blocks:
        with tf.variable_scope(block.scope, 'block', [net]) as sc:
            for i, unit in enumerate(block.args):
                with tf.variable_scope('unit_%d' % (i+1), values = [net]):
                    unit_depth, unit_depth_bottleneck, unit_stride = unit
                    # unit_fn: the residual unit generator; creates and chains all residual units in order
                    net = block.unit_fn(net, depth = unit_depth, 
                                        depth_bottleneck = unit_depth_bottleneck,
                                        stride = unit_stride)
            # collect_named_outputs: add the output net to the collection
            net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net)
    return net


# Create the arg_scope shared by the whole ResNet
def resnet_arg_scope(is_training = True,
                     weight_decay = 0.0001,
                     batch_norm_decay = 0.997,
                     batch_norm_epsilon = 1e-5,
                     batch_norm_scale = True):
    '''
    weight_decay: weight-decay rate, i.e. the weight of the L2 regularizer below
    batch_norm_decay: decay rate of the BN moving averages
    batch_norm_epsilon: epsilon used by BN
    batch_norm_scale: whether BN uses a scale; True by default, i.e. multiply by the gamma in the BN formula
    '''
    batch_norm_paras = {
        'is_training': is_training,
        'decay': batch_norm_decay,
        'epsilon': batch_norm_epsilon,
        'scale': batch_norm_scale,
        'updates_collections': tf.GraphKeys.UPDATE_OPS
    }
    # Set default arguments for slim.conv2d()
    with slim.arg_scope([slim.conv2d],
                        weights_regularizer = slim.l2_regularizer(weight_decay),
                        weights_initializer = slim.variance_scaling_initializer(),
                        activation_fn = tf.nn.relu,
                        normalizer_fn = slim.batch_norm,
                        normalizer_params = batch_norm_paras):
        # Set default arguments for slim.batch_norm; **batch_norm_paras unpacks the dict into keyword arguments
        with slim.arg_scope([slim.batch_norm], **batch_norm_paras):
            # Set the default padding for max pooling
            with slim.arg_scope([slim.max_pool2d], padding = 'SAME') as arg_sc:
                return arg_sc


# Define the core bottleneck residual unit (a variant of the full pre-activation residual unit from the ResNet V2 paper)
@slim.add_arg_scope
def bottleneck(inputs, depth, depth_bottleneck, stride,
               outputs_collections = None, scope = None):
    '''
    inputs: input tensor
    depth, depth_bottleneck, stride: the args from the Block namedtuple
    outputs_collections: collection that gathers the end_points
    scope: name of the current unit
    '''
    with tf.variable_scope(scope, 'bottleneck_v2', [inputs]) as sc:
        # Get the channel count of the input (its last dimension)
        depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank = 4)
        # Pre-activation: apply BN + ReLU to the input
        preact = slim.batch_norm(inputs, activation_fn = tf.nn.relu, scope = 'preact')

        if depth == depth_in: # if the unit's input channel count depth_in equals its output channel count depth, just subsample
            shortcut = subsample(inputs, stride, 'shortcut')
        else:            # otherwise use a 1x1 convolution with the unit's stride to match channels and resolution
            shortcut = slim.conv2d(preact, depth, [1, 1], stride = stride,
                                   normalizer_fn = None, 
                                   activation_fn = None,
                                   scope = 'shortcut')
        # 1x1 convolution down to depth_bottleneck output channels
        residual = slim.conv2d(preact, depth_bottleneck, [1, 1],
                               stride = 1, scope = 'conv1')
        # kernel size 3 means a 3x3 convolution
        residual = conv2d_same(residual, depth_bottleneck, 3,
                                    stride, scope = 'conv2')
        # 1x1 convolution up to depth output channels, with no BN or activation
        residual = slim.conv2d(residual, depth, [1, 1], stride = 1,
                               normalizer_fn = None, activation_fn = None,
                               scope = 'conv3')
        # The residual output: add the learned residual to the shortcut
        output = residual + shortcut
        # Add the result to the collection and return it as the unit's output
        return slim.utils.collect_named_outputs(outputs_collections, sc.name, output)
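To make the data flow concrete, here is a shape trace of one unit with depth = 256, depth_bottleneck = 64 and stride = 2 on a [N, 56, 56, 256] input (derived from the code above):

# preact   : BN + ReLU                    -> [N, 56, 56, 256]
# shortcut : subsample(inputs, 2)         -> [N, 28, 28, 256]  (depth == depth_in)
# conv1    : 1x1, 64 channels, stride 1   -> [N, 56, 56, 64]
# conv2    : 3x3, 64 channels, stride 2   -> [N, 28, 28, 64]
# conv3    : 1x1, 256 channels, stride 1  -> [N, 28, 28, 256]
# output   : residual + shortcut          -> [N, 28, 28, 256]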

Define the main function that builds ResNet V2:

def resnet_v2(inputs, blocks, num_classes = None,
              global_pool = True, 
              include_root_block = True,
              reuse = None,
              scope = None):
    '''
    inputs: input tensor
    blocks: list of the Block namedtuples defined earlier
    num_classes: number of output classes
    global_pool: whether to append the final global average pooling layer
    include_root_block: whether to prepend the 7x7 convolution and max pooling that ResNet usually starts with
    reuse: whether to reuse variables
    scope: name of the whole network
    '''
    with tf.variable_scope(scope, 'resnet_v2', [inputs], reuse = reuse) as sc:
        end_points_collection = sc.original_name_scope + '_end_point'
        # Make end_points_collection the default for the outputs_collections argument
        with slim.arg_scope([slim.conv2d, bottleneck, stack_blocks_dense],
                            outputs_collections = end_points_collection):
            net = inputs
            if include_root_block:
                # Set default arguments for slim.conv2d
                with slim.arg_scope([slim.conv2d], activation_fn = None,
                                    normalizer_fn = None):
                    # The leading 7x7 convolution with 64 output channels and stride 2
                    net = conv2d_same(net, 64, 7, stride = 2, scope = 'conv1')
                # Followed by 3x3 max pooling with stride 2
                net = slim.max_pool2d(net, [3, 3], stride = 2, scope = 'pool1')   # after this the spatial size has shrunk to 1/4
            # Build all the residual learning blocks with stack_blocks_dense
            net = stack_blocks_dense(net, blocks)
            net = slim.batch_norm(net, activation_fn = tf.nn.relu, scope = 'postnorm')
            if global_pool: # append the final global average pooling layer
                net = tf.reduce_mean(net, [1, 2], name = 'pool5', keep_dims = True)  # keep_dims is deprecated in newer TF (use keepdims)
            if num_classes is not None:  # replace the fully connected layer with a 1x1 convolution of num_classes output channels
                net = slim.conv2d(net, num_classes, [1, 1], activation_fn = None, normalizer_fn = None, scope = 'logits')
            # Convert the collection to a dict
            end_points = slim.utils.convert_collection_to_dict(end_points_collection)
            if num_classes is not None:
                end_points['predictions'] = slim.softmax(net, scope = 'prediction')
            return net, end_points
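For a 224×224 input, the spatial size shrinks as conv1 (stride 2) → 112, pool1 (stride 2) → 56, and then once at the end of each of block1, block2 and block3 → 28, 14, 7 (block4 uses stride 1 throughout), so global average pooling reduces a [batch, 7, 7, 2048] tensor to [batch, 1, 1, 2048], and the final 1×1 convolution yields logits of shape [batch, 1, 1, num_classes].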

Define ResNet architectures of different depths:

# Configuration for the 50-layer ResNet
def resnet_v2_50(inputs, num_classes = None,
                 global_pool = True,
                 reuse = None,
                 scope = 'resnet_v2_50'):
    '''
    Take Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]) as an example:
    block1: the name of this Block
    bottleneck: the residual learning unit defined above (three layers deep)
    [(256, 64, 1)] * 2 + [(256, 64, 2)]: a list in which every element configures one
        bottleneck residual unit. The first two elements are (256, 64, 1) and the last
        is (256, 64, 2). Each element is a 3-tuple (depth, depth_bottleneck, stride),
        meaning the unit's third layer has 256 output channels (depth), its first two
        layers have 64 output channels (depth_bottleneck), and its middle layer uses
        stride 1 (stride).
    '''
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block = True, reuse = reuse,
                     scope = scope)
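The depth in the name simply counts the weighted layers: block1 through block4 contain 3 + 4 + 6 + 3 = 16 bottleneck units of 3 convolutions each, and adding the root 7×7 convolution and the final logits layer gives 48 + 2 = 50. The same bookkeeping yields 101 (3 + 4 + 23 + 3 units), 152 (3 + 8 + 36 + 3) and 200 (3 + 24 + 36 + 3) for the variants below.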

# Configuration for the 101-layer ResNet
def resnet_v2_101(inputs, num_classes = None,
                  global_pool = True,
                  reuse = None,
                  scope = 'resnet_v2_101'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block = True, reuse = reuse,
                     scope = scope)

# Configuration for the 152-layer ResNet
def resnet_v2_152(inputs, num_classes = None,
                  global_pool = True,
                  reuse = None,
                  scope = 'resnet_v2_152'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block = True, reuse = reuse,
                     scope = scope)

# Configuration for the 200-layer ResNet
def resnet_v2_200(inputs, num_classes = None,
                  global_pool = True,
                  reuse = None,
                  scope = 'resnet_v2_200'):
    blocks = [
        Block('block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]),
        Block('block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]),
        Block('block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]),
        Block('block4', bottleneck, [(2048, 512, 1)] * 3)]
    return resnet_v2(inputs, blocks, num_classes, global_pool,
                     include_root_block = True, reuse = reuse,
                     scope = scope)

Define the timing helper:

def time_tensorflow_run(session, target, info_string):
    num_steps_burn_in = 10  # warm-up iterations that are neither timed nor printed
    total_duration = 0.0    # accumulated time over the timed iterations
    total_duration_squared = 0.0  # accumulated squared time (for the variance)
    for i in range(num_batches + num_steps_burn_in):
        start_time = time.time()
        _ = session.run(target)
        duration = time.time() - start_time    # time taken by this iteration
        if i >= num_steps_burn_in:
            if not i % 10:
                print("%s : step %d, duration = %.3f" % (datetime.now(), i - num_steps_burn_in, duration))
            total_duration += duration
            total_duration_squared += duration * duration
    mn = total_duration / num_batches   # mean
    vr = total_duration_squared / num_batches - mn * mn  # variance: E[x^2] - (E[x])^2
    sd = math.sqrt(vr) # standard deviation
    print("%s : %s across %d steps, %.3f +/- %.3f sec/batch" % (datetime.now(), info_string, num_batches, mn, sd))

Test:
Note: training a CNN is time-consuming, so this only benchmarks the forward and backward passes on a few randomly generated images.

batch_size = 32
height, width = 224, 224
inputs = tf.random_uniform((batch_size, height, width, 3))
with slim.arg_scope(resnet_arg_scope(is_training = False)):
    net, end_points = resnet_v2_152(inputs, 1000)

with tf.Session() as sess:
    init = tf.global_variables_initializer()
    sess.run(init)
    num_batches = 100
    time_tensorflow_run(sess, net, "Forward")
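The note above mentions the backward pass, but the script only times the forward pass. As a hedged sketch (not part of the original script), forward plus backward can be timed inside the same Session block by differentiating a dummy L2 loss of the output, following the usual benchmarking pattern:

    # Differentiate a stand-in L2 loss so session.run also executes the backward pass
    loss = tf.nn.l2_loss(net)
    grads = tf.gradients(loss, tf.trainable_variables())
    time_tensorflow_run(sess, grads, "Forward-backward")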

Run output:

WARNING:tensorflow:From <ipython-input-6-adddb457e8ef>:34: calling reduce_mean (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2019-01-26 08:10:51.879413 : step 0, duration = 0.486
2019-01-26 08:10:56.748640 : step 10, duration = 0.487
2019-01-26 08:11:01.628659 : step 20, duration = 0.489
2019-01-26 08:11:06.511324 : step 30, duration = 0.489
2019-01-26 08:11:11.410210 : step 40, duration = 0.490
2019-01-26 08:11:16.311633 : step 50, duration = 0.491
2019-01-26 08:11:21.219118 : step 60, duration = 0.493
2019-01-26 08:11:26.133231 : step 70, duration = 0.492
2019-01-26 08:11:31.054586 : step 80, duration = 0.493
2019-01-26 08:11:35.984226 : step 90, duration = 0.494
2019-01-26 08:11:40.435636 : Forward across 100 steps, 0.490 +/- 0.002 sec/batch

[1] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.
[2] He K, Zhang X, Ren S, et al. Identity mappings in deep residual networks[C]//European conference on computer vision. Springer, Cham, 2016: 630-645.
[3] 黃文堅, 唐源. TensorFlow實戰.

If you think anything here is poorly explained or wrong, please leave me a comment. Thanks for reading (a like would make me very happy)~
