1. The Concept of Batch Normalization
Batch Normalization: normalization over a batch of data.
Batch: a batch of data, usually a mini-batch.
Normalization: transforming to zero mean and unit variance.
Advantages:
- Allows a larger learning rate, speeding up model convergence;
- Removes the need for carefully designed weight initialization;
- Allows dropping dropout, or using a smaller dropout rate;
- Allows dropping L2 regularization, or using a smaller weight decay;
- Removes the need for LRN (Local Response Normalization).
The last part of the algorithm's pseudocode is the affine transform, i.e., scale and shift; the gamma and beta in the formula are learnable parameters, updated by backpropagating the loss.
Why add an affine transform after the normalize step? It increases the model's capacity and flexibility, giving it more choices: the model can learn whether (and by how much) the normalized activations should be transformed. The full algorithm is sketched below.
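The pseudocode figure from the paper is not reproduced here, but its four steps can be reconstructed as follows (notation as in the paper: a mini-batch $\mathcal{B} = \{x_1, \dots, x_m\}$, with $\epsilon$ a small constant for numerical stability):

$$\mu_{\mathcal{B}} = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_{\mathcal{B}}^{2} = \frac{1}{m}\sum_{i=1}^{m}\left(x_i - \mu_{\mathcal{B}}\right)^{2}$$

$$\hat{x}_i = \frac{x_i - \mu_{\mathcal{B}}}{\sqrt{\sigma_{\mathcal{B}}^{2} + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta$$

The first three steps standardize the mini-batch to zero mean and unit variance; the last step is the scale and shift.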
This method was proposed in the paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift", mainly to address the ICS problem (Internal Covariate Shift: the scale and distribution of layer inputs keep changing during training).
2. PyTorch's BatchNorm 1d/2d/3d Implementations
In PyTorch, nn.BatchNorm1d, nn.BatchNorm2d, and nn.BatchNorm3d all inherit from the base class _BatchNorm.
2.1 _BatchNorm
The main parameters of _BatchNorm:
- num_features: the number of features per sample (the most important parameter);
- eps: a correction term added to the denominator to avoid division by zero;
- momentum: the coefficient of the exponential weighted average used to estimate the running mean/var;
- affine: a boolean, whether to apply the affine transform;
- track_running_stats: training state vs. testing state; in training the mean/var are continuously computed and updated, while in testing they are fixed (see the sketch after the constructor signature below);
def __init__(self, num_features, eps=1e-5, momentum=0.1, affine=True, track_running_stats=True)
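A minimal sketch (my own illustration, relying only on standard PyTorch behavior) of how the affine and track_running_stats flags behave:

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(num_features=5)
print(bn.weight.shape, bn.bias.shape)  # gamma and beta: torch.Size([5]) each

# affine=False: no learnable gamma/beta are created
print(nn.BatchNorm1d(5, affine=False).weight)  # None

# track_running_stats=False: no running statistics are kept
print(nn.BatchNorm1d(5, track_running_stats=False).running_mean)  # None

# in eval mode, normalization uses the fixed running statistics
bn.eval()
y = bn(torch.randn(3, 5))  # running_mean/running_var are read, not updated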
2.2 nn.BatchNorm1d/nn.BatchNorm2d/nn.BatchNorm3d
The main attributes of nn.BatchNorm1d/nn.BatchNorm2d/nn.BatchNorm3d:
- running_mean: the running mean;
- running_var: the running variance;
- weight: gamma in the affine transform;
- bias: beta in the affine transform;
The BN formula:
During training, BN normalizes each feature with the statistics of the current mini-batch and updates running_mean/running_var with an exponential weighted average; at test time these fixed running statistics are used instead of batch statistics:
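The formula images are not reproduced here; the following reconstruction matches the PyTorch documentation for the BatchNorm modules:

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta$$

with the running statistics updated as

$$\hat{x}_{\text{new}} = (1 - \text{momentum}) \cdot \hat{x} + \text{momentum} \cdot x_t$$

where $\hat{x}$ is the running estimate (running_mean or running_var) and $x_t$ is the statistic of the current batch. Note that PyTorch's momentum weights the new observation, which differs from the momentum convention used in optimizers.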
2.3 Input requirements of nn.BatchNorm1d/nn.BatchNorm2d/nn.BatchNorm3d
- nn.BatchNorm1d: input = batch_size × num_features × 1d feature dimension
- nn.BatchNorm2d: input = batch_size × num_features × 2d feature dimensions
- nn.BatchNorm3d: input = batch_size × num_features × 3d feature dimensions
2.3.1 nn.BatchNorm1d
nn.BatchNorm1d is what is used with fully connected layers, where each neuron corresponds to one feature. Suppose a layer has five neurons, i.e., five features. In the figure, each column is one sample; each sample has 5 features as the layer's input, and each feature (the part circled in red) has dimension 1. Together they form one feature of one sample.
Each training step operates on a batch. Suppose a batch has three samples; these three samples form the input of nn.BatchNorm1d, with shape [3, 5, 1]. The trailing 1 can sometimes be omitted, so the shape can also be written as [3, 5].
As we know, nn.BatchNorm1d keeps four quantities (running_mean, running_var, weight, bias), all stored per feature. As in the figure, with three samples of five features each, the mean and variance are computed across the three samples at the same feature position, and every feature dimension has its own mean, variance, gamma, and beta; a small sketch of this reduction follows.
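As a minimal illustration (my own example) of this "per-feature over the batch" reduction: for an input of shape (N, C, L), the batch statistics are reduced over every dimension except the feature dimension C, while gamma and beta are simply learnable tensors of shape [C]:

import torch

x = torch.randn(3, 5, 1)                 # (batch, features, feature dimension)
mean = x.mean(dim=(0, 2))                # one mean per feature -> shape [5]
var = x.var(dim=(0, 2), unbiased=False)  # biased variance, as BN uses when normalizing
print(mean.shape, var.shape)             # torch.Size([5]) torch.Size([5])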
Let's study nn.BatchNorm1d through code:
import torch
import torch.nn as nn

batch_size = 3     # batch size
num_features = 5   # number of features per sample
momentum = 0.3

features_shape = (1,)  # each feature has dimension 1

feature_map = torch.ones(features_shape)                                                 # [1],         1D
feature_maps = torch.stack([feature_map * (i + 1) for i in range(num_features)], dim=0)  # [1,2,3,4,5], 2D
feature_maps_bs = torch.stack([feature_maps for i in range(batch_size)], dim=0)          # 3 identical samples, 3D
print("input data:\n{} shape is {}".format(feature_maps_bs, feature_maps_bs.shape))

bn = nn.BatchNorm1d(num_features=num_features, momentum=momentum)

running_mean, running_var = 0, 1  # manual bookkeeping, for comparison with bn's buffers

for i in range(2):
    outputs = bn(feature_maps_bs)

    print("\niteration:{}, running mean: {}".format(i, bn.running_mean))
    print("iteration:{}, running var: {}".format(i, bn.running_var))

    # batch statistics of the second feature: every value is 2, so mean 2, variance 0
    mean_t, var_t = 2, 0

    # reproduce PyTorch's exponential weighted average update by hand
    running_mean = (1 - momentum) * running_mean + momentum * mean_t
    running_var = (1 - momentum) * running_var + momentum * var_t

    print("iteration:{}, running mean of the second feature: {}".format(i, running_mean))
    print("iteration:{}, running var of the second feature: {}".format(i, running_var))
Running the code gives the following output:
iteration:0, running mean: tensor([0.3000, 0.6000, 0.9000, 1.2000, 1.5000])
iteration:0, running var: tensor([0.7000, 0.7000, 0.7000, 0.7000, 0.7000])
iteration:0, running mean of the second feature: 0.6
iteration:0, running var of the second feature: 0.7

iteration:1, running mean: tensor([0.5100, 1.0200, 1.5300, 2.0400, 2.5500])
iteration:1, running var: tensor([0.4900, 0.4900, 0.4900, 0.4900, 0.4900])
iteration:1, running mean of the second feature: 1.02
iteration:1, running var of the second feature: 0.48999999999999994
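These numbers can be verified by applying the update rule by hand. For the second feature, the batch mean is 2 and the batch variance is 0, starting from running_mean = 0 and running_var = 1:

$$\text{mean:} \quad 0.7 \times 0 + 0.3 \times 2 = 0.6, \qquad 0.7 \times 0.6 + 0.3 \times 2 = 1.02$$
$$\text{var:} \quad 0.7 \times 1 + 0.3 \times 0 = 0.7, \qquad 0.7 \times 0.7 + 0.3 \times 0 = 0.49$$

which matches both the module's buffers and the manual bookkeeping (the 0.48999... is ordinary floating-point rounding).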
2.3.2 nn.BatchNorm2d
The main difference between the inputs of nn.BatchNorm2d and nn.BatchNorm1d is the feature dimension: a feature map output by a convolutional layer is two-dimensional.
As in the figure, suppose each feature map is 2×2 and a layer has three kernels, so it outputs three channels of 2×2 feature maps. In BN, one feature map is treated as one feature, and BN computes the mean, variance, gamma, and beta per feature. With a batch of three such samples, the input of nn.BatchNorm2d therefore has shape [3, 3, 2, 2].
Let's examine the concrete use of nn.BatchNorm2d through code:
batch_size = 3
num_features = 6
momentum = 0.3

features_shape = (2, 2)  # each feature map is 2x2

feature_map = torch.ones(features_shape)                                                 # 2D
feature_maps = torch.stack([feature_map * (i + 1) for i in range(num_features)], dim=0)  # 3D
feature_maps_bs = torch.stack([feature_maps for i in range(batch_size)], dim=0)          # 4D: [3, 6, 2, 2]
print("input data:\n{} shape is {}".format(feature_maps_bs, feature_maps_bs.shape))

bn = nn.BatchNorm2d(num_features=num_features, momentum=momentum)

for i in range(2):
    outputs = bn(feature_maps_bs)

    print("\niter:{}, running_mean.shape: {}".format(i, bn.running_mean.shape))
    print("iter:{}, running_var.shape: {}".format(i, bn.running_var.shape))
    print("iter:{}, weight.shape: {}".format(i, bn.weight.shape))
    print("iter:{}, bias.shape: {}".format(i, bn.bias.shape))
The corresponding output is:
iter:0, running_mean.shape: torch.Size([6])
iter:0, running_var.shape: torch.Size([6])
iter:0, weight.shape: torch.Size([6])
iter:0, bias.shape: torch.Size([6])
iter:1, running_mean.shape: torch.Size([6])
iter:1, running_var.shape: torch.Size([6])
iter:1, weight.shape: torch.Size([6])
iter:1, bias.shape: torch.Size([6])
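All four quantities have shape [6]: one scalar per channel, regardless of the 2×2 spatial size. As a further sketch (my own check, assuming only standard PyTorch behavior), the training-mode output can be reproduced by normalizing each channel with the biased batch statistics taken over the batch and spatial dimensions:

import torch
import torch.nn as nn

x = torch.randn(3, 6, 2, 2)
bn = nn.BatchNorm2d(num_features=6, affine=False)  # disable gamma/beta to compare pure normalization
out = bn(x)                                        # module is in training mode by default

mean = x.mean(dim=(0, 2, 3), keepdim=True)                # per-channel mean over N, H, W
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)  # biased variance, used for normalizing
manual = (x - mean) / torch.sqrt(var + bn.eps)

print(torch.allclose(out, manual, atol=1e-6))  # expected: True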
2.3.3 nn.BatchNorm3d
The figure shows the input form of nn.BatchNorm3d: each feature of a sample is 3-dimensional with shape [2, 2, 3], each sample has 3 features, and there are 3 samples in total, so the input of nn.BatchNorm3d has shape [3, 3, 2, 2, 3].
The code for nn.BatchNorm3d is as follows:
batch_size = 3
num_features = 4
momentum = 0.3

features_shape = (2, 2, 3)  # each feature is a 2x2x3 volume

feature = torch.ones(features_shape)                                               # 3D
feature_map = torch.stack([feature * (i + 1) for i in range(num_features)], dim=0)  # 4D
feature_maps = torch.stack([feature_map for i in range(batch_size)], dim=0)         # 5D: [3, 4, 2, 2, 3]
print("input data:\n{} shape is {}".format(feature_maps, feature_maps.shape))

bn = nn.BatchNorm3d(num_features=num_features, momentum=momentum)

for i in range(2):
    outputs = bn(feature_maps)

    print("\niter:{}, running_mean.shape: {}".format(i, bn.running_mean.shape))
    print("iter:{}, running_var.shape: {}".format(i, bn.running_var.shape))
    print("iter:{}, weight.shape: {}".format(i, bn.weight.shape))
    print("iter:{}, bias.shape: {}".format(i, bn.bias.shape))