ResNet: Principles and a Simple Source-Code Walkthrough

Hi everyone! Today let's take a look at how ResNet works and how the ResNet18 network is implemented in PyTorch.

Principles

ResNet, the residual network, should be familiar to most readers; it is a network architecture proposed by Kaiming He.

The paper opens with a question: is a deeper network always a better network? The answer is no. For one thing, very deep networks suffer from the well-known vanishing/exploding gradient problem, although that particular issue has largely been addressed by normalized initialization and Batch Normalization.

But what about going even deeper?

As the figure below shows, adding more layers can make the error go up instead of down, and this degradation is not caused by overfitting (the training error rises as well).
[Figure: error increases as plain networks get deeper, illustrating the degradation problem]
To build deeper networks without this degradation, the paper starts from a simple argument: if the features of a shallower network are passed on to the deeper layers unchanged, the deeper network should perform at least as well as the shallow one. As long as the output shapes match, an identity mapping can be used to carry those features forward.

So what exactly is an identity mapping? As the figure below shows, it is very simple: the output x of the shallower layers is added to the output of the following two (or three) convolutional layers. Throughout this process the two branches produce feature maps of the same shape, which is what allows outputs from different depths to be added element-wise.
[Figure: residual building blocks, the two-layer form (left) and the three-layer bottleneck form (right)]
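
In the notation of the paper, a building block computes

y = F(x, {W_i}) + x

where F(x, {W_i}) is the residual mapping learned by the stacked convolutional layers and x travels along the shortcut unchanged. When the two shapes differ, x is replaced by a linear projection W_s * x so that the addition is still well defined.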
We call the structure above a "building block". The paper proposes resnet18, resnet34, resnet50, resnet101, and resnet152, where the trailing number is the number of weight layers in the network. ResNet18 and ResNet34 use the building block shown on the left of the figure, while ResNet50, ResNet101, and ResNet152 use the bottleneck block on the right.

The five network architectures proposed in the paper are shown below:
[Figure: the five ResNet architectures (18, 34, 50, 101 and 152 layers)]
How do we read this table? Let's use the 18-layer network, ResNet18, as an example.
[Figure: the ResNet18 column of the architecture table]
ResNet18 starts with one convolutional layer and a pooling layer, followed by four "stages". Each stage consists of two building blocks, and each building block contains two convolutional layers, so the four stages contribute 16 layers; the network ends with average pooling and a fully connected layer. Pooling layers have no learnable parameters, so they are not counted, and the whole ResNet18 network has 18 weight layers.
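
Counting the weight layers explicitly: 1 (the first convolution) + 4 stages x 2 blocks x 2 convolutions (= 16) + 1 (the fully connected layer) = 18.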

Source code

To make things easier to follow, I have trimmed down the PyTorch ResNet source code; the version here only implements resnet18 and resnet34.
If you need the full version, you can look at the official PyTorch resnet source code.
The input image is 224x224. After the first convolution, which has stride 2, the output becomes 112x112; a max-pooling layer then halves the size again to 56x56. Both are standard CNN operations.
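
These sizes follow from the usual output-size formula, out = floor((in + 2 * padding - kernel) / stride) + 1:

conv1 (7x7, stride 2, padding 3): floor((224 + 6 - 7) / 2) + 1 = 112
maxpool (3x3, stride 2, padding 1): floor((112 + 2 - 3) / 2) + 1 = 56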
[Figure: the stem of the network, a 7x7 stride-2 convolution followed by 3x3 max pooling]
It is easy to define the corresponding components in __init__:

self.conv1 = nn.Conv2d(1, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)  # 1 input channel (grayscale); torchvision's version uses 3 for RGB
self.bn1 = nn.BatchNorm2d(self.inplanes)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

and write the following in the forward() pass:

x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
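
A minimal standalone sketch to verify these shapes (the stem layers are rebuilt on their own here, and the input is assumed to be a single-channel 224x224 image, as in this post):

import torch
import torch.nn as nn

conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

x = torch.randn(1, 1, 224, 224)      # one 1-channel 224x224 image
print(conv1(x).shape)                # torch.Size([1, 64, 112, 112])
print(maxpool(conv1(x)).shape)       # torch.Size([1, 64, 56, 56])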

Next, look at the stage in the figure below. It is built from the building block described earlier: two building blocks are chained together, giving four convolutional layers per stage. ResNet18 has four such stages, and they differ only in feature-map resolution and number of channels. So a natural plan is to define a building block class, chain two of them into one "stage", and then call the four stages in order in the forward pass.
[Figure: one stage of ResNet18, two building blocks chained together]
We define one building block as BasicBlock. Since the input and output sizes may differ from call to call, they are passed in as arguments. The part worth paying attention to is the identity mapping: in code, we simply save the input as identity before the two convolutions and add it to the output afterwards (out + identity). Before the addition we may need to downsample the identity first, so that the two tensors have exactly the same shape.

class BasicBlock(nn.Module):
    # OutChannal is the number of output channels (i.e. the number of 3x3 kernels in each conv).
    def __init__(self, InChannal, OutChannal, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(InChannal, OutChannal, kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn1 = nn.BatchNorm2d(OutChannal)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(OutChannal, OutChannal, kernel_size=3, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(OutChannal)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity  # the residual connection, ResNet's essence
        out = self.relu(out)

        return out
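
A quick usage sketch (hypothetical shapes, assuming the BasicBlock above is defined): one block that keeps the shape unchanged, and one that halves the resolution while doubling the channels, which needs a 1x1 projection on the shortcut.

import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)

# Identity shortcut: input and output shapes match, no downsample needed.
same = BasicBlock(64, 64)
print(same(x).shape)                 # torch.Size([1, 64, 56, 56])

# Projection shortcut: stride 2 halves the resolution and channels go 64 -> 128,
# so the identity branch must be projected to the same shape before the add.
proj = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)
down = BasicBlock(64, 128, stride=2, downsample=proj)
print(down(x).shape)                 # torch.Size([1, 128, 28, 28])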

Things should be getting clearer now: we only need two classes, a BasicBlock class and a ResNet class, where BasicBlock is the basic module that ResNet calls. With BasicBlock in place, we carry on with ResNet. The next job is to chain BasicBlocks together to form the "stages" described above.
[Figure: BasicBlocks chained into a stage]
Inside the ResNet class we define a helper that assembles one stage. When downsampling is needed, that is, when the stride is 2 or the number of channels changes, a downsample module is built and handed to the first block; the remaining blocks (however many block_num specifies) are then appended and everything is wrapped in nn.Sequential. As the code shows, every stage except the first therefore starts with a downsampling block.

def _make_layer(self, Basicblock, planes, block_num, stride=1):
        downsample = None
        # Build a projection for the shortcut when the resolution or channel count changes
        if stride != 1 or self.inplanes != planes:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes),
            )

        # Initial layers
        layers = []
        # 1. the first block performs the downsampling (if any)
        layers.append(Basicblock(self.inplanes, planes, stride, downsample))
        self.inplanes = planes
        # 2. the remaining blocks keep the shape unchanged
        for i in range(1, block_num):
            layers.append(Basicblock(self.inplanes, planes))

        return nn.Sequential(*layers)
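
For example, when self.inplanes is 64, the call used for layer2 below roughly expands to the following (just a sketch of what the helper builds, not extra code you need to write):

# self._make_layer(BasicBlock, 128, block_num=2, stride=2) builds:
nn.Sequential(
    BasicBlock(64, 128, stride=2, downsample=nn.Sequential(
        nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
        nn.BatchNorm2d(128),
    )),
    BasicBlock(128, 128),
)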

So the "stages" that make up resnet18 are now neatly wrapped up by the _make_layer function above.

Finally, you just "stack the bricks" inside ResNet: resnet18 goes through one convolutional layer, a BN layer, a ReLU and a pooling layer, then the four stages, and ends with average pooling and a fully connected layer (the softmax is usually applied inside the loss function, so it does not appear in the model itself).
First define the following in __init__:

self.conv1 = nn.Conv2d(1, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
self.bn1 = nn.BatchNorm2d(self.inplanes)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(Basicblock, 64, block_num=layers[0], stride=1)
self.layer2 = self._make_layer(Basicblock, 128, block_num=layers[1], stride=2)
self.layer3 = self._make_layer(Basicblock, 256, block_num=layers[2], stride=2)
self.layer4 = self._make_layer(Basicblock, 512, block_num=layers[3], stride=2)
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.fc = nn.Linear(512, num_classes)

Then write the forward pass:

 def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

Then we define helper functions to build the model. Since ResNet34 uses the same BasicBlock, it can be constructed in exactly the same way.

def _resnet(block, layers, **kwargs):
    model = ResNet(block, layers, **kwargs)
    return model


def resnet18():
    return _resnet(BasicBlock, [2, 2, 2, 2])


def resnet34():
    return _resnet(BasicBlock, [3, 4, 6, 3])

Finally, just call the function:

net = resnet18()
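
A quick smoke test (assuming Config.NUM_CLASSES is defined in your project, as in the full listing below, and that the inputs are single-channel 224x224 images):

import torch

net = resnet18()
out = net(torch.randn(2, 1, 224, 224))   # a batch of two images
print(out.shape)                          # torch.Size([2, num_classes])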

And with that, a basic ResNet is built. If you are interested, try writing a resnet50 yourself!
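
As a starting point, the three-layer block used by ResNet50/101/152 (the right-hand block in the figure earlier) looks roughly like this. This is only a sketch that follows the same pattern as the BasicBlock above, with the standard expansion factor of 4; it reuses the torch.nn import from the rest of the code, and the torchvision version additionally handles groups and dilation.

class Bottleneck(nn.Module):
    expansion = 4  # the last 1x1 conv outputs 4x as many channels as `planes`

    def __init__(self, InChannal, planes, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(InChannal, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity            # residual connection, same as in BasicBlock
        out = self.relu(out)
        return out

Note that _make_layer would also have to take the expansion factor into account when deciding whether the shortcut needs a projection.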

Here is the complete code:

import torch
import torch.nn as nn

# Note: Config.NUM_CLASSES below comes from this project's own configuration module.


# A BasicBlock has two convolutional layers.
class BasicBlock(nn.Module):
    # OutChannal is the number of output channels (i.e. the number of 3x3 kernels in each conv).
    def __init__(self, InChannal, OutChannal, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(InChannal, OutChannal, kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn1 = nn.BatchNorm2d(OutChannal)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(OutChannal, OutChannal, kernel_size=3, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(OutChannal)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity  # the residual connection, ResNet's essence
        out = self.relu(out)

        return out


class ResNet(nn.Module):

    def __init__(self, Basicblock, layers, num_classes=Config.NUM_CLASSES):
        super(ResNet, self).__init__()

        self.inplanes = 64
        self.dilation = 1

        self.conv1 = nn.Conv2d(1, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)  # 1 input channel (grayscale)
        self.bn1 = nn.BatchNorm2d(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(Basicblock, 64, block_num=layers[0], stride=1)
        self.layer2 = self._make_layer(Basicblock, 128, block_num=layers[1], stride=2)
        self.layer3 = self._make_layer(Basicblock, 256, block_num=layers[2], stride=2)
        self.layer4 = self._make_layer(Basicblock, 512, block_num=layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)

        # Initialize each layer by type: Conv2d and BatchNorm2d are initialized differently
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
        # Zero-initialize the last BN in each residual branch; this tends to improve performance
        for m in self.modules():
            if isinstance(m, BasicBlock):
                nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, Basicblock, planes, block_num, stride=1):
        downsample = None
        # Build a projection for the shortcut when the resolution or channel count changes
        if stride != 1 or self.inplanes != planes:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes),
            )
        # Initial layers
        layers = []
        # 1. the first block performs the downsampling (if any)
        layers.append(Basicblock(self.inplanes, planes, stride, downsample))
        self.inplanes = planes
        # 2. the remaining blocks keep the shape unchanged
        for i in range(1, block_num):
            layers.append(Basicblock(self.inplanes, planes))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x


def _resnet(block, layers, **kwargs):
    model = ResNet(block, layers, **kwargs)
    return model


def resnet18():
    return _resnet(BasicBlock, [2, 2, 2, 2])


def resnet34():
    return _resnet(BasicBlock, [3, 4, 6, 3])
