Hi folks, today let's walk through the idea behind ResNet and a PyTorch implementation of ResNet18.
Principle
ResNet should be familiar to most readers. Short for "residual network", it is an architecture proposed by Kaiming He.
The paper opens with a question: does a deeper neural network always perform better? The answer is no. For instance, very deep networks suffer from the well-known vanishing/exploding gradient problem, although that particular issue is largely addressed by batch normalization.
But what about going even deeper?
As the layer count grows, the loss can actually rise instead of fall (see the figure below), and this degradation is not caused by overfitting.
To build deeper networks without this degradation, the paper starts from an assumption: if the features learned by a shallow network are passed on to deeper layers, the deeper network should perform at least as well as the shallow one (certainly no worse). As long as the output dimensions match, an identity mapping can be used to carry the features forward.
So what does an identity mapping look like? As shown below, it is very simple: the output X of the shallower layers is added to the output of the two or three convolutional layers that follow. Because the two branches produce outputs of the same shape, they can be added element-wise.
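In code, the identity mapping is nothing exotic. Here is a minimal sketch (the channel counts and sizes are illustrative): F(x) is computed by two convolutions, and the input is added back element-wise:

```python
import torch
import torch.nn as nn

# F(x): two 3x3 convolutions that preserve the shape of x.
F = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 64, kernel_size=3, padding=1),
)
x = torch.randn(1, 64, 56, 56)
out = F(x) + x  # the identity mapping: both terms are (1, 64, 56, 56)
print(out.shape)
```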
The structure above is called a building block. The paper proposes resnet18, resnet34, resnet50, resnet101, and resnet152, where the number after "resnet" is the layer count. The 18- and 34-layer networks use the building block on the left of the figure above, while the 50-, 101-, and 152-layer networks use the form on the right.
The five proposed architectures are shown below:
How do we read this table? Take the 18-layer network, ResNet18, as an example.
ResNet18 starts with a convolutional layer and a pooling layer, followed by four "stages". Each stage consists of two building blocks, and each building block contains two convolutional layers, so the four stages contribute 16 layers. The network ends with an average pool and a fully connected layer. Pooling layers have no learnable parameters and are not counted, which gives 1 + 16 + 1 = 18 layers in total.
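The 18-layer tally can be checked with a quick calculation (counting only layers that have learnable weights):

```python
# ResNet18: 1 stem conv + 4 stages x 2 blocks x 2 convs + 1 fc = 18 layers
stem, stages, blocks_per_stage, convs_per_block, fc = 1, 4, 2, 2, 1
total = stem + stages * blocks_per_stage * convs_per_block + fc
print(total)  # 18
```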
Source code
To make things easier to follow, I have trimmed down the PyTorch ResNet source; the code below implements only resnet18 and resnet34.
If you need more, consult the full PyTorch resnet source.
The input image is 224×224. After the first convolution, with stride 2, the output becomes 112×112; a max pool then halves the size again to 56×56. Both are standard CNN operations.
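These sizes follow the usual convolution output formula, floor((n + 2p − k) / s) + 1. A quick check of both stem stages:

```python
def out_size(n, k, s, p):
    # Output size of a conv/pool layer: floor((n + 2p - k) / s) + 1
    return (n + 2 * p - k) // s + 1

after_conv = out_size(224, k=7, s=2, p=3)         # 7x7 conv, stride 2
after_pool = out_size(after_conv, k=3, s=2, p=1)  # 3x3 max pool, stride 2
print(after_conv, after_pool)  # 112 56
```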
It is straightforward to define the following components in __init__:
self.conv1 = nn.Conv2d(1, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)  # 1 input channel here; the standard ImageNet model uses 3 (RGB)
self.bn1 = nn.BatchNorm2d(self.inplanes)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
and write the following in the forward() pass:
x = self.conv1(x)
x = self.bn1(x)
x = self.relu(x)
x = self.maxpool(x)
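As a sanity check, the stem above can be run on its own (assembled here as a Sequential so the snippet is self-contained, with the same 1-channel input):

```python
import torch
import torch.nn as nn

# The stem from the snippet above: conv -> BN -> ReLU -> max pool.
stem = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)
x = torch.randn(2, 1, 224, 224)
out = stem(x)
print(out.shape)  # (2, 64, 56, 56)
```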
Now look at the small stage block in the table: it is the building block described above. Two building blocks chained together give four convolutional layers. ResNet18 has four such stages, each with four convolutional layers; the stages differ only in feature-map resolution and number of channels. A natural approach is to define a building-block class, chain two of them into one "stage", and call the four stages in the forward pass.
We define one building block as BasicBlock. Since the input and output dimensions can differ, they are passed in as arguments. The key part is the identity mapping: in code, we simply save the input as identity before the forward computation, and after the two convolutions add it to the output. Before the addition, we must check whether the identity needs to be downsampled first, so that its shape matches the output.
class BasicBlock(nn.Module):
    # OutChannal is the number of output channels of both 3x3 convolutions.
    def __init__(self, InChannal, OutChannal, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(InChannal, OutChannal, kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn1 = nn.BatchNorm2d(OutChannal)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(OutChannal, OutChannal, kernel_size=3, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(OutChannal)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        # Project the identity when the main branch changed shape.
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity  # the residual connection: ResNet's essence
        out = self.relu(out)
        return out
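The downsample branch deserves a closer look. When a block changes resolution or channel count (say 64 → 128 at stride 2), the saved identity no longer matches the main branch, so a 1×1 convolution plus BN projects it. A standalone sketch:

```python
import torch
import torch.nn as nn

# Shortcut projection: a 1x1 conv with stride 2 reshapes the identity so it
# matches a main branch that went 64 -> 128 channels at half the resolution.
downsample = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)
x = torch.randn(2, 64, 56, 56)
identity = downsample(x)
print(identity.shape)  # (2, 128, 28, 28), ready to be added to the output
```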
At this point things should be getting clearer: we only need two classes, BasicBlock and ResNet. BasicBlock is the basic module that ResNet assembles. With BasicBlock done, we continue with ResNet itself: the next step is to chain two BasicBlocks together into one of the "stages" described above.
Inside the ResNet class we define a helper, _make_layer, that builds a stage with nn.Sequential. When downsampling is needed (i.e. the stride is 2, or the channel count changes), it creates the downsample projection for the first block, then appends the remaining blocks (as many as block_num specifies). As the code shows, every stage except the first downsamples at its first block.
def _make_layer(self, Basicblock, planes, block_num, stride=1):
    downsample = None
    # Project the shortcut when the resolution or channel count changes.
    if stride != 1 or self.inplanes != planes:
        downsample = nn.Sequential(
            nn.Conv2d(self.inplanes, planes, kernel_size=1, stride=stride, bias=False),
            nn.BatchNorm2d(planes),
        )
    layers = []
    # 1. The first block, which may downsample.
    layers.append(Basicblock(self.inplanes, planes, stride, downsample))
    self.inplanes = planes
    # 2. The remaining blocks keep the same shape.
    for i in range(1, block_num):
        layers.append(Basicblock(self.inplanes, planes))
    return nn.Sequential(*layers)
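For ResNet18, _make_layer is called four times with the plan below; reproducing its downsample condition shows that every stage except the first needs the shortcut projection:

```python
# (planes, block_num, stride) for resnet18's four stages
plan = [(64, 2, 1), (128, 2, 2), (256, 2, 2), (512, 2, 2)]

inplanes = 64
flags = []
for planes, block_num, stride in plan:
    # The same condition _make_layer uses to decide on a downsample branch
    flags.append(stride != 1 or inplanes != planes)
    inplanes = planes
print(flags)  # [False, True, True, True]
```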
So the "stages" of resnet18, each built from BasicBlocks, are wrapped up by the _make_layer function above.
Finally, you just "stack the bricks" inside ResNet. For example, resnet18 goes through a convolution, BN, ReLU, and pooling layer, then the four stages, and ends with an average pool and a fully connected layer (the fc outputs logits; softmax is typically folded into the loss function).
First define in __init__:
self.conv1 = nn.Conv2d(1, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
self.bn1 = nn.BatchNorm2d(self.inplanes)
self.relu = nn.ReLU(inplace=True)
self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
self.layer1 = self._make_layer(Basicblock, 64, block_num=layers[0], stride=1)
self.layer2 = self._make_layer(Basicblock, 128, block_num=layers[1], stride=2)
self.layer3 = self._make_layer(Basicblock, 256, block_num=layers[2], stride=2)
self.layer4 = self._make_layer(Basicblock, 512, block_num=layers[3], stride=2)
self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
self.fc = nn.Linear(512, num_classes)
then write the forward pass:
def forward(self, x):
    x = self.conv1(x)
    x = self.bn1(x)
    x = self.relu(x)
    x = self.maxpool(x)
    x = self.layer1(x)
    x = self.layer2(x)
    x = self.layer3(x)
    x = self.layer4(x)
    x = self.avgpool(x)
    x = torch.flatten(x, 1)
    x = self.fc(x)
    return x
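One detail worth noting: forward() ends at self.fc and returns raw logits with no softmax, because nn.CrossEntropyLoss applies log-softmax internally. A self-contained sketch of the head (the class count and batch size are illustrative):

```python
import torch
import torch.nn as nn

# The classifier head: global average pool -> flatten -> linear.
avgpool = nn.AdaptiveAvgPool2d((1, 1))
fc = nn.Linear(512, 10)  # 10 classes, just for illustration

features = torch.randn(4, 512, 7, 7)     # like layer4's output for a batch of 4
x = torch.flatten(avgpool(features), 1)  # -> (4, 512)
logits = fc(x)                           # -> (4, 10), no softmax applied
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 1, 2, 3]))
print(logits.shape)
```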
Then we define helper functions to build the models. Since ResNet34 uses the same BasicBlock, this works for 34 layers as well.
def _resnet(block, layers, **kwargs):
    model = ResNet(block, layers, **kwargs)
    return model

def resnet18():
    return _resnet(BasicBlock, [2, 2, 2, 2])

def resnet34():
    return _resnet(BasicBlock, [3, 4, 6, 3])
Finally, just call the function:
net = resnet18()
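With a 224×224 input, the four stages take the 56×56 stem output down to 7×7 before the average pool; each stride-2 stage halves the spatial size, which can be verified with the 3×3/padding-1 formula:

```python
def halve(n):
    # Output size of the stride-2, kernel-3, padding-1 conv opening a stage
    return (n + 2 * 1 - 3) // 2 + 1

n = 56  # spatial size after the stem (layer1 keeps it unchanged)
sizes = [n]
for _ in range(3):  # layer2, layer3 and layer4 each downsample once
    n = halve(n)
    sizes.append(n)
print(sizes)  # [56, 28, 14, 7]
```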
And that's a basic ResNet. If you're interested, try writing a resnet50 yourself~
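As a starting point for that exercise: resnet50/101/152 replace BasicBlock with the "bottleneck" block shown on the right of the paper's figure (1×1 reduce, 3×3, 1×1 expand by 4). The sketch below follows the paper's design rather than any particular codebase, so treat the details as illustrative:

```python
import torch
import torch.nn as nn

# Bottleneck block for the deeper ResNets: 1x1 reduce -> 3x3 -> 1x1 expand.
class Bottleneck(nn.Module):
    expansion = 4  # output channels = planes * 4

    def __init__(self, inplanes, planes, stride=1, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(inplanes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=stride,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion,
                               kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x if self.downsample is None else self.downsample(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return self.relu(out + identity)

# The shortcut must project to planes * 4 channels before the addition.
block = Bottleneck(64, 64, downsample=nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, bias=False), nn.BatchNorm2d(256)))
out = block(torch.randn(1, 64, 56, 56))
print(out.shape)  # (1, 256, 56, 56)
```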
Here is the complete code:
import torch
import torch.nn as nn


# A BasicBlock has two convolution layers.
class BasicBlock(nn.Module):
    # OutChannal is the number of output channels of both 3x3 convolutions.
    def __init__(self, InChannal, OutChannal, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(InChannal, OutChannal, kernel_size=3, stride=stride, bias=False, padding=1)
        self.bn1 = nn.BatchNorm2d(OutChannal)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(OutChannal, OutChannal, kernel_size=3, bias=False, padding=1)
        self.bn2 = nn.BatchNorm2d(OutChannal)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        if self.downsample is not None:
            identity = self.downsample(x)
        out += identity  # the residual connection: ResNet's essence
        out = self.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(self, Basicblock, layers, num_classes=1000):  # was Config.NUM_CLASSES in the original project
        super(ResNet, self).__init__()
        self.inplanes = 64
        # 1 input channel here; the standard ImageNet model uses 3 (RGB).
        self.conv1 = nn.Conv2d(1, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(Basicblock, 64, block_num=layers[0], stride=1)
        self.layer2 = self._make_layer(Basicblock, 128, block_num=layers[1], stride=2)
        self.layer3 = self._make_layer(Basicblock, 256, block_num=layers[2], stride=2)
        self.layer4 = self._make_layer(Basicblock, 512, block_num=layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512, num_classes)
        # Conv2d and BatchNorm2d layers are initialized in different ways.
        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')
            elif isinstance(m, nn.BatchNorm2d):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)
        # Zero-initialize the last BN in each residual branch, which can
        # improve performance (each block then starts out as an identity).
        for m in self.modules():
            if isinstance(m, BasicBlock):
                nn.init.constant_(m.bn2.weight, 0)

    def _make_layer(self, Basicblock, planes, block_num, stride=1):
        downsample = None
        # Project the shortcut when the resolution or channel count changes.
        if stride != 1 or self.inplanes != planes:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes, planes, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(planes),
            )
        layers = []
        # 1. The first block, which may downsample.
        layers.append(Basicblock(self.inplanes, planes, stride, downsample))
        self.inplanes = planes
        # 2. The remaining blocks keep the same shape.
        for i in range(1, block_num):
            layers.append(Basicblock(self.inplanes, planes))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x


def _resnet(block, layers, **kwargs):
    model = ResNet(block, layers, **kwargs)
    return model


def resnet18():
    return _resnet(BasicBlock, [2, 2, 2, 2])


def resnet34():
    return _resnet(BasicBlock, [3, 4, 6, 3])