Squeeze-and-Excitation Networks
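Abstract
Convolutional neural networks are built on the convolution operation, which extracts informative features by fusing spatial and channel-wise information within local receptive fields. To boost the representational power of a network, much existing work has shown the benefit of enhancing spatial encoding. In this work we focus on channels instead and propose a novel architectural unit, which we term the "Squeeze-and-Excitation" (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling the interdependencies between channels. We demonstrate that by stacking these blocks together we can construct SENet architectures that generalise extremely well across challenging datasets. Crucially, we find that SE blocks produce significant performance improvements for existing state-of-the-art deep architectures at minimal additional computational cost. SENets formed the foundation of our ILSVRC 2017 classification submission, which won first place and significantly reduced the top-5 error to 2.251%, a relative improvement of ∼25% over the winning entry of 2016.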
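Point 1: The SE module
Put simply: a pooling layer followed by convolution operations (fully connected layers in the code below) produces one weight per channel, and each channel is multiplied by its weight.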
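An SE block is a computational unit that can be built on any transformation $F_{tr}$ mapping an input $X \in \mathbb{R}^{H' \times W' \times C'}$ to feature maps $U \in \mathbb{R}^{H \times W \times C}$. In the notation below we take $F_{tr}$ to be a convolutional operator and use $V = [v_1, v_2, \dots, v_C]$ to denote the learned set of filter kernels, where $v_c$ refers to the parameters of the $c$-th filter. We can then write the output as $U = [u_1, u_2, \dots, u_C]$, where
$$u_c = v_c * X = \sum_{s=1}^{C'} v_c^s * x^s.$$
Here $*$ denotes convolution and $v_c^s$ is a 2D spatial kernel representing a single channel of $v_c$, which acts on the corresponding channel $x^s$ of $X$.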
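To make the per-filter formula concrete, here is a minimal sketch in plain Python (no PyTorch, so it runs anywhere). It computes one output channel as the sum, over input channels, of a 2D kernel applied to the matching input channel. The helper names `conv2d_single` and `filter_response` and the toy values are illustrative assumptions, not part of the original notes, and the "convolution" is implemented as cross-correlation, as deep learning frameworks usually do.

```python
def conv2d_single(x, k):
    """Valid 2D cross-correlation of one channel x (H x W) with one kernel k (kh x kw)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(k[a][b] * x[i + a][j + b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def filter_response(X, v_c):
    """u_c: sum over input channels s of (kernel v_c^s applied to channel x^s)."""
    per_channel = [conv2d_single(x_s, k_s) for x_s, k_s in zip(X, v_c)]
    h, w = len(per_channel[0]), len(per_channel[0][0])
    return [[sum(ch[i][j] for ch in per_channel) for j in range(w)] for i in range(h)]


# Toy input: two 3x3 channels (hypothetical values).
X = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],
]
# One filter v_c made of two 2x2 kernels, one per input channel.
v_c = [
    [[1, 0], [0, 1]],  # v_c^1: adds the main diagonal of each 2x2 patch
    [[1, 1], [1, 1]],  # v_c^2: adds the whole 2x2 patch
]
u_c = filter_response(X, v_c)
print(u_c)  # [[8, 10], [14, 16]]
```

Because the channel contributions are simply summed, the channel dependencies are only implicitly mixed into the output, which is the motivation for modelling them explicitly in the SE block.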
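Squeeze: global information embedding (in practice just a global average pooling operation), $z_c = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} u_c(i, j)$
Excitation: adaptive recalibration (fully connected layers applied to $z$, each followed by an activation function), $s = \sigma(W_2 \, \delta(W_1 z))$, where $\delta$ is ReLU and $\sigma$ is the sigmoid
Scale: the product of each feature map and its corresponding channel weight, $\tilde{x}_c = s_c \cdot u_c$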
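The three steps above (squeeze, excitation, channel-wise product) can be traced by hand on a tiny example. The following is a minimal sketch in plain Python under stated assumptions: the function name `se_recalibrate` and the toy weights `W1` and `W2` are hypothetical, biases are omitted, and a real SE block learns these weights during training.

```python
import math


def se_recalibrate(U, W1, W2):
    """Apply squeeze, excitation, and scale to feature maps U (C x H x W nested lists)."""
    # Squeeze: global average pooling gives one scalar per channel.
    z = [sum(map(sum, u)) / (len(u) * len(u[0])) for u in U]
    # Excitation: FC -> ReLU -> FC -> sigmoid, giving one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * z_i for w, z_i in zip(row, z))) for row in W1]
    s = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in W2]
    # Scale: multiply every value of channel c by its weight s[c].
    out = [[[s_c * v for v in row] for row in u] for s_c, u in zip(s, U)]
    return z, s, out


# Two 2x2 channels (toy values).
U = [
    [[1.0, 3.0], [5.0, 7.0]],  # channel 0, mean 4.0
    [[0.0, 0.0], [0.0, 4.0]],  # channel 1, mean 1.0
]
W1 = [[0.5, -1.0]]    # reduces 2 channels to 1 hidden unit
W2 = [[1.0], [-1.0]]  # expands the hidden unit back to 2 channels
z, s, out = se_recalibrate(U, W1, W2)
print(z)  # [4.0, 1.0]
```

With these weights the hidden unit equals 1.0, so the channel weights are sigmoid(1) and sigmoid(-1): channel 0 is kept at roughly 73% of its value and channel 1 is damped to roughly 27%.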
Point 2: Applying the module
Point 3: Model code
from torch import nn


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        # Squeeze: global average pooling down to one value per channel
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: bottleneck FC -> ReLU -> FC -> Sigmoid
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        # Scale: broadcast the per-channel weights over H x W
        return x * y.expand_as(x)


# Print the network structure, parameter counts, and output shapes
from torchsummary import summary

net = SELayer(64)
# device="cpu" because the model has not been moved to a GPU
summary(net, input_size=(64, 128, 128), device="cpu")
Selective Kernel Networks
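Abstract
In standard convolutional neural networks (CNNs), the receptive fields of artificial neurons in each layer are designed to share the same size. It is well known in the neuroscience community that the receptive field size of visual cortical neurons is modulated by the stimulus, which has rarely been considered when constructing CNNs. We propose a dynamic selection mechanism in CNNs that allows each neuron to adaptively adjust its receptive field size based on multiple scales of input information. A building block called the Selective Kernel (SK) unit is designed, in which multiple branches with different kernel sizes are fused using softmax attention guided by the information in these branches. Different attentions on these branches yield different sizes of the effective receptive fields of neurons in the fusion layer. Multiple SK units are stacked into a deep network termed Selective Kernel Networks (SKNets). On the ImageNet and CIFAR benchmarks, we empirically show that SKNet outperforms existing state-of-the-art architectures with lower model complexity. Detailed analyses show that the neurons in SKNet can capture target objects at different scales, which verifies the capability of neurons to adaptively adjust their receptive field sizes according to the input.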
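Point 1: The SK block
To enable neurons to adaptively adjust their receptive field sizes, we propose an automatic selection operation, SK convolution, among multiple kernels with different kernel sizes. Specifically, we implement SK convolution via three operations, Split, Fuse and Select, as illustrated in Figure 1, which shows a two-branch case. In this example there are therefore only two kernels with different kernel sizes, but it is easy to extend to the case of multiple branches.
Split: the transformations on the two branches are each composed of efficient grouped/depthwise convolutions, batch normalization, and a ReLU function, applied in sequence.
Fuse: d is the number of neurons in the fully connected layer, i.e. the length of the compact feature z.
Select: this appears to be simply a softmax function applied across the branches.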
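The Fuse and Select notes above can be checked with a few lines of plain Python. This is a minimal sketch under stated assumptions: the helper names `reduced_dim`, `select_weights` and `fuse` are hypothetical, each branch output is reduced to one scalar per channel to keep the arithmetic readable, and `d = max(C / r, L)` follows the expression used in the model code in these notes.

```python
import math


def reduced_dim(channels, r, L=32):
    """d = max(C / r, L): the number of neurons in the Fuse-step FC layer."""
    return max(channels // r, L)


def select_weights(logits):
    """Softmax across branches, computed independently for every channel.

    logits[m][c] is the score of branch m for channel c.
    """
    num_branches, num_channels = len(logits), len(logits[0])
    weights = [[0.0] * num_channels for _ in range(num_branches)]
    for c in range(num_channels):
        exps = [math.exp(logits[m][c]) for m in range(num_branches)]
        total = sum(exps)
        for m in range(num_branches):
            weights[m][c] = exps[m] / total
    return weights


def fuse(branch_outputs, weights):
    """V_c = sum over branches m of weights[m][c] * branch_outputs[m][c]."""
    num_channels = len(branch_outputs[0])
    return [sum(w[c] * u[c] for w, u in zip(weights, branch_outputs))
            for c in range(num_channels)]


print(reduced_dim(512, 2))   # 256
print(reduced_dim(32, 16))   # 32, because L acts as a lower bound

# Two branches (say 3x3 and 5x5), two channels, toy logits and outputs.
logits = [[0.0, 2.0], [0.0, 0.0]]
weights = select_weights(logits)
V = fuse([[1.0, 10.0], [3.0, 20.0]], weights)
```

For channel 0 the logits are equal, so both branches get weight 0.5 and the output is the plain average, 2.0. For channel 1 the first branch has the larger logit and receives about 88% of the weight, so the fused value sits much closer to that branch.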
Point 2: Model code
import torch
from torch import nn


class SKConv(nn.Module):
    def __init__(self, features, WH, M, G, r, stride=1, L=32):
        """ Constructor
        Args:
            features: input channel dimensionality.
            WH: input spatial dimensionality, used for GAP kernel size
                (unused here, because forward() averages over H and W directly).
            M: the number of branches, e.g. M=2; branch i uses a
                (3 + 2i) x (3 + 2i) kernel, i.e. 3x3, 5x5, 7x7, ...
            G: number of convolution groups, e.g. G=8.
            r: the ratio used to compute d, the length of z, e.g. r=2.
            stride: stride, default 1.
            L: the minimum dim of the vector z in the paper, default 32.
        """
        super(SKConv, self).__init__()
        d = max(int(features / r), L)
        self.M = M
        self.features = features
        self.convs = nn.ModuleList([])
        # Split: grouped convolutions; the padding keeps the spatial size
        # (for stride 1) and the channel count unchanged
        for i in range(M):
            self.convs.append(nn.Sequential(
                nn.Conv2d(features, features, kernel_size=3 + i * 2, stride=stride, padding=1 + i, groups=G),
                nn.BatchNorm2d(features),
                nn.ReLU(inplace=False)
            ))
        # self.gap = nn.AvgPool2d(int(WH/stride))
        # Fuse: fully connected layer producing z, d = max(int(features / r), L)
        self.fc = nn.Linear(features, d)
        # Select: one fully connected layer per branch, followed by a softmax
        self.fcs = nn.ModuleList([])
        for i in range(M):
            self.fcs.append(
                nn.Linear(d, features)
            )
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        # Split: stack the branch outputs along a new dim -> (B, M, C, H, W)
        for i, conv in enumerate(self.convs):
            fea = conv(x).unsqueeze_(dim=1)
            if i == 0:
                feas = fea
            else:
                feas = torch.cat([feas, fea], dim=1)
        # Fuse: element-wise sum of the branches, then global average pooling
        fea_U = torch.sum(feas, dim=1)
        # fea_s = self.gap(fea_U).squeeze_()
        fea_s = fea_U.mean(-1).mean(-1)
        fea_z = self.fc(fea_s)
        # Select: per-branch logits -> (B, M, C), softmax over the branch dim
        for i, fc in enumerate(self.fcs):
            vector = fc(fea_z).unsqueeze_(dim=1)
            if i == 0:
                attention_vectors = vector
            else:
                attention_vectors = torch.cat([attention_vectors, vector], dim=1)
        attention_vectors = self.softmax(attention_vectors)
        attention_vectors = attention_vectors.unsqueeze(-1).unsqueeze(-1)
        # Weighted sum of the branches -> (B, C, H, W)
        fea_v = (feas * attention_vectors).sum(dim=1)
        return fea_v


class SKUnit(nn.Module):
    def __init__(self, in_features, out_features, WH, M, G, r, mid_features=None, stride=1, L=32):
        """ Constructor
        Args:
            in_features: input channel dimensionality.
            out_features: output channel dimensionality.
            WH: input spatial dimensionality, used for GAP kernel size.
            M: the number of branches.
            G: number of convolution groups.
            r: the ratio used to compute d, the length of z.
            mid_features: the channel dim of the middle conv (the one whose stride
                may differ from 1), default out_features/2.
            stride: stride.
            L: the minimum dim of the vector z in the paper.
        """
        super(SKUnit, self).__init__()
        if mid_features is None:
            mid_features = int(out_features / 2)
        # Bottleneck: 1x1 conv -> SK convolution -> 1x1 conv
        self.feas = nn.Sequential(
            nn.Conv2d(in_features, mid_features, 1, stride=1),
            nn.BatchNorm2d(mid_features),
            SKConv(mid_features, WH, M, G, r, stride=stride, L=L),
            nn.BatchNorm2d(mid_features),
            nn.Conv2d(mid_features, out_features, 1, stride=1),
            nn.BatchNorm2d(out_features)
        )
        if in_features == out_features:  # when the dim does not change, the input can be added directly to the output
            self.shortcut = nn.Sequential()
        else:  # when the dim changes, the input must be projected before it is added to the output
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_features, out_features, 1, stride=stride),
                nn.BatchNorm2d(out_features)
            )

    def forward(self, x):
        fea = self.feas(x)
        return fea + self.shortcut(x)


class SKNet(nn.Module):
    def __init__(self, class_num):
        super(SKNet, self).__init__()
        # Spatial sizes in the comments assume the 112x112 input used below
        self.basic_conv = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1),
            nn.BatchNorm2d(64)
        )  # 112x112
        self.stage_1 = nn.Sequential(
            SKUnit(64, 256, 32, 2, 8, 2, stride=2),
            nn.ReLU(),
            SKUnit(256, 256, 32, 2, 8, 2),
            nn.ReLU(),
            SKUnit(256, 256, 32, 2, 8, 2),
            nn.ReLU()
        )  # 56x56
        self.stage_2 = nn.Sequential(
            SKUnit(256, 512, 32, 2, 8, 2, stride=2),
            nn.ReLU(),
            SKUnit(512, 512, 32, 2, 8, 2),
            nn.ReLU(),
            SKUnit(512, 512, 32, 2, 8, 2),
            nn.ReLU()
        )  # 28x28
        self.stage_3 = nn.Sequential(
            SKUnit(512, 1024, 32, 2, 8, 2, stride=2),
            nn.ReLU(),
            SKUnit(1024, 1024, 32, 2, 8, 2),
            nn.ReLU(),
            SKUnit(1024, 1024, 32, 2, 8, 2),
            nn.ReLU()
        )  # 14x14
        # An 8x8 window on the 14x14 map gives a 1x1 output
        self.pool = nn.AvgPool2d(8)
        self.classifier = nn.Sequential(
            nn.Linear(1024, class_num),
            # nn.Softmax(dim=1)
        )

    def forward(self, x):
        fea = self.basic_conv(x)
        fea = self.stage_1(fea)
        fea = self.stage_2(fea)
        fea = self.stage_3(fea)
        fea = self.pool(fea)
        # Flatten to (B, 1024); unlike torch.squeeze, this keeps the batch dim when B == 1
        fea = torch.flatten(fea, 1)
        fea = self.classifier(fea)
        return fea


from torchsummary import summary

# Print the network structure, parameter counts, and output shapes
net = SKNet(100)
# device="cpu" because the model has not been moved to a GPU
summary(net, input_size=(3, 112, 112), device="cpu")
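I still don't fully understand this yet. The code seems decent, but I need to keep reading it, and the reference code above still feels a bit odd to me.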
import torch
from torch import nn

bn_momentum = 0.1


class SKConv1(nn.Module):
    """Two-branch SK convolution: a plain 3x3 branch and a dilated 3x3 branch."""

    def __init__(self, features, out_channel, M, G, r, stride=1, L=32):
        super(SKConv1, self).__init__()
        d = max(int(features / r), L)
        self.M = M
        self.features = features
        self.outchannel = out_channel
        # Branch 1: grouped 3x3 convolution
        self.conv1 = nn.Sequential(
            nn.Conv2d(features, out_channel, kernel_size=3, stride=stride, padding=1, groups=G, bias=False),
            nn.BatchNorm2d(out_channel, momentum=bn_momentum),
            nn.ReLU(inplace=True)
        )
        # Branch 2: grouped 3x3 convolution with dilation 2 (5x5 receptive field)
        self.conv2 = nn.Sequential(
            nn.Conv2d(features, out_channel, kernel_size=3, stride=stride, padding=2, groups=G, bias=False, dilation=2),
            nn.BatchNorm2d(out_channel, momentum=bn_momentum),
            nn.ReLU(inplace=True)
        )
        # self.gap = nn.AvgPool2d(int(WH/stride))
        self.gap = nn.AdaptiveAvgPool2d(1)
        # 1x1 convolutions on a 1x1 map act as fully connected layers
        self.fc1 = nn.Sequential(
            nn.Conv2d(out_channel, d, 1, padding=0, bias=False),
            # nn.BatchNorm2d(d),
            nn.ReLU(inplace=True)
        )
        self.fc2 = nn.Sequential(
            nn.Conv2d(d, out_channel * 2, 1, padding=0, bias=False),
            # nn.BatchNorm2d(256),
            nn.ReLU(inplace=True)
        )
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        fea1 = self.conv1(x)
        fea2 = self.conv2(x)
        # Fuse: sum the branches, pool globally, and squeeze to d channels
        fea_U = fea1 + fea2
        fea_s = self.gap(fea_U)
        fea_z = self.fc1(fea_s)
        fea_z = self.fc2(fea_z)  # (B, 2 * C, 1, 1)
        # Select: reshape to (B, 2, C, 1) and take the softmax over the 2 branches
        fea_z = fea_z.view(fea_z.shape[0], 2, -1, fea_z.shape[-1])
        attention_vectors = self.softmax(fea_z)
        attention_vectors1, attention_vectors2 = torch.split(attention_vectors, 1, dim=1)
        # Reshape each branch's weights to (B, C, 1, 1) so they broadcast over H x W
        attention_vectors1 = attention_vectors1.reshape(attention_vectors1.shape[0], self.outchannel, -1, attention_vectors1.shape[-1])
        attention_vectors2 = attention_vectors2.reshape(attention_vectors2.shape[0], self.outchannel, -1, attention_vectors2.shape[-1])
        out1 = attention_vectors1 * fea1
        out2 = attention_vectors2 * fea2
        out = out1 + out2
        return out


class SKConv2(nn.Module):
    def __init__(self, channel, reduction):
        super(SKConv2, self).__init__()
        # Two branches with different receptive fields: plain 3x3 and dilated 3x3
        self.conv1 = nn.Conv2d(channel, channel, 3, padding=1, bias=True)
        self.conv2 = nn.Conv2d(channel, channel, 3, padding=2, dilation=2, bias=True)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.conv_se = nn.Sequential(
            nn.Conv2d(channel, channel // reduction, 1, padding=0, bias=True),
            nn.ReLU(inplace=True)
        )
        # One expansion layer per branch. Sharing a single layer would give both
        # branches identical logits, so the softmax would always return 0.5.
        self.conv_ex1 = nn.Conv2d(channel // reduction, channel, 1, padding=0, bias=True)
        self.conv_ex2 = nn.Conv2d(channel // reduction, channel, 1, padding=0, bias=True)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        conv1 = self.conv1(x).unsqueeze(dim=1)
        conv2 = self.conv2(x).unsqueeze(dim=1)
        features = torch.cat([conv1, conv2], dim=1)  # (B, 2, C, H, W)
        U = torch.sum(features, dim=1)
        S = self.pool(U)
        Z = self.conv_se(S)
        attention_vector = torch.cat([self.conv_ex1(Z).unsqueeze(dim=1), self.conv_ex2(Z).unsqueeze(dim=1)], dim=1)
        attention_vector = self.softmax(attention_vector)  # (B, 2, C, 1, 1), softmax over branches
        V = (features * attention_vector).sum(dim=1)
        return V


class SKConv(nn.Module):
    def __init__(self, features, WH, M, G, r, stride=1, L=32):
        """ Constructor
        Args:
            features: input channel dimensionality.
            WH: input spatial dimensionality (no longer needed, because the
                pooling below is adaptive; kept for interface compatibility).
            M: the number of branches.
            G: number of convolution groups.
            r: the ratio used to compute d, the length of z.
            stride: stride, default 1.
            L: the minimum dim of the vector z in the paper, default 32.
        """
        super(SKConv, self).__init__()
        d = max(int(features / r), L)
        self.M = M
        self.features = features
        self.convs = nn.ModuleList([])
        for i in range(M):
            self.convs.append(nn.Sequential(
                nn.Conv2d(features, features, kernel_size=3 + i * 2, stride=stride, padding=1 + i, groups=G),
                nn.BatchNorm2d(features),
                nn.ReLU(inplace=False)
            ))
        # Global average pooling that does not depend on a correct WH value
        self.gap = nn.AdaptiveAvgPool2d(1)
        # 1x1 convolutions replace the nn.Linear layers, so tensors stay 4D
        self.fc = nn.Sequential(
            nn.Conv2d(features, d, 1, padding=0, bias=False),
            nn.BatchNorm2d(d),
            nn.ReLU(inplace=True)
        )
        self.fcs = nn.ModuleList([])
        for i in range(M):
            self.fcs.append(
                nn.Sequential(
                    nn.Conv2d(d, features, 1, padding=0, bias=False),
                    nn.BatchNorm2d(features),
                    nn.ReLU(inplace=True))
            )
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        for i, conv in enumerate(self.convs):
            fea = conv(x).unsqueeze_(dim=1)
            if i == 0:
                feas = fea
            else:
                feas = torch.cat([feas, fea], dim=1)
        fea_U = torch.sum(feas, dim=1)
        fea_s = self.gap(fea_U)  # (B, C, 1, 1)
        fea_z = self.fc(fea_s)   # (B, d, 1, 1)
        for i, fc in enumerate(self.fcs):
            vector = fc(fea_z).unsqueeze_(dim=1)
            if i == 0:
                attention_vectors = vector
            else:
                attention_vectors = torch.cat([attention_vectors, vector], dim=1)
        # (B, M, C, 1, 1): softmax over the branch dim, broadcast over H x W
        attention_vectors = self.softmax(attention_vectors)
        fea_v = (feas * attention_vectors).sum(dim=1)
        return fea_v


class SKUnit(nn.Module):
    def __init__(self, in_features, out_features, WH, M, G, r, mid_features=None, stride=1, L=32):
        """ Constructor
        Args:
            in_features: input channel dimensionality.
            out_features: output channel dimensionality.
            WH: input spatial dimensionality, used for GAP kernel size.
            M: the number of branches.
            G: number of convolution groups.
            r: the ratio used to compute d, the length of z.
            mid_features: the channel dim of the middle conv (the one whose stride
                may differ from 1), default out_features/2.
            stride: stride.
            L: the minimum dim of the vector z in the paper.
        """
        super(SKUnit, self).__init__()
        if mid_features is None:
            mid_features = int(out_features / 2)
        self.feas = nn.Sequential(
            nn.Conv2d(in_features, mid_features, 1, stride=1),
            nn.BatchNorm2d(mid_features),
            SKConv(mid_features, WH, M, G, r, stride=stride, L=L),
            nn.BatchNorm2d(mid_features),
            nn.Conv2d(mid_features, out_features, 1, stride=1),
            nn.BatchNorm2d(out_features)
        )
        if in_features == out_features:  # when the dim does not change, the input can be added directly to the output
            self.shortcut = nn.Sequential()
        else:  # when the dim changes, the input must be projected before it is added to the output
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_features, out_features, 1, stride=stride),
                nn.BatchNorm2d(out_features)
            )

    def forward(self, x):
        fea = self.feas(x)
        return fea + self.shortcut(x)