Deep Learning: Deep Complex Networks, from Paper to PyTorch Implementation

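Abstract: Real-valued networks have been very successful in the image domain, but in audio most signal features, such as spectra, are complex-valued. Simply separating the real and imaginary parts, or working with magnitude and phase angle, loses the relationship that the complex number originally carried. Following the definitions of complex arithmetic, the paper designs a deep complex network that can apply convolution, activation, batch normalization, and similar operations to complex-valued input. Such a network should have a clear advantage in audio signal processing. This post introduces the complex operations proposed in the paper and gives simple PyTorch implementations.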

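Although it is called a deep complex network, the operations inside are still carried out in real space: real-valued layers are combined so that together they behave like complex arithmetic.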
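As a minimal sketch of what "complex arithmetic with real operations" means, the snippet below multiplies two complex numbers using only real multiplications and additions, then compares the result with Python's built-in complex type. The helper name `complex_mul` and the sample values are illustrative assumptions, not part of the paper or its code.

```python
def complex_mul(a, b, x, y):
    """Return (real, imag) of (a + ib) * (x + iy) using real arithmetic only."""
    real = a * x - b * y
    imag = b * x + a * y
    return real, imag


# Illustrative values only
w = complex(0.5, -1.5)
h = complex(2.0, 3.0)

real, imag = complex_mul(w.real, w.imag, h.real, h.imag)
expected = w * h  # Python's native complex multiplication

print(real, imag)
print(expected)
```

The same pattern, with each real multiplication replaced by a real-valued layer, is the structure used by every layer in the rest of this post.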

Contents

  1. Complex convolution
  2. Complex activation functions
  3. Complex Dropout
  4. Complex weight initialization
  5. Complex BatchNormalization
  6. Building a complete model

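Main references

【1】“DEEP COMPLEX NETWORKS”

【2】The source code released by the paper's authors, a Keras implementation on the Theano backend: “https://github.com/ChihebTrabelsi/deep_complex_networks”

【3】“https://github.com/wavefrontshaping/complexPyTorch” provides PyTorch versions of some of the operations.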

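1. Complex convolution

For a complex kernel $W = A + iB$ and a complex input $h = x + iy$, complex convolution is defined as:
$$W * h = (A * x - B * y) + i\,(B * x + A * y)$$
In practice this can be implemented with a simple structure: the two real-valued kernels A and B are each applied to both x and y, and the four results are combined.
[Figure: complex convolution implemented with real-valued convolutions]

It can therefore be implemented with PyTorch's nn.Conv2d, strictly following the definition of complex convolution above: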

from torch.nn import Module, Conv2d

class ComplexConv2d(Module):

    def __init__(self, input_channels, output_channels,
                 kernel_size=3, stride=1, padding=0, dilation=1, groups=1, bias=True):
        super(ComplexConv2d, self).__init__()
        # conv_real holds the real part A of the kernel, conv_imag the imaginary part B
        self.conv_real = Conv2d(input_channels, output_channels, kernel_size, stride, padding, dilation, groups, bias)
        self.conv_imag = Conv2d(input_channels, output_channels, kernel_size, stride, padding, dilation, groups, bias)

    def forward(self, input_real, input_imag):
        assert input_real.shape == input_imag.shape
        # (A + iB) * (x + iy) = (A*x - B*y) + i(B*x + A*y)
        return (self.conv_real(input_real) - self.conv_imag(input_imag),
                self.conv_imag(input_real) + self.conv_real(input_imag))
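The definition can be checked numerically. The sketch below is an illustration in NumPy rather than PyTorch and is one-dimensional for brevity. It computes the four real convolutions and compares their combination with a convolution carried out directly on complex arrays. The array sizes and the seed are arbitrary assumptions. Note that `np.convolve` flips the kernel while `nn.Conv2d` computes a cross-correlation; the real/imaginary bookkeeping is the same in both cases.

```python
import numpy as np

rng = np.random.default_rng(0)

# Complex kernel W = A + iB and complex signal h = x + iy (sizes are arbitrary)
A, B = rng.normal(size=3), rng.normal(size=3)
x, y = rng.normal(size=8), rng.normal(size=8)

# Real-valued route: four real convolutions combined as in the definition
out_real = np.convolve(x, A, mode="valid") - np.convolve(y, B, mode="valid")
out_imag = np.convolve(x, B, mode="valid") + np.convolve(y, A, mode="valid")

# Complex route: one convolution on complex arrays
reference = np.convolve(x + 1j * y, A + 1j * B, mode="valid")

# Largest deviation between the two routes
max_err = np.max(np.abs((out_real + 1j * out_imag) - reference))
print(max_err)
```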

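2. Complex activation functions

The authors propose a complex activation function, CReLU, and also discuss two other complex activation functions, modReLU and zReLU.
$$\mathrm{modReLU}(z) = \mathrm{ReLU}(|z| + b)\, e^{i\theta_z} = \begin{cases} (|z| + b)\dfrac{z}{|z|} & \text{if } |z| + b \ge 0 \\ 0 & \text{otherwise} \end{cases}$$
$$\mathbb{C}\mathrm{ReLU}(z) = \mathrm{ReLU}(\Re(z)) + i\,\mathrm{ReLU}(\Im(z))$$
$$z\mathrm{ReLU}(z) = \begin{cases} z & \text{if } \theta_z \in [0, \pi/2] \\ 0 & \text{otherwise} \end{cases}$$
A complex activation function must satisfy the Cauchy-Riemann equations to be complex-differentiable. Of the three:

  • modReLU does not satisfy them;
  • zReLU does not satisfy them where the real part is 0 and the imaginary part is positive, or where the imaginary part is 0 and the real part is positive, that is, on the positive x and y half-axes;
  • CReLU satisfies them only where the real and imaginary parts are both greater than zero or both less than zero, that is, it fails in the 2nd and 4th quadrants.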
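The CReLU claim can be checked numerically. The sketch below is an illustration, not code from the paper. It estimates the partial derivatives of f = u + iv by central differences and tests the Cauchy-Riemann conditions u_x = v_y and u_y = -v_x at one sample point per quadrant. The helper names, step size, and tolerance are assumptions chosen for the example.

```python
def crelu(z):
    """CReLU on a single complex number: ReLU applied to each part."""
    return complex(max(z.real, 0.0), max(z.imag, 0.0))


def satisfies_cauchy_riemann(f, z, h=1e-6, tol=1e-4):
    """Check u_x = v_y and u_y = -v_x at z using central differences."""
    df_dx = (f(z + h) - f(z - h)) / (2 * h)            # derivative along the real axis
    df_dy = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # derivative along the imaginary axis
    u_x, v_x = df_dx.real, df_dx.imag
    u_y, v_y = df_dy.real, df_dy.imag
    return abs(u_x - v_y) < tol and abs(u_y + v_x) < tol


# One sample point per quadrant
points = {1: 1 + 1j, 2: -1 + 1j, 3: -1 - 1j, 4: 1 - 1j}
results = {q: satisfies_cauchy_riemann(crelu, z) for q, z in points.items()}

# The conditions hold in quadrants 1 and 3 and fail in quadrants 2 and 4
print(results)
```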

Taking the implementation of the authors' CReLU as an example:

from torch.nn.functional import relu

def complex_relu(input_real, input_imag):
    return relu(input_real), relu(input_imag)
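For comparison, the other two activations can be written in the same two-tensor style. This is a rough sketch under my own assumptions rather than code from 【1】, 【2】, or 【3】. The function names are hypothetical, the small `eps` guarding against division by zero is an addition, and the bias `b` of modReLU (learnable in the paper) is passed in as a fixed tensor.

```python
import torch
from torch.nn.functional import relu


def mod_relu(input_real, input_imag, bias, eps=1e-8):
    """modReLU: ReLU(|z| + b) * z / |z|, which keeps the phase of z."""
    modulus = torch.sqrt(input_real ** 2 + input_imag ** 2)
    scale = relu(modulus + bias) / (modulus + eps)  # eps is an added safeguard
    return input_real * scale, input_imag * scale


def z_relu(input_real, input_imag):
    """zReLU: keep z only when its phase lies in [0, pi/2] (first quadrant)."""
    mask = ((input_real >= 0) & (input_imag >= 0)).to(input_real.dtype)
    return input_real * mask, input_imag * mask


# Illustrative inputs: one value in each of three quadrants plus one small modulus
real = torch.tensor([3.0, -3.0, 0.3, 1.0])
imag = torch.tensor([4.0, 4.0, 0.4, -1.0])
bias = torch.tensor(-1.0)  # hypothetical fixed value; learnable in the paper

m_real, m_imag = mod_relu(real, imag, bias)
z_real, z_imag = z_relu(real, imag)
print(m_real, m_imag)
print(z_real, z_imag)
```

With a negative bias, modReLU zeroes the third input (modulus 0.5) and shrinks the others along their own direction, while zReLU keeps only the two inputs in the first quadrant.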

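3. Complex Dropout

My impression is that complex Dropout should set the real and imaginary parts to 0 together. The authors' source code does not use a Dropout layer.

So the Dropout in 【3】 does not look quite right. The implementation is the same as ordinary Dropout, as long as the real and imaginary parts share one Dropout mask.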
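One way to read "shared" here is a single random mask applied to both parts. The sketch below is my own interpretation, not code from the paper or from 【3】. It draws the mask by applying `F.dropout` to a tensor of ones, so surviving entries carry the usual 1/(1-p) scaling. The function name `complex_dropout` is hypothetical.

```python
import torch
import torch.nn.functional as F


def complex_dropout(input_real, input_imag, p=0.5, training=True):
    """Drop the real and imaginary parts of each element together."""
    if not training or p == 0.0:
        return input_real, input_imag
    # Dropout on a tensor of ones gives a mask whose entries are 0 or 1/(1-p)
    mask = F.dropout(torch.ones_like(input_real), p=p, training=True)
    return input_real * mask, input_imag * mask


torch.manual_seed(0)
real = torch.full((4, 8), 2.0)   # constant inputs make the mask easy to see
imag = torch.full((4, 8), -3.0)

out_real, out_imag = complex_dropout(real, imag, p=0.5, training=True)
eval_real, eval_imag = complex_dropout(real, imag, p=0.5, training=False)
print(out_real)
print(out_imag)
```

Every position that is zero in `out_real` is also zero in `out_imag`, so a complex value is either kept whole or removed whole.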

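4. Complex weight initialization

The authors give complex forms of two initialization methods: Glorot and He initialization.

As described in the paper, the magnitude and the phase are initialized separately.

The PyTorch version below is a direct modification of the library source; _calculate_correct_fan() comes from that source (torch.nn.init).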

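import numpy as np
import torch
from numpy.random import RandomState
from torch.nn.init import _calculate_correct_fan

def complex_kaiming_normal_(tensor_real, tensor_imag, a=0, mode='fan_in'):
    # `a` is unused; it is kept so the signature resembles torch.nn.init.kaiming_normal_
    fan = _calculate_correct_fan(tensor_real, mode)
    # Rayleigh scale as used here; the paper states the He criterion as sigma = 1/sqrt(fan_in)
    s = 1. / fan
    rng = RandomState()
    # magnitude from a Rayleigh distribution, phase uniform in [-pi, pi]
    modulus = rng.rayleigh(scale=s, size=tensor_real.shape)
    phase = rng.uniform(low=-np.pi, high=np.pi, size=tensor_real.shape)
    weight_real = modulus * np.cos(phase)
    weight_imag = modulus * np.sin(phase)

    # return the real and imaginary weights as two tensors with the dtype of the inputs
    return (torch.tensor(weight_real, dtype=tensor_real.dtype),
            torch.tensor(weight_imag, dtype=tensor_imag.dtype))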
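The reason for sampling the magnitude from a Rayleigh distribution is that a Rayleigh magnitude with scale sigma and a uniform phase give a complex weight with variance 2·sigma². The sketch below checks this empirically in NumPy. It is an illustration under assumptions: `fan_in` and the sample count are made up, and sigma is set to 1/sqrt(fan_in), the He criterion as stated in the paper, whereas the listing above uses s = 1/fan.

```python
import numpy as np

rng = np.random.RandomState(0)

fan_in = 64                       # hypothetical fan-in
sigma = 1.0 / np.sqrt(fan_in)     # He criterion as stated in the paper
n = 200_000                       # number of samples for the empirical estimate

# Magnitude and phase are drawn independently
modulus = rng.rayleigh(scale=sigma, size=n)
phase = rng.uniform(low=-np.pi, high=np.pi, size=n)
weights = modulus * np.exp(1j * phase)

# Var(W) = E[|W|^2] - |E[W]|^2, which should be close to 2 * sigma^2 = 2 / fan_in
empirical_var = np.mean(np.abs(weights) ** 2) - np.abs(np.mean(weights)) ** 2
target_var = 2.0 / fan_in
print(empirical_var, target_var)
```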

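The computation above follows 【1】 and 【2】. I do not know how to use this kind of two-tensor initialization directly in the style of the init functions, so it has to be combined with one of the manual initialization methods below.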

import torch
import numpy as np

# Create a first convolution layer; its weights are randomly initialized
w = torch.nn.Conv2d(2, 2, 3, padding=1)
print(w.weight)


# Method 1
print("1. Use the weights of another Conv layer")
q = torch.nn.Conv2d(2, 2, 3, padding=1)  # suppose q is an already trained convolution layer
print(q.weight)  # q's weights differ from w's
w.weight = q.weight  # assign one Conv layer's weights to the other Conv layer
print(w.weight)

# Method 2
print("2. Use weights from a Tensor")
ones = torch.Tensor(np.ones([2, 2, 3, 3]))  # first create a Tensor of custom weights; all set to 1 here for convenience
w.weight = torch.nn.Parameter(ones)  # the Tensor must be wrapped in torch.nn.Parameter first, otherwise an error is raised
print(w.weight)
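A third option is to write into the existing weights in place under `torch.no_grad()`. This keeps each layer's own Parameter object, whereas method 1 above makes the two layers share a single Parameter. The sketch below applies this to a pair of Conv2d layers standing in for the real and imaginary kernels. The helper `complex_rayleigh_init_` is hypothetical, the layer sizes are arbitrary, and the Rayleigh magnitude is sampled in pure PyTorch through the inverse CDF, which is my substitution for `RandomState.rayleigh`.

```python
import math

import torch
from torch.nn import Conv2d


def complex_rayleigh_init_(conv_real, conv_imag, sigma):
    """Hypothetical helper: fill two conv layers with a Rayleigh/uniform-phase init in place."""
    shape = conv_real.weight.shape
    # Inverse-CDF sampling of a Rayleigh distribution with scale sigma;
    # torch.rand is in [0, 1), so 1 - u stays in (0, 1] and the log is finite
    u = torch.rand(shape)
    modulus = sigma * torch.sqrt(-2.0 * torch.log(1.0 - u))
    phase = (torch.rand(shape) * 2.0 - 1.0) * math.pi  # uniform in [-pi, pi)
    with torch.no_grad():
        conv_real.weight.copy_(modulus * torch.cos(phase))
        conv_imag.weight.copy_(modulus * torch.sin(phase))


torch.manual_seed(0)
conv_real = Conv2d(2, 4, 3)   # stands in for the real-part kernel
conv_imag = Conv2d(2, 4, 3)   # stands in for the imaginary-part kernel
fan_in = 2 * 3 * 3            # in_channels * kernel_height * kernel_width

complex_rayleigh_init_(conv_real, conv_imag, sigma=1.0 / math.sqrt(fan_in))
print(conv_real.weight.shape, conv_real.weight.requires_grad)
```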

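5. Complex BatchNormalization

First, the ordinary BN method clearly cannot be used, because it does not guarantee the joint distribution of the real and imaginary parts. As in ordinary BN, though, the input is first standardized to zero mean and unit variance; only the way this is done differs.

The operation below ensures that the output has mean 0, covariance 1, and pseudo-covariance (relation) 0.
$$\tilde{x} = V^{-\frac{1}{2}} \left( x - \mathbb{E}[x] \right)$$
$$V = \begin{pmatrix} V_{rr} & V_{ri} \\ V_{ir} & V_{ii} \end{pmatrix} = \begin{pmatrix} \mathrm{Cov}(\Re\{x\}, \Re\{x\}) & \mathrm{Cov}(\Re\{x\}, \Im\{x\}) \\ \mathrm{Cov}(\Im\{x\}, \Re\{x\}) & \mathrm{Cov}(\Im\{x\}, \Im\{x\}) \end{pmatrix}$$
BN also has the two parameters $\beta$ and $\gamma$, where $\gamma$ is a symmetric 2×2 matrix with three free entries and $\beta$ is complex. The final BN result is therefore:
$$\mathrm{BN}(\tilde{x}) = \gamma \tilde{x} + \beta$$
The core computation steps and their code are in the complete implementation in the next section, which follows 【3】.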
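The whitening step can be tried in isolation. The sketch below is a NumPy illustration on synthetic data of my own choosing. It builds V from correlated real and imaginary parts, applies the closed-form inverse square root of a 2×2 symmetric positive-definite matrix (with s = sqrt(det V) and t = sqrt(trace V + 2s)), and checks that the whitened parts have identity covariance. No epsilon term is added here, unlike in a practical layer.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Synthetic, deliberately correlated real and imaginary parts
real = rng.normal(size=n) * 2.0 + 1.0
imag = 0.7 * real + rng.normal(size=n) - 3.0

# Center both parts
xr = real - real.mean()
xi = imag - imag.mean()

# Entries of the (biased) covariance matrix V
Vrr, Vii, Vri = np.mean(xr * xr), np.mean(xi * xi), np.mean(xr * xi)

# Closed-form V^{-1/2} for a 2x2 symmetric positive-definite matrix
s = np.sqrt(Vrr * Vii - Vri ** 2)   # sqrt of the determinant
t = np.sqrt(Vrr + Vii + 2.0 * s)    # sqrt of (trace + 2s)
Rrr = (Vii + s) / (s * t)
Rii = (Vrr + s) / (s * t)
Rri = -Vri / (s * t)

# Apply the whitening matrix to the centered data
out_r = Rrr * xr + Rri * xi
out_i = Rri * xr + Rii * xi

# Covariance of the whitened parts; should be the 2x2 identity
cov = np.cov(np.stack([out_r, out_i]), bias=True)
print(cov)
```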

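6. Building a complete model

A simple complete model is built from the complex convolution, BN, and activation function.

It uses the MNIST dataset, with the imaginary part generated by the method described in the paper.

In practice, audio and optical signals can supply a complex spectrum directly as input.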
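To illustrate the last point, the sketch below turns a synthetic 440 Hz tone into a complex spectrogram with NumPy and splits it into the real and imaginary arrays that a two-input complex layer expects. All values (sample rate, frame length, hop size, the tone itself) are assumptions made for the example, and the framing loop is a simple stand-in for a proper STFT routine.

```python
import numpy as np

sample_rate = 8000                        # assumed sample rate in Hz
t = np.arange(sample_rate) / sample_rate  # one second of samples
signal = np.sin(2 * np.pi * 440 * t)      # synthetic 440 Hz tone

frame_len, hop = 256, 128                 # hypothetical framing parameters
n_frames = 1 + (len(signal) - frame_len) // hop

# Slice the signal into overlapping frames, window them, and take a one-sided FFT
frames = np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])
spectrum = np.fft.rfft(frames * np.hanning(frame_len), axis=1)  # complex, (n_frames, frame_len // 2 + 1)

# Real and imaginary parts in (batch, channel, time, frequency) layout
input_real = spectrum.real[None, None].astype(np.float32)
input_imag = spectrum.imag[None, None].astype(np.float32)
print(input_real.shape, input_imag.shape)  # (1, 1, 61, 129) for these settings
```

Wrapping the two arrays with `torch.from_numpy` would give tensors that can be passed as `input_r` and `input_i` to the layers below.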

import matplotlib.pyplot as plt
import numpy as np

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import Module, Parameter, init
from torch.nn import Conv2d, Linear, BatchNorm2d
from torch.nn.functional import relu
from torchvision import datasets, transforms


def complex_relu(input_r, input_i):
    return relu(input_r), relu(input_i)

class ComplexConv2d(Module):

    def __init__(self,in_channels, out_channels, kernel_size=3, stride=1, padding = 0,
                 dilation=1, groups=1, bias=True):
        super(ComplexConv2d, self).__init__()
        self.conv_r = Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias)
        self.conv_i = Conv2d(in_channels, out_channels, kernel_size, stride, padding, dilation, groups, bias)

    def forward(self,input_r, input_i):
        assert(input_r.size() == input_i.size())
        return self.conv_r(input_r)-self.conv_i(input_i), self.conv_r(input_i)+self.conv_i(input_r)

class ComplexLinear(Module):

    def __init__(self, in_features, out_features):
        super(ComplexLinear, self).__init__()
        self.fc_r = Linear(in_features, out_features)
        self.fc_i = Linear(in_features, out_features)

    def forward(self,input_r, input_i):
        return self.fc_r(input_r)-self.fc_i(input_i), self.fc_r(input_i)+self.fc_i(input_r)

class _ComplexBatchNorm(Module):

    def __init__(self, num_features, eps=1e-5, momentum=0.1, affine=True,
                 track_running_stats=True):
        super(_ComplexBatchNorm, self).__init__()
        self.num_features = num_features
        self.eps = eps
        self.momentum = momentum
        self.affine = affine
        self.track_running_stats = track_running_stats
        if self.affine:
            self.weight = Parameter(torch.Tensor(num_features,3))
            self.bias = Parameter(torch.Tensor(num_features,2))
        else:
            self.register_parameter('weight', None)
            self.register_parameter('bias', None)
        if self.track_running_stats:
            self.register_buffer('running_mean', torch.zeros(num_features,2))
            self.register_buffer('running_covar', torch.zeros(num_features,3))
            self.running_covar[:,0] = 1.4142135623730951
            self.running_covar[:,1] = 1.4142135623730951
            self.register_buffer('num_batches_tracked', torch.tensor(0, dtype=torch.long))
        else:
            self.register_parameter('running_mean', None)
            self.register_parameter('running_covar', None)
            self.register_parameter('num_batches_tracked', None)
        self.reset_parameters()

    def reset_running_stats(self):
        if self.track_running_stats:
            self.running_mean.zero_()
            self.running_covar.zero_()
            self.running_covar[:,0] = 1.4142135623730951
            self.running_covar[:,1] = 1.4142135623730951
            self.num_batches_tracked.zero_()

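    def reset_parameters(self):
        self.reset_running_stats()
        if self.affine:
            # weight columns are (gamma_rr, gamma_ii, gamma_ri); bias columns are (beta_r, beta_i).
            # This listing starts gamma_rr and gamma_ii at sqrt(2); the paper initializes them to 1/sqrt(2).
            init.constant_(self.weight[:,:2],1.4142135623730951)
            init.zeros_(self.weight[:,2])
            init.zeros_(self.bias)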

class ComplexBatchNorm2d(_ComplexBatchNorm):

    def forward(self, input_r, input_i):
        assert(input_r.size() == input_i.size())
        assert(len(input_r.shape) == 4)
        exponential_average_factor = 0.0


        if self.training and self.track_running_stats:
            if self.num_batches_tracked is not None:
                self.num_batches_tracked += 1
                if self.momentum is None:  # use cumulative moving average
                    exponential_average_factor = 1.0 / float(self.num_batches_tracked)
                else:  # use exponential moving average
                    exponential_average_factor = self.momentum


        if self.training:

            # calculate mean of real and imaginary part
            mean_r = input_r.mean([0, 2, 3])
            mean_i = input_i.mean([0, 2, 3])


            mean = torch.stack((mean_r,mean_i),dim=1)

            # update running mean
            with torch.no_grad():
                self.running_mean = exponential_average_factor * mean\
                    + (1 - exponential_average_factor) * self.running_mean

            input_r = input_r-mean_r[None, :, None, None]
            input_i = input_i-mean_i[None, :, None, None]

            # Elements of the covariance matrix (biased for train)
            n = input_r.numel() / input_r.size(1)
            Crr = 1./n*input_r.pow(2).sum(dim=[0,2,3])+self.eps
            Cii = 1./n*input_i.pow(2).sum(dim=[0,2,3])+self.eps
            Cri = (input_r.mul(input_i)).mean(dim=[0,2,3])

            with torch.no_grad():
                self.running_covar[:,0] = exponential_average_factor * Crr * n / (n - 1)\
                    + (1 - exponential_average_factor) * self.running_covar[:,0]

                self.running_covar[:,1] = exponential_average_factor * Cii * n / (n - 1)\
                    + (1 - exponential_average_factor) * self.running_covar[:,1]

                self.running_covar[:,2] = exponential_average_factor * Cri * n / (n - 1)\
                    + (1 - exponential_average_factor) * self.running_covar[:,2]

        else:
            mean = self.running_mean
            Crr = self.running_covar[:,0]+self.eps
            Cii = self.running_covar[:,1]+self.eps
            Cri = self.running_covar[:,2]#+self.eps

            input_r = input_r-mean[None,:,0,None,None]
            input_i = input_i-mean[None,:,1,None,None]

        # calculate the inverse square root the covariance matrix
        det = Crr*Cii-Cri.pow(2)
        s = torch.sqrt(det)
        t = torch.sqrt(Cii+Crr + 2 * s)
        inverse_st = 1.0 / (s * t)
        Rrr = (Cii + s) * inverse_st
        Rii = (Crr + s) * inverse_st
        Rri = -Cri * inverse_st

        input_r, input_i = Rrr[None,:,None,None]*input_r+Rri[None,:,None,None]*input_i, \
                           Rii[None,:,None,None]*input_i+Rri[None,:,None,None]*input_r

        if self.affine:
            input_r, input_i = self.weight[None,:,0,None,None]*input_r+self.weight[None,:,2,None,None]*input_i+\
                               self.bias[None,:,0,None,None], \
                               self.weight[None,:,2,None,None]*input_r+self.weight[None,:,1,None,None]*input_i+\
                               self.bias[None,:,1,None,None]

        return input_r, input_i

class ComplexNet(nn.Module):
    
    def __init__(self):
        super(ComplexNet, self).__init__()
        self.conv1 = ComplexConv2d(1, 20, 5, 2)
        self.bn  = ComplexBatchNorm2d(20)
        self.conv2 = ComplexConv2d(20, 50, 5, 2)
        self.fc1 = ComplexLinear(4*4*50, 500)
        self.fc2 = ComplexLinear(500, 10)
        
        self.bn4imag = BatchNorm2d(1)
        self.conv4imag = Conv2d(1, 1, 3, 1, padding=1)
             
    def forward(self,x):
        xr = x
        # imaginary part: a single BN-ReLU-Conv block here (the paper uses BN-ReLU-Conv-BN-ReLU-Conv)
        xi = self.bn4imag(xr)
        xi = relu(xi)
        xi = self.conv4imag(xi)
        
        # flow into complex net
        xr,xi = self.conv1(xr,xi)
        xr,xi = complex_relu(xr,xi)
        
        xr,xi = self.bn(xr,xi)
        xr,xi = self.conv2(xr,xi)
        xr,xi = complex_relu(xr,xi)
#         print(xr.shape)
        xr = xr.reshape(-1, 4*4*50)
        xi = xi.reshape(-1, 4*4*50)
        xr,xi = self.fc1(xr,xi)
        xr,xi = complex_relu(xr,xi)
        xr,xi = self.fc2(xr,xi)
        # take the absolute value as output
        x = torch.sqrt(torch.pow(xr,2)+torch.pow(xi,2))
        return F.log_softmax(x, dim=1)
    
batch_size = 64
trans = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (1.0,))])
train_set = datasets.MNIST('../data', train=True, transform=trans, download=True)
test_set = datasets.MNIST('../data', train=False, transform=trans, download=True)
train_loader = torch.utils.data.DataLoader(train_set, batch_size= batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size= batch_size, shuffle=True)
    
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = ComplexNet().to(device)
print(model)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# train steps
train_loss = []
for epoch in range(50):
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        train_loss.append(loss.item())
        if batch_idx % 100 == 0:
            print('Train Epoch: {:3} [{:6}/{:6} ({:3.0f}%)]\tLoss: {:.6f}'.format(
                epoch,
                batch_idx * len(data), 
                len(train_loader.dataset),
                100. * batch_idx / len(train_loader), 
                loss.item())
            )
            
plt.plot(train_loss)
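The script above builds `test_loader` but never uses it. A minimal evaluation helper might look like the sketch below. It is an assumption about how one could measure accuracy, not part of the original script. To keep the example self-contained, it is exercised on a small stand-in model and random data. With the real model one would call `evaluate(model, test_loader, device)`.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return classification accuracy of `model` over `loader`."""
    model.eval()  # batch-norm layers switch to their running statistics
    correct, total = 0, 0
    with torch.no_grad():
        for data, target in loader:
            data, target = data.to(device), target.to(device)
            pred = model(data).argmax(dim=1)  # index of the largest log-probability
            correct += (pred == target).sum().item()
            total += target.numel()
    return correct / total


# Stand-in model and random data, used only to exercise the helper
torch.manual_seed(0)
dummy_model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
dummy_data = TensorDataset(torch.randn(32, 1, 28, 28), torch.randint(0, 10, (32,)))
dummy_loader = DataLoader(dummy_data, batch_size=8)

accuracy = evaluate(dummy_model, dummy_loader, torch.device("cpu"))
print(accuracy)
```

Calling `model.eval()` matters for this model in particular, because ComplexBatchNorm2d takes a different code path (running mean and covariance) when `self.training` is False.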