This post is based on the PyTorch official tutorial (Chinese translation): http://pytorch123.com/FirstSection/PyTorchIntro/
PyTorch Chinese documentation: https://pytorch-cn.readthedocs.io/zh/latest/package_references/Tensor/
PyTorch English documentation: https://pytorch.org/docs/stable/tensors.html
and the book 《深度學習之PyTorch物體檢測實戰》.
This is my first time working with PyTorch, and up-to-date tutorials are hard to find online, so I will start from the official material.
Default imports:
import os
import json
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
import torchvision
from torchvision import models
from torch.utils.data import Dataset
from torchvision import transforms
from torch.utils.data import DataLoader
import visdom
# from tensorboardX import SummaryWriter
from torch.utils.tensorboard import SummaryWriter
Fully connected layer
nn.Linear(in_features, out_features, bias=True)
>>> linear = nn.Linear(784, 10)
>>> input = torch.randn(4, 784)
>>> output = linear(input)
>>> output.shape
torch.Size([4, 10])
Convolutional layer
nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0,
dilation=1, groups=1, bias=True, padding_mode='zeros')
- dilation: dilated (atrous) convolution; a value greater than 1 enlarges the receptive field while keeping the feature-map size (see the sketch below)
- groups: grouped convolution; instead of connecting every output channel to every input channel, the input channels are split into groups, and this sparser connectivity reduces the amount of computation
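A minimal sketch of what these two arguments do (channel counts and sizes below are chosen arbitrarily): with groups the weight tensor is split along the input-channel dimension, and with dilation the kernel covers a larger receptive field while a matching padding keeps the output size:
>>> # groups=2: each output channel only sees in_channels / groups = 2 input channels
>>> group_conv = nn.Conv2d(4, 8, 3, padding=1, groups=2)
>>> group_conv.weight.shape
torch.Size([8, 2, 3, 3])
>>> # dilation=2: a 3x3 kernel with a 5x5 receptive field; padding=2 keeps the spatial size
>>> dilated_conv = nn.Conv2d(4, 8, 3, padding=2, dilation=2)
>>> dilated_conv(torch.randn(1, 4, 16, 16)).shape
torch.Size([1, 8, 16, 16])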
Use .weight and .bias to inspect the kernel weights and the bias:
>>> conv = nn.Conv2d(1, 1, 3, 1, 1)
>>> conv.weight.shape
torch.Size([1, 1, 3, 3])
>>> conv.bias.shape
torch.Size([1])
The input feature map must be given in the form (N, C, H, W):
>>> input = torch.randn(1, 1, 5, 5)
>>> output = conv(input)
>>> output.shape
torch.Size([1, 1, 5, 5])
Pooling layers
Max pooling layer
nn.MaxPool2d(kernel_size, stride=None, padding=0,
dilation=1, return_indices=False, ceil_mode=False)
- return_indices – if True, will return the max indices along with the outputs
- ceil_mode – when True, will use ceil instead of floor to compute the output shape
- stride – note that the default value of stride is kernel_size, not 1
>>> max_pooling = nn.MaxPool2d(2, stride=2)
>>> input = torch.randn(1, 1, 4, 4)
>>> max_pooling(input)
tensor([[[[0.9636, 0.7075],
[1.0641, 1.1749]]]])
>>> max_pooling(input).shape
torch.Size([1, 1, 2, 2])
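Since stride defaults to kernel_size, nn.MaxPool2d(2) behaves exactly like the nn.MaxPool2d(2, stride=2) above; ceil_mode decides what happens when the input size is not evenly divisible (a small sketch, input sizes chosen arbitrarily):
>>> nn.MaxPool2d(2)(torch.randn(1, 1, 5, 5)).shape                   # floor((5 - 2) / 2) + 1 = 2
torch.Size([1, 1, 2, 2])
>>> nn.MaxPool2d(2, ceil_mode=True)(torch.randn(1, 1, 5, 5)).shape   # ceil((5 - 2) / 2) + 1 = 3
torch.Size([1, 1, 3, 3])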
Average pooling layer
nn.AvgPool2d(kernel_size, stride=None, padding=0,
ceil_mode=False, count_include_pad=True, divisor_override=None)
If padding is non-zero, then the input is implicitly zero-padded on both sides for padding number of points.
- ceil_mode – when True, will use ceil instead of floor to compute the output shape
- count_include_pad – when True, will include the zero-padding in the averaging calculation
- divisor_override – if specified, it will be used as the divisor; otherwise kernel_size will be used
The parameters kernel_size, stride, and padding can each be either:
- a single int – in which case the same value is used for the height and width dimensions
- a tuple of two ints – in which case the first int is used for the height dimension, and the second int for the width dimension
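A minimal usage sketch mirroring the max-pooling example above:
>>> avg_pooling = nn.AvgPool2d(2, stride=2)
>>> input = torch.randn(1, 1, 4, 4)
>>> avg_pooling(input).shape
torch.Size([1, 1, 2, 2])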
Global average pooling layer
nn.Sequential(
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten()
)
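For instance, this collapses each channel's feature map to a single value and flattens the result to shape (N, C), as is commonly done before the final fully connected classifier (sizes below are arbitrary):
>>> gap = nn.Sequential(nn.AdaptiveAvgPool2d((1, 1)), nn.Flatten())
>>> gap(torch.randn(4, 512, 7, 7)).shape
torch.Size([4, 512])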
Activation layers
Of course, the layers below can also be replaced by the corresponding functions in torch.nn.functional.
Sigmoid layer
nn.Sigmoid()
>>> sigmoid = nn.Sigmoid()
>>> sigmoid(torch.Tensor([1, 1, 2, 2]))
tensor([0.7311, 0.7311, 0.8808, 0.8808])
ReLU layer
nn.ReLU(inplace=False)
>>> relu = nn.ReLU(inplace=True)
>>> input = torch.randn(2, 2)
>>> input
tensor([[-0.4853, 2.3864],
[ 0.7122, -0.6493]])
>>> relu(input)
tensor([[0.0000, 2.3864],
[0.7122, 0.0000]])
>>> input
tensor([[0.0000, 2.3864],
[0.7122, 0.0000]])
Softmax layer
nn.Softmax(dim=None)
>>> softmax = nn.Softmax(dim=1)
>>> score = torch.randn(1, 4)
>>> score
tensor([[ 0.3101, 3.5648, 1.0988, -1.5856]])
>>> softmax(score)
tensor([[0.0342, 0.8855, 0.0752, 0.0051]])
LogSoftmax layer
nn.LogSoftmax(dim=None)
Followed by an nn.NLLLoss layer, this is equivalent to a CrossEntropyLoss layer (a numerical check is given in the CrossEntropyLoss section below).
Dropout layer
nn.Dropout(p=0.5, inplace=False)
>>> dropout = nn.Dropout(0.5, inplace=False)
>>> input = torch.randn(1, 20)
>>> output = dropout(input)
>>> output
tensor([[-2.9413, 0.0000, 1.8461, 1.9605, 0.2774, -0.0000, -2.5381, -2.0313,
-0.1914, 0.0000, 0.5346, -0.0000, 0.0000, 4.4960, -3.8345, -1.0938,
4.3297, 2.1258, -4.1431, 0.0000]])
>>> input
tensor([[-1.4707, 0.5105, 0.9231, 0.9802, 0.1387, -0.4195, -1.2690, -1.0156,
-0.0957, 0.8108, 0.2673, -2.0898, 0.6666, 2.2480, -1.9173, -0.5469,
2.1648, 1.0629, -2.0716, 0.9974]])
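Note that in the output above every surviving element is exactly twice the corresponding input: PyTorch uses inverted dropout, scaling the kept activations by 1/(1-p) during training so that no rescaling is needed at test time. In eval() mode the layer becomes an identity mapping (a quick check):
>>> dropout.eval()
Dropout(p=0.5, inplace=False)
>>> torch.equal(dropout(input), input)
True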
BN (batch normalization) layer
torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1,
affine=True, track_running_stats=True)
- num_features – C from an expected input of size (N, C, H, W)
- eps – a value added to the denominator for numerical stability. Default: 1e-5
- momentum – the value used for the running_mean and running_var computation. Can be set to None for cumulative moving average (i.e. simple average). Default: 0.1
- affine – a boolean value that when set to True, this module has learnable affine parameters. Default: True
- track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True
Because the Batch Normalization is done over the C dimension, computing statistics on (N, H, W) slices, it's common terminology to call this Spatial Batch Normalization.
The mean and standard-deviation are calculated per-dimension over the mini-batches, and γ and β are learnable parameter vectors of size C (where C is the input size). By default, the elements of γ are set to 1 and the elements of β are set to 0.
>>> bn = nn.BatchNorm2d(64)
>>> input = torch.randn(4, 64, 28, 28)
>>> output = bn(input)
>>> output.shape
torch.Size([4, 64, 28, 28])
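A quick check on the same bn module that the affine parameters γ and β are per-channel vectors and that each channel of the output is normalized to roughly zero mean:
>>> bn.weight.shape, bn.bias.shape                   # gamma and beta, one value per channel
(torch.Size([64]), torch.Size([64]))
>>> output.mean(dim=(0, 2, 3)).abs().max() < 1e-5    # per-channel means are ~0
tensor(True)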
Loss function layers
NLLLoss
nn.NLLLoss(weight=None, size_average=None,
ignore_index=-100, reduce=None, reduction='mean')
The input given through a forward call is expected to contain log-probabilities of each class. input has to be a Tensor of size either (minibatch, C) or (minibatch, C, d_1, d_2, ..., d_K) with K ≥ 1 for the K-dimensional case (described later).
It is useful to train a classification problem with C classes.
The target that this loss expects should be a class index in the range [0, C-1] where C = number of classes; if ignore_index is specified, this loss also accepts this class index (this index may not necessarily be in the class range).
The unreduced (i.e. with reduction set to 'none') loss can be described as:

$$\ell(x, y) = L = \{l_1, \dots, l_N\}^\top, \qquad l_n = -w_{y_n} x_{n, y_n}, \qquad w_c = \text{weight}[c] \cdot \mathbb{1}\{c \neq \text{ignore\_index}\}$$

where x is the input, y is the target, w is the weight, and N is the batch size. If reduction is 'mean' (the default), then

$$\ell(x, y) = \sum_{n=1}^{N} \frac{1}{\sum_{n=1}^{N} w_{y_n}} l_n$$

If reduction is 'sum', then

$$\ell(x, y) = \sum_{n=1}^{N} l_n$$

Can also be used for higher dimension inputs, such as 2D images, by providing an input of size (minibatch, C, d_1, d_2, ..., d_K) with K ≥ 1, where K is the number of dimensions, and a target of appropriate shape (see below). In the case of images, it computes NLL loss per-pixel.
- weight (Tensor, optional) – a manual rescaling weight given to each class. If given, it has to be a Tensor of size C. Otherwise, it is treated as if having all ones. If provided, the optional argument weight should be a 1D Tensor assigning weight to each of the classes. This is particularly useful when you have an unbalanced training set. In other words, this is the per-class weight applied when computing the loss.
- size_average (bool, optional) – Deprecated
- ignore_index (int, optional) – Specifies a target value that is ignored and does not contribute to the input gradient.
- reduce (bool, optional) – Deprecated
- reduction (string, optional) – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
Shape:
- Input: (N, C) where C = number of classes, or (N, C, d_1, d_2, ..., d_K) with K ≥ 1 in the case of K-dimensional loss.
- Target: (N) where each value satisfies 0 ≤ targets[i] ≤ C-1, or (N, d_1, d_2, ..., d_K) with K ≥ 1 in the case of K-dimensional loss.
- Output: scalar. If reduction is 'none', then the same size as the target: (N), or (N, d_1, d_2, ..., d_K) with K ≥ 1 in the case of K-dimensional loss.
# 1D case: (N, C) log-probabilities and (N,) class-index targets
m = nn.LogSoftmax(dim=1)
loss = nn.NLLLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.tensor([1, 0, 4])
output = loss(m(input), target)
N, C = 5, 4
loss = nn.NLLLoss()
# input is of size N x C x height x width
data = torch.randn(N, C, 8, 8)
m = nn.LogSoftmax(dim=1)
# each element in target has to have 0 <= value < C
target = torch.empty(N, 8, 8, dtype=torch.long).random_(0, C)
output = loss(m(data), target)
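For an imbalanced training set, the weight argument described above rescales each class's contribution; a small sketch with arbitrarily chosen weights:
# per-class weights: class 1 counts twice as much, class 4 half as much (arbitrary values)
weight = torch.tensor([1.0, 2.0, 1.0, 1.0, 0.5])
loss = nn.NLLLoss(weight=weight)
input = torch.randn(3, 5, requires_grad=True)
target = torch.tensor([1, 0, 4])
output = loss(nn.LogSoftmax(dim=1)(input), target)  # reduction='mean' divides by sum(w_yn), not by N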
CrossEntropyLoss
nn.CrossEntropyLoss(weight=None, size_average=None,
ignore_index=-100, reduce=None, reduction='mean')
This criterion combines nn.LogSoftmax() and nn.NLLLoss() in one single class.
In effect this is softmax followed by a cross-entropy loss. I haven't read the source code yet, but presumably the two are fused because their combined gradient takes a very clean form during backpropagation.
The parameters have the same meaning as for nn.NLLLoss above, so they are not repeated here.
loss = nn.CrossEntropyLoss()
input = torch.randn(3, 5, requires_grad=True)
target = torch.empty(3, dtype=torch.long).random_(5)
output = loss(input, target)
output.backward()
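To confirm the equivalence mentioned here and in the LogSoftmax section, the two formulations give the same value on the same input:
>>> input = torch.randn(3, 5)
>>> target = torch.tensor([1, 0, 4])
>>> ce = nn.CrossEntropyLoss()(input, target)
>>> nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(input), target)
>>> torch.allclose(ce, nll)
True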
Optimizers
SGD (includes Momentum and Nesterov Momentum)
optim.SGD(params, lr=<required parameter>, momentum=0,
dampening=0, weight_decay=0, nesterov=False)
- dampening (float, optional) – dampening for momentum (default: 0)
Question: what exactly does dampening do? To be answered when I read the source code.
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
# gradients must be cleared before each optimization step
optimizer.zero_grad()
loss.backward()
optimizer.step()
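Putting these calls together into a minimal training loop (a sketch only: net, criterion, and train_loader stand for a model, a loss layer, and a DataLoader assumed to be defined elsewhere):
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
for epoch in range(10):
    for images, labels in train_loader:
        optimizer.zero_grad()              # clear gradients from the previous step
        outputs = net(images)              # forward pass
        loss = criterion(outputs, labels)  # compute the loss
        loss.backward()                    # backpropagate
        optimizer.step()                   # update the parameters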
Adagrad
optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0,
initial_accumulator_value=0, eps=1e-10)
- lr (float, optional) – learning rate (default: 1e-2)
- lr_decay (float, optional) – learning rate decay (default: 0)
RMSProp
optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08,
weight_decay=0, momentum=0, centered=False)
- alpha (float, optional) – smoothing constant (default: 0.99)
- momentum (float, optional) – momentum factor (default: 0)
- centered (bool, optional) – if True, compute the centered RMSProp; the gradient is normalized by an estimation of its variance
alpha should be the decay factor with which RMSProp forgets past squared gradients, so what is momentum for? Again, this will have to wait until I read the source code.
Adadelta
optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)
- lr (float, optional) – coefficient that scales delta before it is applied to the parameters (default: 1.0). The original Adadelta formulation does not need a learning rate, yet an lr parameter appears here; another question to settle after reading the source code.
- rho (float, optional) – coefficient used for computing a running average of squared gradients (default: 0.9)
Adam
optim.Adam(params, lr=0.001, betas=(0.9, 0.999),
eps=1e-08, weight_decay=0, amsgrad=False)
- amsgrad (boolean, optional) – whether to use the AMSGrad variant of this algorithm from the paper On the Convergence of Adam and Beyond (default: False)
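Any of these optimizers can be dropped into the SGD training loop sketched earlier without other changes, e.g. (net assumed as before):
optimizer = optim.Adam(net.parameters(), lr=1e-3, weight_decay=1e-4, amsgrad=True)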