FLOPs measure the number of operations in a neural network's forward pass; the fewer the FLOPs, the faster the computation.
The parameter count measures how many parameters the network contains; the fewer the parameters, the smaller the model, and the easier it is to deploy. The two are computed as follows:
The parameter count is the easier of the two: simply count how many trainable parameters the network holds. Taking a convolutional network as an example, suppose the kernel size is k × k, the number of input channels is i, the number of output channels is o, and the output feature map is t × t. Then:
For one convolution operation, i.e. one convolutional layer, the number of parameters is:
k×k×i×o + o
where the trailing + o accounts for the biases. See conv1 in the figure below as an example.
Note that conv1 uses an 11×11 kernel; you can verify the same result yourself.
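As a quick check, the formula can be evaluated directly. This is a minimal sketch; the conv1 configuration below (11×11 kernel, 3 input channels, 96 filters, in the style of AlexNet's first layer) is assumed for illustration:

```python
# Parameter count of one conv layer: k*k*i*o weights plus o biases.
# Assumed conv1 configuration: 11x11 kernel, 3 input channels, 96 filters.
k, i, o = 11, 3, 96
params = k * k * i * o + o
print(params)  # 34944
```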
For FLOPs, again taking convolution as the example, the computation can be followed in the code below:
input_shape = (3, 300, 300)  # Format: (channels, rows, cols)
conv_filter = (64, 3, 3, 3)  # Format: (num_filters, channels, rows, cols)
stride = 1
padding = 1
activation = 'relu'

n = conv_filter[1] * conv_filter[2] * conv_filter[3]  # vector length: k*k*i
flops_per_instance = n + (n - 1)  # general definition: n multiplications and n-1 additions

num_instances_per_filter = (input_shape[1] - conv_filter[2] + 2 * padding) // stride + 1  # for rows
num_instances_per_filter *= (input_shape[2] - conv_filter[3] + 2 * padding) // stride + 1  # multiplying with cols

flops_per_filter = num_instances_per_filter * flops_per_instance
total_flops_per_layer = flops_per_filter * conv_filter[0]  # multiply with number of filters

if activation == 'relu':
    # One can also count the FLOPs of the activation.
    # ReLU takes 1 comparison per element; assume its FLOPs
    # equal the number of output elements.
    total_flops_per_layer += conv_filter[0] * input_shape[1] * input_shape[2]

print(total_flops_per_layer)
The code above is equivalent to the following closed-form expression. That is, each convolution layer requires:
k×k×i×t×t×o + (k×k×i−1)×t×t×o
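The equivalence can be verified numerically. A minimal sketch, using the same layer configuration as the code above (3×3 kernel, 3 input channels, 64 filters, 300×300 input, stride 1, padding 1) and ignoring the ReLU term:

```python
# Layer configuration from the example above.
k, i, o = 3, 3, 64
t = (300 - k + 2 * 1) // 1 + 1  # output size: 300

# Closed-form: k*k*i multiplications plus k*k*i - 1 additions per output element.
flops_formula = k * k * i * t * t * o + (k * k * i - 1) * t * t * o

# Step-by-step, mirroring the code above (without the ReLU term).
n = k * k * i
flops_per_instance = n + (n - 1)
num_instances_per_filter = t * t
flops_step_by_step = flops_per_instance * num_instances_per_filter * o

assert flops_formula == flops_step_by_step
print(flops_formula)  # 305280000
```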