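[Hands-on] Fine-grained classification of 200 bird species
I'm back again!!!!
I. Image classification
This time we do a hands-on project: fine-grained recognition of bird species. Before getting into fine-grained classification, let's first review image classification.
Image classification is one of the most basic tasks in computer vision, going from entry-level problems such as MNIST handwritten-digit recognition and cat-vs-dog binary classification to the later ImageNet task. As datasets grew, image classification models improved step by step to today's level, and on benchmarks such as ImageNet the best models now exceed reported human accuracy.
Here I split image classification into two kinds: single-label classification and multi-label classification.
Multi-label classification is closer to how people perceive the world, because real-life pictures often contain objects of several categories.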
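To make the single-label versus multi-label distinction concrete, here is a minimal NumPy sketch. The class names and logits are made up for illustration; the point is only that single-label models usually end in a softmax (classes compete), while multi-label models usually end in independent sigmoids.

```python
import numpy as np

classes = ["cat", "dog", "bird"]  # hypothetical label set

# Single-label: exactly one class per image, encoded as a one-hot vector
single_label = np.array([0, 1, 0])

# Multi-label: any number of classes per image, encoded as a multi-hot vector
multi_label = np.array([1, 1, 0])

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([2.0, 1.0, -1.0])  # made-up network outputs
p_single = softmax(logits)   # probabilities compete and sum to 1
p_multi = sigmoid(logits)    # each class gets an independent probability
```

With softmax the prediction is the arg-max class; with sigmoids each class is thresholded separately, so several classes can be predicted for one image.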
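Single-label classification can in turn be divided into three kinds. The first is cross-species, semantic-level classification, which distinguishes objects at the level of different species, such as the familiar cat-vs-dog task.
The second is instance-level classification, which distinguishes individuals; the most typical task is face recognition. The third and last is fine-grained classification. So what is fine-grained classification?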
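II. Fine-grained image classification
Fine-grained image classification sits one level below the cross-species classification described above, and one level above instance-level classification.
Formally, it is the classification of subclasses within the same broad category.
Put simply, it addresses the everyday situation where we see a dog but cannot tell which breed it is.
In the figure below, can we tell which one is the Alaskan Malamute and which one is the Husky? The Husky is on the left and the Alaskan Malamute is on the right.
The distinguishing cue here is that the Alaskan Malamute's nose bridge is connected to its black fur. This is the discriminative part.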
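III. Current challenges in fine-grained image classification
Next, the challenges that fine-grained classification currently faces.
There are three main problems: large intra-class variance, small inter-class variance, and limited datasets.
- Large intra-class variance: lighting, object pose, viewpoint, occlusion and background clutter all change how an object looks. The black-footed albatross images here, for example, differ so much in lighting, background and pose that it is hard to tell by eye that they belong to the same subclass.
- Small inter-class variance: individuals may belong to different subclasses because of tiny differences, such as the colour of a bird's wings or of its beak.
- Limited datasets: annotating the data usually requires expert knowledge and a large amount of labelling time.
Because of these challenges, it is hard to get accurate results from existing coarse-grained neural network models.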
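One rough way to see the first two challenges in numbers is to compare how far samples sit from their own class centroid with how far class centroids sit from each other. The sketch below is an illustration only; the function name `class_spread` and the toy 2-D features are made up, and real features would come from a network.

```python
import numpy as np

def class_spread(features, labels):
    """Return (mean intra-class distance, mean distance between class centroids)."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    # Intra-class: average distance of each sample to its own class centroid
    intra = np.mean([
        np.linalg.norm(features[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(classes)
    ])
    # Inter-class: average distance between pairs of different centroids
    diffs = centroids[:, None, :] - centroids[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    n = len(classes)
    inter = dists.sum() / (n * (n - 1))
    return intra, inter

# Toy features: samples of each class are spread widely while the two
# centroids sit close together, which is the hard fine-grained regime
feats = [[0, 0], [4, 0], [1, 0], [5, 0]]
labs = [0, 0, 1, 1]
intra, inter = class_spread(feats, labs)
```

When the intra-class value is larger than the inter-class value, as in this toy case, a classifier working on these features has little margin to separate the classes.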
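IV. State of research in fine-grained image classification
So where does the research stand?
Fine-grained classification currently works mainly by finding discriminative features, and the methods fall into two groups: strongly supervised learning and weakly supervised learning.
Strongly supervised learning typically uses bounding boxes and part annotations to obtain the object's position and size and thereby improve accuracy. In other words, the annotations already point out some of the object's salient, discriminative regions.
Weakly supervised learning uses only the image-level class label and no additional annotation. Its main idea at present is to localize the discriminative parts and use the features extracted there to assist classification.
This matches how people tell fine-grained objects apart: first look at the global information to determine the broad category, then, guided by experience, focus on a few key parts to make the decision. Those parts are the discriminative parts that a weakly supervised network tries to find.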
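Current strongly supervised methods include:
- Part-based R-CNN uses the R-CNN algorithm to detect local regions, corrects the detected regions with geometric constraints, extracts convolutional features from them, concatenates the features of the different regions into the final feature representation, and trains an SVM classifier on it.
- Pose-normalized CNN detects part locations in each image, crops the detected boxes to obtain image patches at different levels and positions, pose-aligns the patches before feeding them to a CNN, and classifies the concatenated features with an SVM classifier.
- Multi-proposal Net obtains image patches with the Edge Box Crop method and adds output layers for keypoints and visual features, which strengthens the positional link between local features and global information.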
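Among weakly supervised methods, one line of work filters the image, using only the class label to discard the regions unrelated to the object. The most representative example is the two-level attention algorithm. It uses object-level and part-level information: Selective Search proposes candidate regions, the background regions unrelated to the object are filtered out, and the remaining regions are fed to a CNN for training, which gives the object-level classification result. A clustering algorithm then separates the features of different locations, and the features of the different regions are concatenated and fed to an SVM classifier for training.
When people recognize objects, they need both to understand the features and to remember the category name. B-CNN (bilinear CNN) follows this idea of recognizing the category while attending to salient features: it builds two parallel feature-extraction networks whose outputs are combined by an outer product at each location, so that local feature extraction and classification are handled together.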
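As a rough illustration of the bilinear idea, the NumPy sketch below pools the outer product of two feature maps over all spatial locations, then applies a signed square root and L2 normalisation. The function name `bilinear_pool` and the random inputs are assumptions for the example; in a real B-CNN the two inputs would be the outputs of the two CNN streams.

```python
import numpy as np

def bilinear_pool(feat_a, feat_b):
    """feat_a: (H, W, Ca), feat_b: (H, W, Cb) -> L2-normalised vector of length Ca*Cb."""
    h, w, ca = feat_a.shape
    cb = feat_b.shape[-1]
    a = feat_a.reshape(h * w, ca)
    b = feat_b.reshape(h * w, cb)
    # Sum of the per-location outer products, which equals a^T b
    pooled = (a.T @ b).reshape(ca * cb)
    # Signed square root followed by L2 normalisation
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))
    return pooled / (np.linalg.norm(pooled) + 1e-12)

rng = np.random.default_rng(0)
fa = rng.standard_normal((7, 7, 8))   # stand-in for stream A features
fb = rng.standard_normal((7, 7, 4))   # stand-in for stream B features
vec = bilinear_pool(fa, fb)
```

The resulting vector has Ca*Cb entries, which is why bilinear features get large quickly (512 x 512 for two VGG16 streams).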
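That more or less covers the background, so now for the main topic.
V. Hands-on: fine-grained classification of 200 bird species
1.CUB200-2011 dataset
As always, let's first introduce what drives everything: the data.
Oops, wrong picture; it should be the one below.
The dataset used here is CUB200-2011, released by Caltech in 2011 as an extended version of the 2010 CUB-200 dataset, and it is a standard benchmark for fine-grained recognition research. It contains 11,788 bird images covering 200 bird subclasses, with 5,994 training images and 5,794 test images. Every image comes with a class label, a bounding box of the bird, the bird's key part locations, and attribute annotations.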
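As a sketch of how the official train/test split can be assembled, the snippet below parses text in the layout that, to my understanding, the dataset's `images.txt`, `image_class_labels.txt` and `train_test_split.txt` files use (`<image_id> <value>` per line). The helper names and the three sample lines are made up for illustration, and the example works on in-memory strings rather than the real files.

```python
def parse_pairs(text):
    """Parse lines of the form '<image_id> <value>' into a dict keyed by int id."""
    pairs = {}
    for line in text.strip().splitlines():
        image_id, value = line.split(" ", 1)
        pairs[int(image_id)] = value
    return pairs

def build_split(images_txt, labels_txt, split_txt):
    images = parse_pairs(images_txt)
    labels = parse_pairs(labels_txt)
    is_train = parse_pairs(split_txt)
    train, test = [], []
    for image_id, path in images.items():
        item = (path, int(labels[image_id]) - 1)  # class ids in the file start at 1
        (train if is_train[image_id] == "1" else test).append(item)
    return train, test

# Made-up excerpts in the same layout as the metadata files
images_txt = """1 001.Black_footed_Albatross/img_a.jpg
2 001.Black_footed_Albatross/img_b.jpg
3 002.Laysan_Albatross/img_c.jpg"""
labels_txt = "1 1\n2 1\n3 2"
split_txt = "1 0\n2 1\n3 1"
train, test = build_split(images_txt, labels_txt, split_txt)
```

In practice the three strings would be read from the files in the dataset root, and the returned paths joined with the `images/` directory.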
The evaluation metric is accuracy.
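For reference, top-1 accuracy is simply the fraction of images whose predicted class equals the true class. A minimal sketch, with a hypothetical helper name and made-up labels:

```python
def top1_accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth class ids."""
    correct = sum(int(t == p) for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

acc = top1_accuracy([3, 7, 7, 1], [3, 7, 2, 1])
```

This is the same quantity Keras reports as `acc` and `val_acc` in the training logs.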
OK, time to bring on the models!
2.VGG16 model
Let's test the waters with VGG16 first.
Before that, let's prepare the fine-tuning helper.
# Fine-tuning helper
from keras.callbacks import EarlyStopping, ModelCheckpoint, ReduceLROnPlateau

def fine_tune_model(model, optimizer, batch_size, epochs, freeze_num):
    '''
    Description: fine-tune the given pretrained model and save the best checkpoint in .hdf5 format.
    model: the model to train (VGG16, ResNet50, ...)
    optimizer: optimizer used for both training steps
    batch_size: batch size; 32/64/128 recommended
    epochs: number of epochs for the second step (fine-tuning all layers)
    freeze_num: number of leading layers frozen during the first step
    x_train, y_train, x_valid and y_valid are read from the enclosing (global) scope.
    '''
    # Optional data augmentation, left disabled in these runs:
    # datagen = ImageDataGenerator(
    #     rescale=1./255,
    #     # shear_range=0.2,
    #     # zoom_range=0.2,
    #     # horizontal_flip=True,
    #     # vertical_flip=True,
    #     # fill_mode="nearest"
    # )
    # datagen.fit(x_train)

    # Step 1: train only the newly added top layers (randomly initialised)
    # by freezing the first freeze_num layers
    for layer in model.layers[:freeze_num]:
        layer.trainable = False
    model.compile(optimizer=optimizer,
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
    #                     steps_per_epoch=len(x_train) // batch_size,
    #                     epochs=3,
    #                     shuffle=True,
    #                     verbose=1,
    #                     validation_data=datagen.flow(x_valid, y_valid))
    model.fit(x_train,
              y_train,
              batch_size=batch_size,
              epochs=3,
              shuffle=True,
              verbose=1,
              validation_data=(x_valid, y_valid))
    print('Finish step_1')

    # Step 2: fine-tune all layers
    for layer in model.layers[:]:
        layer.trainable = True
    # "val_acc" is the metric name in the Keras version used here;
    # releases that log "val_accuracy" need that name instead
    rc = ReduceLROnPlateau(monitor="val_acc",
                           factor=0.2,
                           patience=4,
                           verbose=1,
                           mode='max')
    model_name = model.name + ".hdf5"
    mc = ModelCheckpoint(model_name,
                         monitor="val_acc",
                         save_best_only=True,
                         verbose=1,
                         mode='max')
    el = EarlyStopping(monitor="val_acc",
                       min_delta=0,
                       patience=5,
                       verbose=1,
                       restore_best_weights=True)
    # Recompile so that the changed trainable flags take effect
    model.compile(optimizer=optimizer,
                  loss='categorical_crossentropy',
                  metrics=["accuracy"])
    # history_fit = model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
    #                                   steps_per_epoch=len(x_train) // batch_size,
    #                                   epochs=epochs,
    #                                   shuffle=True,
    #                                   verbose=1,
    #                                   callbacks=[mc, rc, el],
    #                                   validation_data=datagen.flow(x_valid, y_valid))
    history_fit = model.fit(x_train,
                            y_train,
                            batch_size=batch_size,
                            epochs=epochs,
                            shuffle=True,
                            verbose=1,
                            validation_data=(x_valid, y_valid),
                            callbacks=[mc, rc, el])
    print('Finish fine-tune')
    return history_fit
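One detail of the first step is worth checking before training: `model.layers[:freeze_num]` is an ordinary Python slice, so a `freeze_num` larger than the number of layers silently freezes every layer and nothing is trained in that step. The helper below is a hypothetical sketch (it is not part of the training code) that works on plain lists of layer names, such as the ones printed by the `enumerate(model.layers)` loops in this post; the names in the example are made up.

```python
def frozen_layer_names(layer_names, freeze_num):
    """Split layer names the same way model.layers[:freeze_num] would."""
    frozen = list(layer_names[:freeze_num])
    trainable = list(layer_names[freeze_num:])
    return frozen, trainable

# Made-up, shortened layer list for illustration
names = ["input", "block1_conv1", "block1_pool",
         "global_average_pooling2d", "fc1", "dropout", "predictions"]

frozen, trainable = frozen_layer_names(names, 4)

# A freeze_num beyond the end of the list leaves nothing trainable
_, nothing_left = frozen_layer_names(names, 19)
```

If the second list comes back empty, the loss in step 1 can only fluctuate through dropout noise, which is a cheap thing to rule out before a long run.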
1.VGG16 model
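# Define a VGG16-based model
from keras.applications import imagenet_utils
from keras.applications.vgg16 import VGG16
from keras.layers import Dense, Dropout, Input, Lambda
from keras.models import Model

n_classes = 200  # CUB200-2011 has 200 bird classes

def build_vgg16_model(img_rows, img_cols):
    x = Input(shape=(img_rows, img_cols, 3))
    # Apply ImageNet preprocessing inside the graph, so raw pixel arrays can be fed in
    x = Lambda(imagenet_utils.preprocess_input)(x)
    base_model = VGG16(input_tensor=x, weights="imagenet", include_top=False, pooling='avg')
    x = base_model.output
    x = Dense(1024, activation="relu", name="fc1")(x)
    x = Dropout(0.5)(x)
    predictions = Dense(n_classes, activation="softmax", name="predictions")(x)
    model = Model(inputs=base_model.input, outputs=predictions, name="vgg16")
    return model

# Create the VGG16 model (the builder has its own name so the variable below does not shadow it)
img_rows, img_cols = 300, 300
vgg16_model = build_vgg16_model(img_rows, img_cols)
for i, layer in enumerate(vgg16_model.layers):
    print(i, layer.name)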
0 input_3
1 lambda_3
2 block1_conv1
3 block1_conv2
4 block1_pool
5 block2_conv1
6 block2_conv2
7 block2_pool
8 block3_conv1
9 block3_conv2
10 block3_conv3
11 block3_pool
12 block4_conv1
13 block4_conv2
14 block4_conv3
15 block4_pool
16 block5_conv1
17 block5_conv2
18 block5_conv3
19 block5_pool
20 global_average_pooling2d_3
21 fc1
22 dropout_3
23 predictions
optimizer = optimizers.Adam(lr=0.0001)
batch_size = 32
epochs = 30
freeze_num = 21
%time vgg16_history = fine_tune_model(vgg16_model,optimizer,batch_size,epochs,freeze_num)
Train on 7013 samples, validate on 3006 samples
Epoch 1/3
7013/7013 [==============================] - 53s 8ms/step - loss: 7.1095 - acc: 0.0211 - val_loss: 4.4823 - val_acc: 0.0915
Epoch 2/3
7013/7013 [==============================] - 46s 7ms/step - loss: 4.4798 - acc: 0.0914 - val_loss: 3.6892 - val_acc: 0.2239
Epoch 3/3
7013/7013 [==============================] - 46s 7ms/step - loss: 3.6436 - acc: 0.1925 - val_loss: 3.0040 - val_acc: 0.3440
Finish step_1
Train on 7013 samples, validate on 3006 samples
Epoch 1/30
7013/7013 [==============================] - 47s 7ms/step - loss: 2.9171 - acc: 0.3087 - val_loss: 2.1821 - val_acc: 0.4667
Epoch 00001: val_loss improved from inf to 2.18212, saving model to vgg16.hdf5
Epoch 2/30
7013/7013 [==============================] - 46s 7ms/step - loss: 1.9944 - acc: 0.4840 - val_loss: 1.8748 - val_acc: 0.5226
Epoch 00002: val_loss improved from 2.18212 to 1.87480, saving model to vgg16.hdf5
Epoch 3/30
7013/7013 [==============================] - 46s 7ms/step - loss: 1.6493 - acc: 0.5551 - val_loss: 1.7540 - val_acc: 0.5492
Epoch 00003: val_loss improved from 1.87480 to 1.75400, saving model to vgg16.hdf5
Epoch 4/30
7013/7013 [==============================] - 46s 7ms/step - loss: 1.4144 - acc: 0.6144 - val_loss: 1.6711 - val_acc: 0.5655
Epoch 00004: val_loss improved from 1.75400 to 1.67106, saving model to vgg16.hdf5
Epoch 5/30
7013/7013 [==============================] - 46s 7ms/step - loss: 1.2055 - acc: 0.6628 - val_loss: 1.6020 - val_acc: 0.5749
Epoch 00005: val_loss improved from 1.67106 to 1.60200, saving model to vgg16.hdf5
Epoch 00026: val_loss improved from 1.32242 to 1.32005, saving model to vgg16.hdf5
Epoch 27/30
7013/7013 [==============================] - 46s 7ms/step - loss: 0.1979 - acc: 0.9511 - val_loss: 1.3209 - val_acc: 0.6517
Epoch 00027: val_loss did not improve from 1.32005
Epoch 28/30
7013/7013 [==============================] - 46s 7ms/step - loss: 0.1996 - acc: 0.9528 - val_loss: 1.3206 - val_acc: 0.6514
Epoch 00028: val_loss did not improve from 1.32005
Epoch 29/30
7013/7013 [==============================] - 46s 7ms/step - loss: 0.1956 - acc: 0.9555 - val_loss: 1.3216 - val_acc: 0.6517
Epoch 00029: val_loss did not improve from 1.32005
Epoch 00029: ReduceLROnPlateau reducing learning rate to 3.999999898951501e-06.
Epoch 30/30
7013/7013 [==============================] - 46s 7ms/step - loss: 0.1884 - acc: 0.9558 - val_loss: 1.3194 - val_acc: 0.6514
Epoch 00030: val_loss improved from 1.32005 to 1.31935, saving model to vgg16.hdf5
Finish fine-tune
CPU times: user 10min, sys: 3min 58s, total: 13min 58s
Wall time: 25min 37s
history_plot(vgg16_history)
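After all of the above, we can see that VGG16 does not classify very well: validation accuracy levels off at about 65%, barely a passing grade.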
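To read a run without the plot, the history can be reduced to its best validation epoch and the gap between training and validation accuracy at that epoch. The sketch below is a hypothetical helper, not part of the original notebook; it assumes a dict shaped like Keras's `History.history` with `acc` and `val_acc` keys, and the sample values are the last four epochs of the VGG16 log above.

```python
def summarise_history(history):
    """history: dict with 'acc' and 'val_acc' lists, one entry per epoch."""
    val_acc = history["val_acc"]
    best = max(range(len(val_acc)), key=lambda i: val_acc[i])
    return {
        "best_epoch": best + 1,  # 1-based position within the given lists
        "best_val_acc": val_acc[best],
        "train_val_gap": history["acc"][best] - val_acc[best],
    }

# Epochs 27-30 of the VGG16 run, copied from the log
tail = {"acc": [0.9511, 0.9528, 0.9555, 0.9558],
        "val_acc": [0.6517, 0.6514, 0.6517, 0.6514]}
summary = summarise_history(tail)
```

A gap of roughly 0.30 between training and validation accuracy suggests the model is overfitting the training images, which fits the modest validation score.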
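So let's welcome contestant number two, EfficientNet.
2.EfficientNetB4
Boom boom boom, here it comes, here it comes, riding in on rainbow clouds!!!
OK, enough talk; straight to the code that builds the EfficientNet architecture.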
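# Define an EfficientNet model
# (EfficientNetB4, K, Input, Lambda, Dense, Dropout, Model and n_classes are assumed to be imported or defined earlier)
def efficient_model(img_rows, img_cols):
    K.clear_session()
    x = Input(shape=(img_rows, img_cols, 3))
    # This reuses the generic ImageNet preprocessing from the VGG16 model.
    # The preprocessing that matches the EfficientNet weights may differ, so it is worth
    # checking against the package EfficientNetB4 is imported from.
    x = Lambda(imagenet_utils.preprocess_input)(x)
    base_model = EfficientNetB4(input_tensor=x, weights="imagenet", include_top=False, pooling="avg")
    x = base_model.output
    x = Dense(1024, activation="relu", name="fc1")(x)
    x = Dropout(0.5)(x)
    predictions = Dense(n_classes, activation="softmax", name="predictions")(x)
    eB_model = Model(inputs=base_model.input, outputs=predictions, name="eB4")
    return eB_model

# Create the EfficientNet model
img_rows, img_cols = 224, 224
eB_model = efficient_model(img_rows, img_cols)
optimizer = optimizers.Adam(lr=0.0001)
batch_size = 32
epochs = 30
freeze_num = 469
eB_model_history = fine_tune_model(eB_model, optimizer, batch_size, epochs, freeze_num)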
Train on 8251 samples, validate on 1768 samples
Epoch 1/3
8251/8251 [==============================] - 49s 6ms/step - loss: 9.3405 - acc: 0.0053 - val_loss: 5.5664 - val_acc: 0.0051
Epoch 2/3
8251/8251 [==============================] - 38s 5ms/step - loss: 6.8968 - acc: 0.0052 - val_loss: 5.3289 - val_acc: 0.0040
Epoch 3/3
8251/8251 [==============================] - 39s 5ms/step - loss: 5.8723 - acc: 0.0061 - val_loss: 5.3021 - val_acc: 0.0040
Finish step_1
Train on 8251 samples, validate on 1768 samples
Epoch 1/30
8251/8251 [==============================] - 261s 32ms/step - loss: 4.4794 - acc: 0.0980 - val_loss: 2.7448 - val_acc: 0.3399
Epoch 00001: val_loss improved from inf to 2.74482, saving model to eB4.hdf5
Epoch 2/30
8251/8251 [==============================] - 155s 19ms/step - loss: 2.2635 - acc: 0.4157 - val_loss: 1.4371 - val_acc: 0.5973
Epoch 00002: val_loss improved from 2.74482 to 1.43707, saving model to eB4.hdf5
Epoch 3/30
8251/8251 [==============================] - 155s 19ms/step - loss: 1.3465 - acc: 0.6244 - val_loss: 1.1637 - val_acc: 0.6719
Epoch 00003: val_loss improved from 1.43707 to 1.16373, saving model to eB4.hdf5
Epoch 4/30
8251/8251 [==============================] - 154s 19ms/step - loss: 0.8824 - acc: 0.7488 - val_loss: 0.9904 - val_acc: 0.7110
Epoch 00016: val_loss did not improve from 0.89365
Epoch 17/30
8251/8251 [==============================] - 154s 19ms/step - loss: 0.0718 - acc: 0.9867 - val_loss: 0.8993 - val_acc: 0.7749
Epoch 00017: val_loss did not improve from 0.89365
Restoring model weights from the end of the best epoch
Epoch 00017: early stopping
Finish fine-tune
history_plot(eB_model_history)
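Pretty good results! EfficientNet is a Google product after all, and it shows. Now that EfficientNet already works this well, maybe you don't feel like reading on and can't wait to try it yourself.
Not so fast; there are a few more small experiments below. The first adds an attention mechanism to EfficientNet. For background on attention, see my other blog posts, which I wrote before I let myself loose, so they are quite serious. This one is even more serious, of course!!!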
3.efficientnet-with-attention
# Define an EfficientNet architecture with an attention module, i.e. efficientnet-with-attention
import numpy as np
from keras.layers import BatchNormalization, Conv2D, GlobalAveragePooling2D, multiply

def efficient_attention_model(img_rows, img_cols):
    K.clear_session()
    in_lay = Input(shape=(img_rows, img_cols, 3))
    base_model = EfficientNetB3(input_shape=(img_rows, img_cols, 3), weights="imagenet", include_top=False)
    pt_depth = base_model.get_output_shape_at(0)[-1]
    pt_features = base_model(in_lay)
    bn_features = BatchNormalization()(pt_features)
    # Attention mechanism that turns spatial positions on and off before the GAP
    atten_layer = Conv2D(64, kernel_size=(1, 1), padding="same", activation="relu")(Dropout(0.5)(bn_features))
    atten_layer = Conv2D(16, kernel_size=(1, 1), padding="same", activation="relu")(atten_layer)
    atten_layer = Conv2D(8, kernel_size=(1, 1), padding="same", activation="relu")(atten_layer)
    atten_layer = Conv2D(1, kernel_size=(1, 1), padding="valid", activation="sigmoid")(atten_layer)  # H,W,1
    # Fan the single-channel mask out to all channels with a fixed 1x1 convolution of ones
    up_c2_w = np.ones((1, 1, 1, pt_depth))  # kernel shape 1,1,1,C
    up_c2 = Conv2D(pt_depth, kernel_size=(1, 1), padding="same", activation="linear", use_bias=False, weights=[up_c2_w])
    up_c2.trainable = False
    atten_layer = up_c2(atten_layer)  # H,W,C
    mask_features = multiply([atten_layer, bn_features])  # H,W,C
    gap_features = GlobalAveragePooling2D()(mask_features)  # C
    # gap_mask = GlobalAveragePooling2D()(atten_layer)  # C
    # # to account for missing values from the attention model
    # gap = Lambda(lambda x: x[0] / x[1], name="RescaleGAP")([gap_features, gap_mask])
    gap_dr = Dropout(0.25)(gap_features)
    dr_steps = Dropout(0.25)(Dense(1000, activation="relu")(gap_dr))
    out_layer = Dense(200, activation="softmax")(dr_steps)  # 200 bird classes
    eb_atten_model = Model(inputs=[in_lay], outputs=[out_layer])
    return eb_atten_model

img_rows, img_cols = 224, 224
eB_atten_model = efficient_attention_model(img_rows, img_cols)
eB_atten_model.save("eb_atten_model.h5")
for i, layer in enumerate(eB_atten_model.layers):
    print(i, layer.name)
0 input_1
1 efficientnet-b3
2 batch_normalization_1
3 dropout_1
4 conv2d_1
5 conv2d_2
6 conv2d_3
7 conv2d_4
8 conv2d_5
9 multiply_1
10 global_average_pooling2d_1
11 dropout_2
12 dense_1
13 dropout_3
14 dense_2
optimizer = optimizers.Adam(lr=0.0001)
batch_size = 32
epochs = 30
freeze_num = 12
eB_atten_model_history = fine_tune_model(eB_atten_model,optimizer,batch_size,epochs,freeze_num)
Train on 8251 samples, validate on 1768 samples
Epoch 1/3
8251/8251 [==============================] - 39s 5ms/step - loss: 5.2083 - acc: 0.0221 - val_loss: 16.0324 - val_acc: 0.0040
Epoch 2/3
8251/8251 [==============================] - 28s 3ms/step - loss: 4.7719 - acc: 0.1130 - val_loss: 16.0147 - val_acc: 0.0057
Epoch 3/3
8251/8251 [==============================] - 28s 3ms/step - loss: 4.3135 - acc: 0.2112 - val_loss: 16.0056 - val_acc: 0.0062
Finish step_1
Train on 8251 samples, validate on 1768 samples
Epoch 1/30
8251/8251 [==============================] - 168s 20ms/step - loss: 2.1612 - acc: 0.4549 - val_loss: 1.1888 - val_acc: 0.6725
Epoch 00001: val_loss improved from inf to 1.18880, saving model to model_1.hdf5
Epoch 2/30
8251/8251 [==============================] - 121s 15ms/step - loss: 0.9003 - acc: 0.7442 - val_loss: 0.9400 - val_acc: 0.7330
Epoch 00002: val_loss improved from 1.18880 to 0.94002, saving model to model_1.hdf5
Epoch 3/30
8251/8251 [==============================] - 121s 15ms/step - loss: 0.5455 - acc: 0.8467 - val_loss: 0.8569 - val_acc: 0.7574
Epoch 00013: val_loss did not improve from 0.78748
Epoch 14/30
8251/8251 [==============================] - 121s 15ms/step - loss: 0.0417 - acc: 0.9924 - val_loss: 0.7958 - val_acc: 0.7924
Epoch 00014: val_loss did not improve from 0.78748
Epoch 00014: ReduceLROnPlateau reducing learning rate to 3.999999898951501e-06.
Epoch 15/30
8251/8251 [==============================] - 121s 15ms/step - loss: 0.0370 - acc: 0.9936 - val_loss: 0.7938 - val_acc: 0.7941
Epoch 00015: val_loss did not improve from 0.78748
Epoch 16/30
8251/8251 [==============================] - 121s 15ms/step - loss: 0.0379 - acc: 0.9933 - val_loss: 0.7932 - val_acc: 0.7952
Epoch 00016: val_loss did not improve from 0.78748
Restoring model weights from the end of the best epoch
Epoch 00016: early stopping
Finish fine-tune
history_plot(eB_atten_model_history)
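The result does improve a little. Next I tried another way of writing attention, using SENet and CBAM. If you are not familiar with them, the usual advice applies: see my blog post on the history of convolutional neural networks, where they are covered.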
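For readers who want the idea before the Keras code, here is a minimal NumPy sketch of a Squeeze-and-Excitation forward pass on a single feature map. The function name `se_forward`, the random weights and the tensor sizes are assumptions for illustration; a real block learns `w1` and `w2` during training.

```python
import numpy as np

def se_forward(x, w1, b1, w2, b2):
    """x: (H, W, C) feature map; w1: (C, C//r); w2: (C//r, C)."""
    squeezed = x.mean(axis=(0, 1))                     # squeeze: global average pool -> (C,)
    hidden = np.maximum(0.0, squeezed @ w1 + b1)       # excitation: ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))   # per-channel weight in (0, 1)
    return x * gate                                    # rescale channels, broadcast over H and W

rng = np.random.default_rng(0)
c, r = 16, 8                                           # channels and reduction ratio
x = rng.standard_normal((5, 5, c))
w1 = rng.standard_normal((c, c // r))
b1 = np.zeros(c // r)
w2 = rng.standard_normal((c // r, c))
b2 = np.zeros(c)
y = se_forward(x, w1, b1, w2, b2)
```

CBAM applies a similar channel gate (using both average and max pooling) and then a second, spatial gate computed from channel-wise statistics.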
4.EfficientNetB3 with attention v2
from keras.layers import GlobalAveragePooling2D, GlobalMaxPooling2D, Reshape, Dense, multiply, Permute, Concatenate, Conv2D, Add, Activation, Lambda
from keras import backend as K
from keras.activations import sigmoid

def attach_attention_module(net, attention_module):
    if attention_module == 'se_block':  # SE_block
        net = se_block(net)
    elif attention_module == 'cbam_block':  # CBAM_block
        net = cbam_block(net)
    else:
        raise Exception("'{}' is not supported attention module!".format(attention_module))
    return net
def se_block(input_feature, ratio=8):
    """Contains the implementation of Squeeze-and-Excitation(SE) block.
    As described in https://arxiv.org/abs/1709.01507.
    """
    # Note: _keras_shape is a private attribute of tensors in standalone Keras 2.x
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    channel = input_feature._keras_shape[channel_axis]
    se_feature = GlobalAveragePooling2D()(input_feature)
    se_feature = Reshape((1, 1, channel))(se_feature)
    assert se_feature._keras_shape[1:] == (1, 1, channel)
    se_feature = Dense(channel // ratio,
                       activation='relu',
                       kernel_initializer='he_normal',
                       use_bias=True,
                       bias_initializer='zeros')(se_feature)
    assert se_feature._keras_shape[1:] == (1, 1, channel // ratio)
    se_feature = Dense(channel,
                       activation='sigmoid',
                       kernel_initializer='he_normal',
                       use_bias=True,
                       bias_initializer='zeros')(se_feature)
    assert se_feature._keras_shape[1:] == (1, 1, channel)
    if K.image_data_format() == 'channels_first':
        se_feature = Permute((3, 1, 2))(se_feature)
    se_feature = multiply([input_feature, se_feature])
    return se_feature
def cbam_block(cbam_feature, ratio=8):
    """Contains the implementation of Convolutional Block Attention Module(CBAM) block.
    As described in https://arxiv.org/abs/1807.06521.
    """
    cbam_feature = channel_attention(cbam_feature, ratio)
    cbam_feature = spatial_attention(cbam_feature)
    return cbam_feature

def channel_attention(input_feature, ratio=8):
    channel_axis = 1 if K.image_data_format() == "channels_first" else -1
    channel = input_feature._keras_shape[channel_axis]
    # The two Dense layers are shared between the average-pooled and max-pooled branches
    shared_layer_one = Dense(channel // ratio,
                             activation='relu',
                             kernel_initializer='he_normal',
                             use_bias=True,
                             bias_initializer='zeros')
    shared_layer_two = Dense(channel,
                             kernel_initializer='he_normal',
                             use_bias=True,
                             bias_initializer='zeros')
    avg_pool = GlobalAveragePooling2D()(input_feature)
    avg_pool = Reshape((1, 1, channel))(avg_pool)
    assert avg_pool._keras_shape[1:] == (1, 1, channel)
    avg_pool = shared_layer_one(avg_pool)
    assert avg_pool._keras_shape[1:] == (1, 1, channel // ratio)
    avg_pool = shared_layer_two(avg_pool)
    assert avg_pool._keras_shape[1:] == (1, 1, channel)
    max_pool = GlobalMaxPooling2D()(input_feature)
    max_pool = Reshape((1, 1, channel))(max_pool)
    assert max_pool._keras_shape[1:] == (1, 1, channel)
    max_pool = shared_layer_one(max_pool)
    assert max_pool._keras_shape[1:] == (1, 1, channel // ratio)
    max_pool = shared_layer_two(max_pool)
    assert max_pool._keras_shape[1:] == (1, 1, channel)
    cbam_feature = Add()([avg_pool, max_pool])
    cbam_feature = Activation('sigmoid')(cbam_feature)
    if K.image_data_format() == "channels_first":
        cbam_feature = Permute((3, 1, 2))(cbam_feature)
    return multiply([input_feature, cbam_feature])
def spatial_attention(input_feature):
    kernel_size = 7
    if K.image_data_format() == "channels_first":
        channel = input_feature._keras_shape[1]
        cbam_feature = Permute((2, 3, 1))(input_feature)
    else:
        channel = input_feature._keras_shape[-1]
        cbam_feature = input_feature
    # Channel-wise mean and max give two H,W,1 maps that are stacked and convolved
    avg_pool = Lambda(lambda x: K.mean(x, axis=3, keepdims=True))(cbam_feature)
    assert avg_pool._keras_shape[-1] == 1
    max_pool = Lambda(lambda x: K.max(x, axis=3, keepdims=True))(cbam_feature)
    assert max_pool._keras_shape[-1] == 1
    concat = Concatenate(axis=3)([avg_pool, max_pool])
    assert concat._keras_shape[-1] == 2
    cbam_feature = Conv2D(filters=1,
                          kernel_size=kernel_size,
                          strides=1,
                          padding='same',
                          activation='sigmoid',
                          kernel_initializer='he_normal',
                          use_bias=False)(concat)
    assert cbam_feature._keras_shape[-1] == 1
    if K.image_data_format() == "channels_first":
        cbam_feature = Permute((3, 1, 2))(cbam_feature)
    return multiply([input_feature, cbam_feature])
# Define an EfficientNetB3 model with an SE attention block
def efficient__atten2_model(img_rows, img_cols):
    K.clear_session()
    in_lay = Input(shape=(img_rows, img_cols, 3))
    base_model = EfficientNetB3(input_shape=(img_rows, img_cols, 3), weights="imagenet", include_top=False)
    pt_features = base_model(in_lay)
    bn_features = BatchNormalization()(pt_features)
    atten_features = attach_attention_module(bn_features, "se_block")
    gap_features = GlobalAveragePooling2D()(atten_features)
    gap_dr = Dropout(0.25)(gap_features)
    dr_steps = Dropout(0.25)(Dense(1000, activation="relu")(gap_dr))
    out_layer = Dense(n_classes, activation="softmax")(dr_steps)
    eb_atten_model = Model(inputs=[in_lay], outputs=[out_layer])
    return eb_atten_model

img_rows, img_cols = 224, 224
eB_atten2_model = efficient__atten2_model(img_rows, img_cols)
optimizer = optimizers.Adam(lr=0.0001)
batch_size = 32
epochs = 30
# Note: the base network counts as a single layer here, so this model has fewer than 19
# top-level layers (13 by my count) and layers[:19] freezes all of them in step 1.
# The unchanged val_loss across the three step-1 epochs in the log below is consistent with that.
freeze_num = 19
eB_atten2_model_history = fine_tune_model(eB_atten2_model, optimizer, batch_size, epochs, freeze_num)
Train on 8251 samples, validate on 1768 samples
Epoch 1/3
8251/8251 [==============================] - 33s 4ms/step - loss: 5.3202 - acc: 0.0061 - val_loss: 16.0269 - val_acc: 0.0057
Epoch 2/3
8251/8251 [==============================] - 26s 3ms/step - loss: 5.3261 - acc: 0.0051 - val_loss: 16.0269 - val_acc: 0.0057
Epoch 3/3
8251/8251 [==============================] - 26s 3ms/step - loss: 5.3248 - acc: 0.0048 - val_loss: 16.0269 - val_acc: 0.0057
Finish step_1
Train on 8251 samples, validate on 1768 samples
Epoch 1/30
8251/8251 [==============================] - 153s 19ms/step - loss: 3.9559 - acc: 0.1742 - val_loss: 2.1066 - val_acc: 0.4712
Epoch 00001: val_loss improved from inf to 2.10657, saving model to model_1.hdf5
Epoch 2/30
8251/8251 [==============================] - 119s 14ms/step - loss: 1.6183 - acc: 0.5708 - val_loss: 1.1768 - val_acc: 0.6618
Epoch 00002: val_loss improved from 2.10657 to 1.17679, saving model to model_1.hdf5
Epoch 3/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.9172 - acc: 0.7374 - val_loss: 0.9507 - val_acc: 0.7189
Epoch 00003: val_loss improved from 1.17679 to 0.95071, saving model to model_1.hdf5
Epoch 4/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.5897 - acc: 0.8317 - val_loss: 0.8628 - val_acc: 0.7562
Epoch 00004: val_loss improved from 0.95071 to 0.86283, saving model to model_1.hdf5
Epoch 5/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.3838 - acc: 0.8956 - val_loss: 0.8359 - val_acc: 0.7636
Epoch 00005: val_loss improved from 0.86283 to 0.83592, saving model to model_1.hdf5
Epoch 6/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.2797 - acc: 0.9234 - val_loss: 0.8280 - val_acc: 0.7647
Epoch 00006: val_loss improved from 0.83592 to 0.82797, saving model to model_1.hdf5
Epoch 7/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.1997 - acc: 0.9495 - val_loss: 0.8620 - val_acc: 0.7602
Epoch 00007: val_loss did not improve from 0.82797
Epoch 8/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.1408 - acc: 0.9667 - val_loss: 0.8602 - val_acc: 0.7800
Epoch 00008: val_loss did not improve from 0.82797
Epoch 9/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.1103 - acc: 0.9739 - val_loss: 0.9202 - val_acc: 0.7545
Epoch 00009: val_loss did not improve from 0.82797
Epoch 00009: ReduceLROnPlateau reducing learning rate to 1.9999999494757503e-05.
Epoch 10/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.0803 - acc: 0.9824 - val_loss: 0.8677 - val_acc: 0.7709
Epoch 00010: val_loss did not improve from 0.82797
Epoch 11/30
8251/8251 [==============================] - 119s 14ms/step - loss: 0.0772 - acc: 0.9833 - val_loss: 0.8560 - val_acc: 0.7771
Epoch 00011: val_loss did not improve from 0.82797
Restoring model weights from the end of the best epoch
Epoch 00011: early stopping
Finish fine-tune
history_plot(eB_atten2_model_history)
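I honestly don't know why, but in my runs this version came out slightly ahead of the previous attention variant, although the log excerpts above (val_acc of roughly 0.78 here versus 0.79 before) do not show a clear gap. That's alchemy for you.
5.Bilinear EfficientNet
Next I tried a bilinear network architecture, and I even drew a diagram for it!!!
The overall pipeline of the model is:
- Feed in the image and apply data augmentation to it;
- Use the EfficientNet architecture, proposed by Google in 2019, as the backbone to extract feature maps;
- Add an attention module to extract attention maps;
- Multiply the attention maps element-wise with the feature maps, then add fully connected layers for classification to obtain the final result.
Here is the diagram of the attention mechanism as well!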
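The masking step in the pipeline above can be sketched in NumPy as follows. This is an illustration under simple assumptions: a single-channel attention map with values in [0, 1] is broadcast across all feature channels, and the masked features are then global-average-pooled. The function name and the toy arrays are made up.

```python
import numpy as np

def attention_gap(features, attention):
    """features: (H, W, C); attention: (H, W, 1) with values in [0, 1]."""
    masked = features * attention        # broadcast the single-channel map over all C channels
    return masked.mean(axis=(0, 1))      # global average pooling -> (C,)

features = np.ones((4, 4, 3))            # toy feature map
attention = np.zeros((4, 4, 1))
attention[:2, :2, 0] = 1.0               # attend only to the top-left quarter
pooled = attention_gap(features, attention)
```

Only a quarter of the positions pass through the mask here, so every pooled channel drops to a quarter of its unmasked value; a learned mask would instead emphasise the positions that help classification.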
# Define a bilinear EfficientNet attention model
def blinear_efficient__atten_model(img_rows, img_cols):
    K.clear_session()
    in_lay = Input(shape=(img_rows, img_cols, 3))
    base_model = EfficientNetB3(input_shape=(img_rows, img_cols, 3), weights="imagenet", include_top=False)
    pt_depth = base_model.get_output_shape_at(0)[-1]
    cnn_features_a = base_model(in_lay)
    cnn_bn_features_a = BatchNormalization()(cnn_features_a)
    # Attention mechanism that turns spatial positions on and off before the GAP
    atten_layer = Conv2D(64, kernel_size=(1, 1), padding="same", activation="relu")(Dropout(0.5)(cnn_bn_features_a))
    atten_layer = Conv2D(16, kernel_size=(1, 1), padding="same", activation="relu")(atten_layer)
    atten_layer = Conv2D(8, kernel_size=(1, 1), padding="same", activation="relu")(atten_layer)
    atten_layer = Conv2D(1, kernel_size=(1, 1), padding="valid", activation="sigmoid")(atten_layer)  # H,W,1
    # Fan the single-channel mask out to all channels with a 1x1 convolution initialised to ones
    up_c2_w = np.ones((1, 1, 1, pt_depth))  # kernel shape 1,1,1,C
    up_c2 = Conv2D(pt_depth, kernel_size=(1, 1), padding="same", activation="linear", use_bias=False, weights=[up_c2_w])
    up_c2.trainable = True
    atten_layer = up_c2(atten_layer)  # H,W,C
    cnn_atten_out_a = multiply([atten_layer, cnn_bn_features_a])  # H,W,C
    cnn_atten_out_b = cnn_atten_out_a
    # Both branches share the same features, so this is an element-wise square
    # of the attended features rather than a full outer product
    cnn_out_dot = multiply([cnn_atten_out_a, cnn_atten_out_b])
    gap_features = GlobalAveragePooling2D()(cnn_out_dot)
    gap_dr = Dropout(0.25)(gap_features)
    dr_steps = Dropout(0.25)(Dense(1000, activation="relu")(gap_dr))
    out_layer = Dense(200, activation="softmax")(dr_steps)  # 200 bird classes
    b_eff_atten_model = Model(inputs=[in_lay], outputs=[out_layer], name="blinear_efficient_atten")
    return b_eff_atten_model

# Create the bilinear EfficientNet attention model
img_rows, img_cols = 256, 256
befficient_model = blinear_efficient__atten_model(img_rows, img_cols)
befficient_model.save("befficient_model.h5")
optimizer = optimizers.Adam(lr=0.0001)
batch_size = 32
epochs = 30
# Note: this model has fewer than 19 top-level layers (16 by my count),
# so layers[:19] freezes all of them in step 1
freeze_num = 19
befficient_model_history = fine_tune_model(befficient_model, optimizer, batch_size, epochs, freeze_num)
Train on 8251 samples, validate on 1768 samples
Epoch 1/3
8251/8251 [==============================] - 38s 5ms/step - loss: 5.3903 - acc: 0.0052 - val_loss: 14.1897 - val_acc: 0.0040
Epoch 2/3
8251/8251 [==============================] - 33s 4ms/step - loss: 5.3926 - acc: 0.0052 - val_loss: 14.1897 - val_acc: 0.0040
Epoch 3/3
8251/8251 [==============================] - 33s 4ms/step - loss: 5.3948 - acc: 0.0068 - val_loss: 14.1897 - val_acc: 0.0040
Finish step_1
Train on 8251 samples, validate on 1768 samples
Epoch 1/30
8251/8251 [==============================] - 193s 23ms/step - loss: 4.7127 - acc: 0.0749 - val_loss: 2.9079 - val_acc: 0.3060
Epoch 00001: val_acc improved from -inf to 0.30600, saving model to blinear_efficient_atten.hdf5
Epoch 2/30
8251/8251 [==============================] - 148s 18ms/step - loss: 2.1653 - acc: 0.4462 - val_loss: 1.3817 - val_acc: 0.6160
Epoch 00002: val_acc improved from 0.30600 to 0.61595, saving model to blinear_efficient_atten.hdf5
Epoch 3/30
8251/8251 [==============================] - 149s 18ms/step - loss: 1.1834 - acc: 0.6676 - val_loss: 1.0714 - val_acc: 0.7002
Epoch 00003: val_acc improved from 0.61595 to 0.70023, saving model to blinear_efficient_atten.hdf5
Epoch 4/30
8251/8251 [==============================] - 149s 18ms/step - loss: 0.8070 - acc: 0.7666 - val_loss: 0.9743 - val_acc: 0.7342
Epoch 00004: val_acc improved from 0.70023 to 0.73416, saving model to blinear_efficient_atten.hdf5
Epoch 5/30
Epoch 00007: val_acc improved from 0.74830 to 0.75735, saving model to blinear_efficient_atten.hdf5
Epoch 00010: val_acc did not improve from 0.76867
Epoch 11/30
8251/8251 [==============================] - 149s 18ms/step - loss: 0.1421 - acc: 0.9547 - val_loss: 1.1319 - val_acc: 0.7692
Epoch 00011: val_acc improved from 0.76867 to 0.76923, saving model to blinear_efficient_atten.hdf5
Epoch 12/30
8251/8251 [==============================] - 149s 18ms/step - loss: 0.1232 - acc: 0.9622 - val_loss: 1.0809 - val_acc: 0.7704
Epoch 00018: val_acc improved from 0.77489 to 0.78224, saving model to blinear_efficient_atten.hdf5
Epoch 19/30
8251/8251 [==============================] - 149s 18ms/step - loss: 0.0880 - acc: 0.9714 - val_loss: 1.2171 - val_acc: 0.7721
Epoch 00022: ReduceLROnPlateau reducing learning rate to 1.9999999494757503e-05.
Epoch 23/30
8251/8251 [==============================] - 148s 18ms/step - loss: 0.0465 - acc: 0.9859 - val_loss: 1.1591 - val_acc: 0.7930
Epoch 00023: val_acc improved from 0.78224 to 0.79299, saving model to blinear_efficient_atten.hdf5
Epoch 24/30
8251/8251 [==============================] - 148s 18ms/step - loss: 0.0360 - acc: 0.9893 - val_loss: 1.1312 - val_acc: 0.7969
Epoch 00024: val_acc improved from 0.79299 to 0.79695, saving model to blinear_efficient_atten.hdf5
Epoch 25/30
8251/8251 [==============================] - 148s 18ms/step - loss: 0.0275 - acc: 0.9920 - val_loss: 1.1477 - val_acc: 0.8015
Epoch 00028: val_acc did not improve from 0.80147
Epoch 29/30
8251/8251 [==============================] - 148s 18ms/step - loss: 0.0248 - acc: 0.9922 - val_loss: 1.1467 - val_acc: 0.8020
Epoch 00029: val_acc improved from 0.80147 to 0.80204, saving model to blinear_efficient_atten.hdf5
Epoch 30/30
8251/8251 [==============================] - 148s 18ms/step - loss: 0.0232 - acc: 0.9919 - val_loss: 1.1427 - val_acc: 0.8003
Epoch 00030: val_acc did not improve from 0.80204
Finish fine-tune
history_plot(befficient_model_history)
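As the plot shows, the bilinear structure raises accuracy a little further, to about 0.80 validation accuracy.
We have finally reached the end of the story; the last experiment is a bilinear VGG16.
6.Bilinear VGG16 model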
# Define the bilinear VGG16 model
from keras import backend as K
from keras.layers import Reshape

def batch_dot(cnn_ab):
    # (N, HW, C) and (N, HW, C) contracted over HW -> (N, C, C)
    return K.batch_dot(cnn_ab[0], cnn_ab[1], axes=[1, 1])

def sign_sqrt(x):
    # Signed square root; the small constant keeps the gradient finite at zero
    return K.sign(x) * K.sqrt(K.abs(x) + 1e-10)

def l2_norm(x):
    return K.l2_normalize(x, axis=-1)

def bilinear_vgg16(img_rows, img_cols):
    input_tensor = Input(shape=(img_rows, img_cols, 3))
    input_tensor = Lambda(imagenet_utils.preprocess_input)(input_tensor)
    model_vgg16 = VGG16(include_top=False, weights="imagenet",
                        input_tensor=input_tensor, pooling="avg")
    # With pooling="avg" the last layer is the global pooling, so layers[-2] is block5_pool
    cnn_out_a = model_vgg16.layers[-2].output
    cnn_out_shape = model_vgg16.layers[-2].output_shape
    cnn_out_a = Reshape([cnn_out_shape[1] * cnn_out_shape[2],
                         cnn_out_shape[-1]])(cnn_out_a)
    cnn_out_b = cnn_out_a
    cnn_out_dot = Lambda(batch_dot)([cnn_out_a, cnn_out_b])
    cnn_out_dot = Reshape([cnn_out_shape[-1] * cnn_out_shape[-1]])(cnn_out_dot)
    sign_sqrt_out = Lambda(sign_sqrt)(cnn_out_dot)
    l2_norm_out = Lambda(l2_norm)(sign_sqrt_out)
    fc1 = Dense(1024, activation="relu", name="fc1")(l2_norm_out)
    dropout = Dropout(0.5)(fc1)
    output = Dense(n_classes, activation="softmax", name="output")(dropout)
    bvgg16_model = Model(inputs=model_vgg16.input, outputs=output, name="bvgg16")
    return bvgg16_model

# Create the bilinear VGG16 model
img_rows, img_cols = 300, 300
bvgg16_model = bilinear_vgg16(img_rows, img_cols)
for i, layer in enumerate(bvgg16_model.layers):
    print(i, layer.name)
0 input_1
1 lambda_1
2 block1_conv1
3 block1_conv2
4 block1_pool
5 block2_conv1
6 block2_conv2
7 block2_pool
8 block3_conv1
9 block3_conv2
10 block3_conv3
11 block3_pool
12 block4_conv1
13 block4_conv2
14 block4_conv3
15 block4_pool
16 block5_conv1
17 block5_conv2
18 block5_conv3
19 block5_pool
20 reshape_1
21 lambda_2
22 reshape_2
23 lambda_3
24 lambda_4
25 fc1
26 dropout_1
27 output
optimizer = optimizers.Adam(lr=0.0001)
batch_size = 32
epochs = 100
freeze_num = 25
bvgg16_history = fine_tune_model(bvgg16_model,optimizer,batch_size,epochs,freeze_num)
Train on 8251 samples, validate on 1768 samples
Epoch 1/3
8251/8251 [==============================] - 80s 10ms/step - loss: 5.1197 - acc: 0.0572 - val_loss: 4.8534 - val_acc: 0.2002
Epoch 2/3
8251/8251 [==============================] - 71s 9ms/step - loss: 4.4758 - acc: 0.1863 - val_loss: 4.1177 - val_acc: 0.3569
Epoch 3/3
8251/8251 [==============================] - 71s 9ms/step - loss: 3.7386 - acc: 0.2743 - val_loss: 3.4439 - val_acc: 0.4378
Finish step_1
Train on 8251 samples, validate on 1768 samples
Epoch 1/100
8251/8251 [==============================] - 76s 9ms/step - loss: 2.9186 - acc: 0.3475 - val_loss: 2.5064 - val_acc: 0.5334
Epoch 00001: val_loss improved from inf to 2.50638, saving model to bvgg16.hdf5
Epoch 2/100
8251/8251 [==============================] - 70s 9ms/step - loss: 2.3073 - acc: 0.4696 - val_loss: 2.1717 - val_acc: 0.5888
Epoch 00002: val_loss improved from 2.50638 to 2.17170, saving model to bvgg16.hdf5
Epoch 3/100
8251/8251 [==============================] - 70s 9ms/step - loss: 2.0086 - acc: 0.5355 - val_loss: 1.9604 - val_acc: 0.6222
Epoch 00067: val_loss did not improve from 0.89483
Epoch 68/100
8251/8251 [==============================] - 71s 9ms/step - loss: 0.0539 - acc: 0.9971 - val_loss: 0.8984 - val_acc: 0.7590
Epoch 00068: val_loss did not improve from 0.89483
Epoch 00068: ReduceLROnPlateau reducing learning rate to 3.999999898951501e-06.
Epoch 69/100
8251/8251 [==============================] - 71s 9ms/step - loss: 0.0536 - acc: 0.9972 - val_loss: 0.8972 - val_acc: 0.7602
Epoch 00069: val_loss did not improve from 0.89483
Epoch 70/100
8251/8251 [==============================] - 71s 9ms/step - loss: 0.0517 - acc: 0.9973 - val_loss: 0.8968 - val_acc: 0.7630
Epoch 00070: val_loss did not improve from 0.89483
Restoring model weights from the end of the best epoch
Epoch 00070: early stopping
Finish fine-tune
history_plot(bvgg16_history)
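Finally, done. As for the results, look at the plots and judge for yourself; it certainly does not match EfficientNet.
Of course, you can tune the image resolution, learning rate, batch_size and so on. Happy alchemy!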