Contents
- Softmax multi-class classification with a multilayer perceptron (MLP)
- About Momentum and Nesterov Accelerated Gradient
- Binary classification with a multilayer perceptron
- A VGG-like convolutional network
- Output-size formulas for convolution and pooling
Softmax multi-class classification with a multilayer perceptron (MLP)
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation
from keras.optimizers import SGD
import keras
import numpy as np
# Generate dummy data
x_train = np.random.random((1000, 20))
y_train = keras.utils.to_categorical(np.random.randint(10, size=(1000, 1)), num_classes=10)
x_test = np.random.random((100, 20))
y_test = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
model.fit(x_train, y_train, epochs=20, batch_size=128)
score = model.evaluate(x_test, y_test, batch_size=128)
print("score: ", score)
Run output:
Using TensorFlow backend.
Epoch 1/20
2019-02-28 10:37:16.147684: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
128/1000 [==>...........................] - ETA: 0s - loss: 2.4224 - acc: 0.0859
1000/1000 [==============================] - 0s 149us/step - loss: 2.3773 - acc: 0.1000
Epoch 2/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.3494 - acc: 0.0703
1000/1000 [==============================] - 0s 11us/step - loss: 2.3454 - acc: 0.0870
Epoch 3/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.3077 - acc: 0.1094
1000/1000 [==============================] - 0s 11us/step - loss: 2.3437 - acc: 0.0870
Epoch 4/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.3307 - acc: 0.0938
1000/1000 [==============================] - 0s 11us/step - loss: 2.3364 - acc: 0.0980
Epoch 5/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.3100 - acc: 0.1094
1000/1000 [==============================] - 0s 10us/step - loss: 2.3171 - acc: 0.1100
Epoch 6/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.3120 - acc: 0.1172
1000/1000 [==============================] - 0s 11us/step - loss: 2.3188 - acc: 0.1040
Epoch 7/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.3345 - acc: 0.0859
1000/1000 [==============================] - 0s 12us/step - loss: 2.3167 - acc: 0.1030
Epoch 8/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.3246 - acc: 0.0625
1000/1000 [==============================] - 0s 11us/step - loss: 2.3150 - acc: 0.0970
Epoch 9/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.3137 - acc: 0.1562
1000/1000 [==============================] - 0s 11us/step - loss: 2.3083 - acc: 0.1070
Epoch 10/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.2797 - acc: 0.0781
1000/1000 [==============================] - 0s 12us/step - loss: 2.3075 - acc: 0.1090
Epoch 11/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.3197 - acc: 0.1016
1000/1000 [==============================] - 0s 11us/step - loss: 2.3028 - acc: 0.1030
Epoch 12/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.2950 - acc: 0.1250
1000/1000 [==============================] - 0s 11us/step - loss: 2.2958 - acc: 0.1240
Epoch 13/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.2962 - acc: 0.1172
1000/1000 [==============================] - 0s 11us/step - loss: 2.3070 - acc: 0.1080
Epoch 14/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.2960 - acc: 0.1016
1000/1000 [==============================] - 0s 12us/step - loss: 2.3027 - acc: 0.1070
Epoch 15/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.2930 - acc: 0.1172
1000/1000 [==============================] - 0s 11us/step - loss: 2.2939 - acc: 0.1260
Epoch 16/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.3049 - acc: 0.1016
1000/1000 [==============================] - 0s 11us/step - loss: 2.3043 - acc: 0.1080
Epoch 17/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.3016 - acc: 0.0703
1000/1000 [==============================] - 0s 11us/step - loss: 2.3060 - acc: 0.1000
Epoch 18/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.2788 - acc: 0.1328
1000/1000 [==============================] - 0s 11us/step - loss: 2.2954 - acc: 0.1190
Epoch 19/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.2863 - acc: 0.1641
1000/1000 [==============================] - 0s 11us/step - loss: 2.2952 - acc: 0.1210
Epoch 20/20
128/1000 [==>...........................] - ETA: 0s - loss: 2.3144 - acc: 0.0625
1000/1000 [==============================] - 0s 10us/step - loss: 2.2917 - acc: 0.1150
100/100 [==============================] - 0s 259us/step
score: [2.301650047302246, 0.05999999865889549]
Here SGD is the Stochastic Gradient Descent optimizer.
Its four parameters:
- lr: the learning rate
- momentum: the momentum factor, used to accelerate SGD in the relevant direction and dampen oscillation
- decay: learning-rate decay applied after each update
- nesterov: boolean, whether to apply Nesterov momentum
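In Keras 1.x/2.x, decay is applied per parameter update (not per epoch), with the effective rate lr / (1 + decay * iterations). A minimal sketch of that schedule (the function name is my own):

```python
def decayed_lr(lr0, decay, iterations):
    """Effective learning rate after `iterations` updates (Keras-style time decay)."""
    return lr0 * (1.0 / (1.0 + decay * iterations))

print(decayed_lr(0.01, 1e-6, 0))          # initial rate: 0.01
print(decayed_lr(0.01, 1e-6, 1_000_000))  # halved after a million updates: 0.005
```

With decay=1e-6 as above, the rate erodes very slowly; it takes a million updates to halve it.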
About Momentum
SGD easily gets trapped in ravines (regions where the surface is much steeper in one direction than in another): there it oscillates across the steep walls and only creeps toward the minimum.
Momentum accelerates SGD and suppresses the oscillation by adding a fraction γ of the previous update vector to the current one:
v_t = γ·v_{t-1} + η·∇θ J(θ)
θ = θ − v_t
With this term, dimensions whose gradient keeps pointing the same way speed up, while dimensions whose gradient keeps changing direction are slowed down, which accelerates convergence and reduces oscillation.
Hyperparameter setting: γ is usually taken to be around 0.9 (as in the code above, momentum=0.9).
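The two update equations above can be sketched directly in NumPy; here they are applied to a toy "ravine" quadratic (the function and gradient are my own illustration):

```python
import numpy as np

def momentum_sgd(grad, theta0, lr=0.01, gamma=0.9, steps=500):
    """Momentum update: v = gamma*v + lr*grad(theta); theta -= v."""
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = gamma * v + lr * grad(theta)
        theta = theta - v
    return theta

# A "ravine": curvature 10 along x, 0.2 along y; minimum at (0, 0).
grad = lambda t: np.array([10.0 * t[0], 0.2 * t[1]])
theta = momentum_sgd(grad, [1.0, 1.0])
print(theta)  # both coordinates end up close to (0, 0)
```

Without the momentum term, the shallow y direction would make almost no progress in 500 steps at this learning rate.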
About Nesterov Accelerated Gradient
NAG uses θ − γ·v_{t-1} as an approximation of the parameters' next position and computes the gradient there, at the approximate future position rather than the current one:
v_t = γ·v_{t-1} + η·∇θ J(θ − γ·v_{t-1})
θ = θ − v_t
Hyperparameter setting: γ is again usually around 0.9.
Comparison of the two (in the standard vector diagram the original figure showed): Momentum (blue) first computes the current gradient and then takes a big jump in the direction of the updated accumulated gradient. NAG first takes a big jump along the previously accumulated gradient (gray), then measures the gradient at that point and applies a correction (red); this anticipatory update keeps it from moving too fast.
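The NAG update differs from plain momentum only in where the gradient is evaluated; a sketch on the same toy ravine quadratic used for momentum (grad and the quadratic are my own illustration):

```python
import numpy as np

def nag(grad, theta0, lr=0.01, gamma=0.9, steps=500):
    """Nesterov momentum: evaluate the gradient at the look-ahead point theta - gamma*v."""
    theta = np.asarray(theta0, dtype=float)
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = gamma * v + lr * grad(theta - gamma * v)  # gradient at the future position
        theta = theta - v
    return theta

grad = lambda t: np.array([10.0 * t[0], 0.2 * t[1]])
theta_nag = nag(grad, [1.0, 1.0])
print(theta_nag)  # converges to near (0, 0), with less overshoot in the steep direction
```

The only change versus the momentum sketch is `grad(theta - gamma * v)` in place of `grad(theta)`; that single look-ahead is what damps the overshoot.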
Binary classification with a multilayer perceptron
# Generate dummy data
x_train = np.random.random((1000, 20))
y_train = np.random.randint(2, size=(1000, 1))
x_test = np.random.random((100, 20))
y_test = np.random.randint(2, size=(100, 1))
model = Sequential()
model.add(Dense(64, activation='relu', input_dim=20))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=128)
print("score: ", score)
Results:
Using TensorFlow backend.
Epoch 1/10
2019-02-28 14:24:11.425481: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
128/1000 [==>...........................] - ETA: 1s - loss: 0.7517 - acc: 0.5078
1000/1000 [==============================] - 0s 186us/step - loss: 0.7252 - acc: 0.5090
Epoch 2/10
128/1000 [==>...........................] - ETA: 0s - loss: 0.7346 - acc: 0.4609
1000/1000 [==============================] - 0s 10us/step - loss: 0.7191 - acc: 0.4820
Epoch 3/10
128/1000 [==>...........................] - ETA: 0s - loss: 0.6853 - acc: 0.5078
1000/1000 [==============================] - 0s 10us/step - loss: 0.7109 - acc: 0.4860
Epoch 4/10
128/1000 [==>...........................] - ETA: 0s - loss: 0.6962 - acc: 0.5000
1000/1000 [==============================] - 0s 11us/step - loss: 0.7083 - acc: 0.4890
Epoch 5/10
128/1000 [==>...........................] - ETA: 0s - loss: 0.6950 - acc: 0.5391
1000/1000 [==============================] - 0s 10us/step - loss: 0.7050 - acc: 0.4990
Epoch 6/10
128/1000 [==>...........................] - ETA: 0s - loss: 0.7180 - acc: 0.4609
1000/1000 [==============================] - 0s 10us/step - loss: 0.7037 - acc: 0.5040
Epoch 7/10
128/1000 [==>...........................] - ETA: 0s - loss: 0.7023 - acc: 0.4453
1000/1000 [==============================] - 0s 10us/step - loss: 0.7014 - acc: 0.4850
Epoch 8/10
128/1000 [==>...........................] - ETA: 0s - loss: 0.7039 - acc: 0.5000
1000/1000 [==============================] - 0s 10us/step - loss: 0.6987 - acc: 0.5040
Epoch 9/10
128/1000 [==>...........................] - ETA: 0s - loss: 0.7013 - acc: 0.5078
1000/1000 [==============================] - 0s 10us/step - loss: 0.6934 - acc: 0.5360
Epoch 10/10
128/1000 [==>...........................] - ETA: 0s - loss: 0.7071 - acc: 0.4922
1000/1000 [==============================] - 0s 10us/step - loss: 0.6983 - acc: 0.5210
100/100 [==============================] - 0s 279us/step
score: [0.6979394555091858, 0.4399999976158142]
A VGG-like convolutional network
About VGG
The VGG architecture of Karen Simonyan & Andrew Zisserman:
from keras.layers import Conv2D, MaxPool2D, Flatten

# Generate dummy data
x_train = np.random.random((100, 100, 100, 3))
y_train = keras.utils.to_categorical(np.random.randint(10, size=(100, 1)), num_classes=10)
x_test = np.random.random((20, 100, 100, 3))
y_test = keras.utils.to_categorical(np.random.randint(10, size=(20, 1)), num_classes=10)
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(100, 100, 3)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(Dropout(0.25))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation='softmax'))
sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd)
model.fit(x_train, y_train, batch_size=32, epochs=10)
score = model.evaluate(x_test, y_test, batch_size=32)
print("score:", score)
Results:
Using TensorFlow backend.
Epoch 1/10
2019-02-28 14:50:59.772493: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX AVX2
32/100 [========>.....................] - ETA: 6s - loss: 2.2790
64/100 [==================>...........] - ETA: 2s - loss: 2.3399
96/100 [===========================>..] - ETA: 0s - loss: 2.3195
100/100 [==============================] - 7s 72ms/step - loss: 2.3264
Epoch 2/10
32/100 [========>.....................] - ETA: 4s - loss: 2.3259
64/100 [==================>...........] - ETA: 2s - loss: 2.2703
96/100 [===========================>..] - ETA: 0s - loss: 2.3011
100/100 [==============================] - 6s 61ms/step - loss: 2.3065
Epoch 3/10
32/100 [========>.....................] - ETA: 4s - loss: 2.2935
64/100 [==================>...........] - ETA: 2s - loss: 2.2876
96/100 [===========================>..] - ETA: 0s - loss: 2.2873
100/100 [==============================] - 6s 62ms/step - loss: 2.2882
Epoch 4/10
32/100 [========>.....................] - ETA: 4s - loss: 2.2825
64/100 [==================>...........] - ETA: 2s - loss: 2.2777
96/100 [===========================>..] - ETA: 0s - loss: 2.2668
100/100 [==============================] - 6s 62ms/step - loss: 2.2689
Epoch 5/10
32/100 [========>.....................] - ETA: 4s - loss: 2.3120
64/100 [==================>...........] - ETA: 2s - loss: 2.2865
96/100 [===========================>..] - ETA: 0s - loss: 2.2830
100/100 [==============================] - 6s 62ms/step - loss: 2.2771
Epoch 6/10
32/100 [========>.....................] - ETA: 4s - loss: 2.3145
64/100 [==================>...........] - ETA: 2s - loss: 2.2907
96/100 [===========================>..] - ETA: 0s - loss: 2.2718
100/100 [==============================] - 6s 62ms/step - loss: 2.2757
Epoch 7/10
32/100 [========>.....................] - ETA: 4s - loss: 2.2969
64/100 [==================>...........] - ETA: 2s - loss: 2.2606
96/100 [===========================>..] - ETA: 0s - loss: 2.2733
100/100 [==============================] - 6s 62ms/step - loss: 2.2728
Epoch 8/10
32/100 [========>.....................] - ETA: 4s - loss: 2.2306
64/100 [==================>...........] - ETA: 2s - loss: 2.2661
96/100 [===========================>..] - ETA: 0s - loss: 2.2564
100/100 [==============================] - 6s 62ms/step - loss: 2.2579
Epoch 9/10
32/100 [========>.....................] - ETA: 4s - loss: 2.2718
64/100 [==================>...........] - ETA: 2s - loss: 2.2901
96/100 [===========================>..] - ETA: 0s - loss: 2.2900
100/100 [==============================] - 6s 62ms/step - loss: 2.2870
Epoch 10/10
32/100 [========>.....................] - ETA: 4s - loss: 2.3367
64/100 [==================>...........] - ETA: 2s - loss: 2.2874
96/100 [===========================>..] - ETA: 0s - loss: 2.2905
100/100 [==============================] - 6s 62ms/step - loss: 2.2886
20/20 [==============================] - 0s 20ms/step
score: 2.2975594997406006
Model structure in this program:
- Input: 100×100×3 data
- Layer 1: 32 3×3 kernels; output size 98×98×32
- Layer 2: 32 3×3 kernels; output size 96×96×32
- Layer 3: 64 3×3 kernels; output size 94×94×64
- Layer 4: 64 3×3 kernels; output size 92×92×64
- Layer 5: 2×2 max pooling; output size 46×46×64
- Layer 6: fully connected layer, 256 neurons
- Output layer: fully connected layer, 10 neurons, one per class
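The layer sizes above can be verified with plain integer arithmetic (a small sketch assuming 'valid' padding and stride 1 for the convolutions, as in the code):

```python
def conv_out(size, kernel):   # 'valid' padding, stride 1
    return size - kernel + 1

def pool_out(size, pool):     # non-overlapping pooling, stride = pool size
    return size // pool

s = 100                       # 100x100x3 input
s = conv_out(s, 3)            # Conv2D(32, (3, 3)) -> 98
s = conv_out(s, 3)            # Conv2D(32, (3, 3)) -> 96
s = conv_out(s, 3)            # Conv2D(64, (3, 3)) -> 94
s = conv_out(s, 3)            # Conv2D(64, (3, 3)) -> 92
s = pool_out(s, 2)            # MaxPool2D((2, 2)) -> 46
print(s, s * s * 64)          # 46 135424: the feature count entering Flatten
```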
Output-size formulas for convolution and pooling
The convolution output depends on:
- the number of kernels (which sets the output depth)
- the kernel size F
- the stride S
- the padding P
For an input of width W, the width after the convolution is:
floor((W - F + 2P) / S) + 1
The pooling output depends on:
- the pool size F
- the stride S
The width after pooling is:
floor((W - F) / S) + 1
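Both formulas translate directly into a few lines of Python (function names are my own):

```python
import math

def conv_output(w, f, p=0, s=1):
    """Output width of a convolution: floor((W - F + 2P) / S) + 1."""
    return math.floor((w - f + 2 * p) / s) + 1

def pool_output(w, f, s=None):
    """Output width of pooling: floor((W - F) / S) + 1; stride defaults to the pool size."""
    s = f if s is None else s
    return math.floor((w - f) / s) + 1

print(conv_output(100, 3))        # 98, matching the first conv layer above
print(conv_output(100, 3, p=1))   # 100: padding 1 preserves the size ('same' padding)
print(pool_output(92, 2))         # 46, matching the max-pooling layer above
```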