TensorFlow Basics (5) -- How to Prevent Overfitting: Using Dropout

1. Fitting

Possible fitting cases for a regression problem:
(Figure: regression problem -- possible fits)
Possible fitting cases for a classification problem:
(Figure: classification problem -- possible fits)
Overfitting means the model fits the training samples very well, sometimes even perfectly, but when a new batch of samples comes in, its accuracy drops sharply. A properly fitted model should reach consistently good accuracy on both the training samples and new samples.

2. Methods to Prevent Overfitting

Overfitting is usually caused by a dataset that is too small combined with a neural network that is too complex. It is like solving a system of equations: when there are too few known conditions and too many unknown variables, we cannot pin down the solution we want (a small numeric sketch of this idea follows the figure below).
To prevent overfitting, there are generally the following three approaches:
(Figure: the three approaches to preventing overfitting)
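To make the equation analogy concrete, here is a minimal sketch (the matrix and the numbers are made up purely for illustration): with fewer training samples (equations) than parameters (unknowns), many different models reproduce the training data perfectly yet disagree on a new sample, which is exactly the failure mode Dropout is meant to fight in large networks.

import numpy as np

# 2 "training samples" (equations) but 3 unknown parameters:
# the data cannot pin down a unique model.
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
b = np.array([6., 15.])

w1 = np.array([1., 1., 1.])     # one exact solution
w2 = np.array([3., -3., 3.])    # another exact solution (w1 + 2*[1, -2, 1])

print(A @ w1, A @ w2)           # both reproduce the training data: [ 6. 15.] [ 6. 15.]

x_new = np.array([2., 1., 3.])  # a "new sample"
print(x_new @ w1, x_new @ w2)   # but the two models disagree on it: 6.0 vs 12.0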

3. Demo Code

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Load the MNIST dataset
mnist = input_data.read_data_sets("MNIST_data",one_hot=True)

# Size of each batch
batch_size = 100
# Compute how many batches there are in total
n_batch = mnist.train.num_examples // batch_size

# Define placeholders for the images, the labels, and the dropout keep probability
x = tf.placeholder(tf.float32,[None,784])
y = tf.placeholder(tf.float32,[None,10])
keep_prob = tf.placeholder(tf.float32)

# Deliberately build an over-sized network so that we can compare the results with and without Dropout
W1 = tf.Variable(tf.truncated_normal([784,2000],stddev=0.1))
b1 = tf.Variable(tf.zeros([2000])+0.1)
L1 = tf.nn.tanh(tf.matmul(x,W1)+b1)
L1_drop = tf.nn.dropout(L1,keep_prob)

W2 = tf.Variable(tf.truncated_normal([2000,2000],stddev=0.1))
b2 = tf.Variable(tf.zeros([2000])+0.1)
L2 = tf.nn.tanh(tf.matmul(L1_drop,W2)+b2)
L2_drop = tf.nn.dropout(L2,keep_prob)

W3 = tf.Variable(tf.truncated_normal([2000,1000],stddev=0.1))
b3 = tf.Variable(tf.zeros([1000])+0.1)
L3 = tf.nn.tanh(tf.matmul(L2_drop,W3)+b3)
L3_drop = tf.nn.dropout(L3,keep_prob)

W4 = tf.Variable(tf.truncated_normal([1000,10],stddev=0.1))
b4 = tf.Variable(tf.zeros([10])+0.1)
prediction = tf.nn.softmax(tf.matmul(L3_drop,W4) + b4)

# Quadratic (mean squared error) cost function
# loss = tf.reduce_mean(tf.square(y-prediction))
# Use the softmax cross-entropy cost function instead
loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y,logits=prediction))
# Train with gradient descent (learning rate 0.2)
train_step = tf.train.GradientDescentOptimizer(0.2).minimize(loss)

# Op that initializes all variables
init = tf.global_variables_initializer()

# The comparison results are stored in a boolean list
# argmax returns the position of the largest value in a 1-D tensor
correct_prediction = tf.equal(tf.argmax(y,1),tf.argmax(prediction,1))
# Compute the accuracy
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(31):
        for batch in range(n_batch):
            batch_xs,batch_ys = mnist.train.next_batch(batch_size)
            sess.run(train_step,feed_dict = {x:batch_xs,y:batch_ys,keep_prob:0.7})
        
        acc_test = sess.run(accuracy,feed_dict = {x:mnist.test.images,y:mnist.test.labels,keep_prob:1.0})
        acc_train = sess.run(accuracy,feed_dict = {x:mnist.train.images,y:mnist.train.labels,keep_prob:1.0})
        print("Iter"+str(epoch)+",Testing Accuracy"+str(acc_test)+",Training Accuracy"+str(acc_train))

Results when keep_prob is fed as 1.0 during training:

# keep_prob = 1.0
Iter0,Testing Accuracy0.9501,Training Accuracy0.9604727
Iter1,Testing Accuracy0.9582,Training Accuracy0.97545457
Iter2,Testing Accuracy0.964,Training Accuracy0.98285455
Iter3,Testing Accuracy0.9653,Training Accuracy0.9866
Iter4,Testing Accuracy0.9676,Training Accuracy0.9883818
Iter5,Testing Accuracy0.9684,Training Accuracy0.9896909
Iter6,Testing Accuracy0.9698,Training Accuracy0.9906
Iter7,Testing Accuracy0.9699,Training Accuracy0.99127275
Iter8,Testing Accuracy0.9706,Training Accuracy0.9918
Iter9,Testing Accuracy0.97,Training Accuracy0.9923273
Iter10,Testing Accuracy0.9702,Training Accuracy0.9927818
Iter11,Testing Accuracy0.9699,Training Accuracy0.9931091
Iter12,Testing Accuracy0.9707,Training Accuracy0.99334544
Iter13,Testing Accuracy0.9705,Training Accuracy0.99365455
Iter14,Testing Accuracy0.9714,Training Accuracy0.9938545
Iter15,Testing Accuracy0.971,Training Accuracy0.9940182
Iter16,Testing Accuracy0.9708,Training Accuracy0.9942182
Iter17,Testing Accuracy0.9712,Training Accuracy0.99432725
Iter18,Testing Accuracy0.9708,Training Accuracy0.9944909
Iter19,Testing Accuracy0.9711,Training Accuracy0.9946
Iter20,Testing Accuracy0.9716,Training Accuracy0.99472725
Iter21,Testing Accuracy0.9714,Training Accuracy0.9948364
Iter22,Testing Accuracy0.9716,Training Accuracy0.99485457
Iter23,Testing Accuracy0.9718,Training Accuracy0.99496365
Iter24,Testing Accuracy0.9719,Training Accuracy0.99505454
Iter25,Testing Accuracy0.9714,Training Accuracy0.9951636
Iter26,Testing Accuracy0.9714,Training Accuracy0.9952545
Iter27,Testing Accuracy0.9717,Training Accuracy0.9953091
Iter28,Testing Accuracy0.9716,Training Accuracy0.99538183
Iter29,Testing Accuracy0.9713,Training Accuracy0.9954364
Iter30,Testing Accuracy0.9716,Training Accuracy0.9954727

Results when keep_prob is fed as 0.7 during training:

# keep_prob = 0.7
Iter0,Testing Accuracy0.9152,Training Accuracy0.91032726
Iter1,Testing Accuracy0.9309,Training Accuracy0.9278
Iter2,Testing Accuracy0.9376,Training Accuracy0.9334
Iter3,Testing Accuracy0.9399,Training Accuracy0.939
Iter4,Testing Accuracy0.9458,Training Accuracy0.9450182
Iter5,Testing Accuracy0.9455,Training Accuracy0.94734544
Iter6,Testing Accuracy0.9487,Training Accuracy0.9498364
Iter7,Testing Accuracy0.9522,Training Accuracy0.9538
Iter8,Testing Accuracy0.9533,Training Accuracy0.9561273
Iter9,Testing Accuracy0.9556,Training Accuracy0.9581091
Iter10,Testing Accuracy0.9564,Training Accuracy0.9590909
Iter11,Testing Accuracy0.9573,Training Accuracy0.9617091
Iter12,Testing Accuracy0.9588,Training Accuracy0.9626727
Iter13,Testing Accuracy0.9592,Training Accuracy0.96376365
Iter14,Testing Accuracy0.9623,Training Accuracy0.96532726
Iter15,Testing Accuracy0.9611,Training Accuracy0.9666182
Iter16,Testing Accuracy0.9629,Training Accuracy0.96805453
Iter17,Testing Accuracy0.9644,Training Accuracy0.9690727
Iter18,Testing Accuracy0.9651,Training Accuracy0.96985453
Iter19,Testing Accuracy0.9652,Training Accuracy0.97105455
Iter20,Testing Accuracy0.9661,Training Accuracy0.9717818
Iter21,Testing Accuracy0.9661,Training Accuracy0.9724182
Iter22,Testing Accuracy0.9661,Training Accuracy0.97276366
Iter23,Testing Accuracy0.9676,Training Accuracy0.97403634
Iter24,Testing Accuracy0.969,Training Accuracy0.9750182
Iter25,Testing Accuracy0.9699,Training Accuracy0.975
Iter26,Testing Accuracy0.9684,Training Accuracy0.97556365
Iter27,Testing Accuracy0.969,Training Accuracy0.97663635
Iter28,Testing Accuracy0.9699,Training Accuracy0.97694546
Iter29,Testing Accuracy0.9703,Training Accuracy0.97761816
Iter30,Testing Accuracy0.9706,Training Accuracy0.9779091

Analysis of the results:
When keep_prob = 1.0, every neuron is active during training, which is equivalent to not using Dropout at all. When keep_prob = 0.7, each neuron is kept with probability 0.7 during training, so on average about 70% of the neurons work in each training step while the other 30% are temporarily dropped.
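As a small check of what tf.nn.dropout actually does, the sketch below (assuming the same TF 1.x session API as the code above) runs dropout over a vector of ones: with keep_prob = 0.7 roughly 70% of the elements survive, and the surviving values are scaled up by 1/keep_prob so that the expected activation stays the same, which is also why feeding keep_prob = 1.0 at test time needs no extra rescaling.

import tensorflow as tf

keep_prob = tf.placeholder(tf.float32)
activations = tf.ones([10000])
dropped = tf.nn.dropout(activations, keep_prob)

with tf.Session() as sess:
    out = sess.run(dropped, feed_dict={keep_prob: 0.7})
    # Roughly 70% of the elements are kept; kept values become 1/0.7 ≈ 1.43
    print("fraction kept:", (out > 0).mean())
    print("value of a kept element:", out[out > 0][0])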
From the results:
1. When keep_prob = 1.0, there is a clear gap between the test accuracy and the training accuracy. This is overfitting: the training accuracy is high, but on new samples the accuracy never reaches the level seen on the training set. When keep_prob = 0.7, the test accuracy and the training accuracy stay roughly in step, so Dropout does help prevent overfitting.
2. With keep_prob = 0.7, the accuracy converges noticeably more slowly than with keep_prob = 1.0. In other words, using Dropout slows down convergence.
3. In both runs the final test accuracy reaches about 0.97, so why use Dropout at all? Because this is, in fact, not a very good example: the network here is still not complex enough. If we trained a truly complex convolutional network such as GoogLeNet while our dataset had only about 50,000 images, the data would be relatively insufficient; without Dropout, overfitting would occur easily, and the gap between the test accuracy and the training accuracy of the resulting model would be far larger than the roughly 0.02 seen here.
The importance of Dropout only becomes really apparent when a very complex network is trained on a very small dataset.
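
For reference, the same idea carries over to the newer Keras API, where Dropout is a layer whose rate is the fraction of units to drop (rate = 1 - keep_prob = 0.3 here); a rough sketch of an equivalent network, assuming TF 2.x:

import tensorflow as tf

# Roughly the same architecture as above, expressed with Keras layers
model = tf.keras.Sequential([
    tf.keras.layers.Dense(2000, activation="tanh", input_shape=(784,)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(2000, activation="tanh"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1000, activation="tanh"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# Keras handles the train/test switch automatically: dropout is active
# during model.fit() and disabled during model.evaluate() / model.predict()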
