Facial expression recognition: classifying the Fer2013 expression dataset with VGGNet in TensorFlow

Course project.

VGGNet follows the paper Very Deep Convolutional Networks for Large-Scale Image Recognition.

The VGGNet code is based on the book《TensorFlow實戰》by 黃文堅.

Fer2013 images are 48*48. The training set (train_set) contains 28,708 images; there are two test sets, pub_test_set and pri_test_set, with 3,589 images each. There are 7 expression classes, labeled 0-6.

The model was trained on Google Colab, since my own machine is far too slow.


Results:

The model converged after 41 epochs.

At that point the classification accuracy on the Fer2013 public test set was 47.45%, and 46.92% on the private test set.

Human accuracy on Fer2013 is around 65%.


Steps:

1. Split the full Fer2013 dataset into a training set and two test sets, each saved as a csv file (a sketch of this split follows this list).

2. Read in the three files.

3. Resize the 48*48 images to the standard 224*224 input size.

4. Train, and after every epoch evaluate directly on both test sets.
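
For step 1, a minimal sketch of the split (not necessarily the exact code I used), assuming the standard Kaggle fer2013.csv layout with emotion, pixels and Usage columns (Usage is Training / PublicTest / PrivateTest). Each output csv gets the emotion label first, followed by the 2304 pixel values, which is the layout the training script below expects:

import pandas as pd

df = pd.read_csv("fer2013.csv")   # columns: emotion, pixels, Usage

def save_split(usage, out_name):
    part = df[df["Usage"] == usage]
    # expand the space-separated pixel string into 2304 integer columns
    pixels = part["pixels"].str.split(" ", expand=True).astype(int)
    out = pd.concat([part["emotion"].reset_index(drop=True),
                     pixels.reset_index(drop=True)], axis=1)
    out.to_csv(out_name, index=False)

save_split("Training", "traincpy.csv")
save_split("PublicTest", "valcpy.csv")
save_split("PrivateTest", "testcpy.csv")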


Problems encountered:

1. At first the images were not resized to 224*224. After VGG's five 2*2 pooling layers, a 48*48 image shrinks to just 1 or 2 pixels per side, so almost all spatial information is lost and training failed (see the size sketch after this list).

2. The learning rate was too low: it was initially set to 0.001, which made convergence extremely slow.

3. The training set is large, and the program has to turn the 48*48 images into 224*224 ones. Both resizing inside the program and reading a pre-resized csv file were extremely slow; for reference, the resized train_set is 7.53 GB.

    ---Solution: use OpenCV's cv2.resize to rescale the images, which is much faster (see the timing sketch after this list).

    ---Question: why is OpenCV's resize so much faster than my own implementation? Is it because OpenCV's resize is not written in Python but in C or assembly? (Essentially yes: cv2.resize runs optimized native code, while the pure-Python triple loop pays interpreter overhead for every pixel.)

4. Even on Colab training was still too slow: by the end the data had only just been loaded and 19 epochs trained, against a target of 100, and Colab kicks off sessions that hold a GPU for too long.

    ---Solution: after fixing the problems above, there was enough time for the model to converge.
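
For problem 1, a quick sketch of the size arithmetic (assuming 'SAME' padding, as in the code below): each of VGG's five stride-2 poolings takes the ceiling of half the side length, so a 48*48 input collapses to almost nothing while a 224*224 input still leaves the usual 7*7 map for fc6:

import math

def side_after_vgg_pools(side, num_pools = 5):
    # five 2*2 max-poolings with stride 2 and 'SAME' padding: side -> ceil(side / 2) each time
    for _ in range(num_pools):
        side = math.ceil(side / 2)
    return side

print(side_after_vgg_pools(48))    # 2 -> almost no spatial information left
print(side_after_vgg_pools(224))   # 7 -> the usual 7*7*512 input to fc6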
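
For problem 3, a rough way to measure the gap (a sketch only: it assumes the pure-Python resize() defined further down in the script is already in scope, and the numbers depend entirely on the machine):

import timeit
import numpy as np
import cv2

img = np.random.randint(0, 256, (48, 48, 1), dtype = np.uint8)

# resize() here is the pure-Python bilinear function defined in the script below
t_py = timeit.timeit(lambda: resize(img, (224, 224)), number = 1)
t_cv = timeit.timeit(lambda: cv2.resize(img, (224, 224), interpolation = cv2.INTER_LINEAR), number = 100) / 100
print('pure python: %.4f s per image, cv2.resize: %.6f s per image' % (t_py, t_cv))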


The VGGNet code itself is copied directly from the book.

The code:

#@title
# -*- coding: utf-8 -*-
"""
Created on Sat Jun 16 10:00:07 2018
@author: Administrator
"""
from datetime import datetime
import tensorflow as tf
import pandas as pd
import numpy as np
import csv
import cv2
 
 
f_train = open("drive//fer2013//traincpy.csv", encoding = 'UTF-8')
df_train = pd.read_csv(f_train)
#f_train_resize = open("drive//fer2013//trainresize.csv", encoding = 'UTF-8')
#df_train_resize = pd.read_csv(f_train_resize)
f_test_pub = open("drive//fer2013//valcpy.csv", encoding = 'UTF-8')
df_test_pub = pd.read_csv(f_test_pub)
f_test_pri = open("drive//fer2013//testcpy.csv", encoding = 'UTF-8')
df_test_pri = pd.read_csv(f_test_pri)
print('read csv file finished')
 
 
train_featuresets = df_train.iloc[1: , 1: ]
train_emotionsets = df_train.iloc[1: , 0:1]
test_pub_featuresets = df_test_pub.iloc[0: , 1: ]
test_pub_emotionsets = df_test_pub.iloc[0: , 0:1]
test_pri_featuresets = df_test_pri.iloc[0: , 1: ]
test_pri_emotionsets = df_test_pri.iloc[0: , 0:1]
print('dataset load finished')
 
#train_featuresets_resize = df_train_resize.iloc[1: , 0: ]
#train_feature_resize = tf.constant(train_featuresets_resize)
#train_feature_resize = tf.reshape(train_feature_resize, [-1, 224, 224, 1])
#train_emotion = np.reshape(np.array(train_emotionsets, dtype = 'float32'), (-1))
#print(train_feature_resize.shape)
 
#train_feature = tf.constant(train_featuresets)
#train_emotion = tf.constant(train_emotionsets)
 
#train_feature = tf.reshape(train_feature, [-1, 48, 48, 1])
#train_emotion = tf.reshape(train_emotion, [-1, 1])
 
#Bilinear interpolation: turn the 48*48 images into 224*224
 
def resize(src, new_size):
    # note: new_size is not actually used; the sizes are hard-coded for the 48*48 -> 224*224 case
    dst_w = 224
    dst_h = 224 # destination width and height
    src_h = 48
    src_w = 48 # source width and height
    if src_h == dst_h and src_w == dst_w:
        return src.copy()
    scale_x = float(src_w) / dst_w # x scaling factor
    scale_y = float(src_h) / dst_h # y scaling factor

    # walk over the destination image and interpolate
    dst = np.zeros((dst_h, dst_w, 1), dtype=np.uint8)
    for n in range(1): # loop over channels
        for dst_y in range(dst_h): # loop over height
            for dst_x in range(dst_w): # loop over width
                # coordinates of the destination pixel in the source image,
                # clamped to the valid range so edge pixels do not wrap around
                src_x = min(max((dst_x + 0.5) * scale_x - 0.5, 0), src_w - 1)
                src_y = min(max((dst_y + 0.5) * scale_y - 0.5, 0), src_h - 1)
                # positions of the four neighbouring source pixels
                src_x_0 = int(np.floor(src_x))
                src_y_0 = int(np.floor(src_y))
                src_x_1 = min(src_x_0 + 1, src_w - 1)
                src_y_1 = min(src_y_0 + 1, src_h - 1)

                # bilinear interpolation
                value0 = (src_x_1 - src_x) * src[src_y_0, src_x_0, n] + (src_x - src_x_0) * src[src_y_0, src_x_1, n]
                value1 = (src_x_1 - src_x) * src[src_y_1, src_x_0, n] + (src_x - src_x_0) * src[src_y_1, src_x_1, n]
                dst[dst_y, dst_x, n] = int((src_y_1 - src_y) * value0 + (src_y - src_y_0) * value1)
    return dst
 
 
 
 
#The pure-Python version was far too slow, so I switched to OpenCV
#Why is OpenCV so fast, anyway?
 
 
print('start resize 48*48 to 224*224')
train_feature_resize = []
train_feature = np.reshape(np.array(train_featuresets, dtype = 'float32'), (-1, 48, 48, 1))
train_emotion = np.reshape(np.array(train_emotionsets, dtype = 'float32'), (-1))
print('total ', train_feature.shape[0])
for i in range(train_feature.shape[0]):
#for i in range(640):
    if i%1000 == 0:
        print('now resize 48 --> 224 train set',i)
 
    #cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]]) → dst
    #pic = cv2.resize(pic, (400, 400), interpolation=cv2.INTER_CUBIC)
    train_feature_resize.append(cv2.resize(train_feature[i], (224, 224), interpolation=cv2.INTER_LINEAR))
    #print(train_feature_resize.shape)
    
#train_feature_resize = np.array(train_feature_resize, dtype = 'float32')
print(len(train_feature_resize))
print('train_feature resize finished')
 
 
 
 
 
test_pub_feature_resize = []
test_pub_feature = np.reshape(np.array(test_pub_featuresets, dtype = 'float32'), (-1, 48, 48, 1))
test_pub_emotion = np.reshape(np.array(test_pub_emotionsets, dtype = 'float32'), (-1))
for i in range(test_pub_feature.shape[0]):
##for i in range(320):
    if i%200 == 0:
        print('now resize 48 --> 224 pub test set',i)
    test_pub_feature_resize.append(cv2.resize(test_pub_feature[i], (224,224), interpolation=cv2.INTER_LINEAR))
test_pub_feature_resize = np.reshape(np.array(test_pub_feature_resize, dtype = 'float32'), (-1, 224, 224,1))
print(test_pub_feature_resize.shape)
print('test_pub resize finished')
 
test_pri_feature_resize = []
test_pri_feature = np.reshape(np.array(test_pri_featuresets, dtype = 'float32'), (-1, 48, 48, 1))
test_pri_emotion = np.reshape(np.array(test_pri_emotionsets, dtype = 'float32'), (-1))
for i in range(test_pri_feature.shape[0]):
#for i in range(320):
    if i%200 == 0:
        print('now resize 48 --> 224 pri test set',i)
    test_pri_feature_resize.append(cv2.resize(test_pri_feature[i], (224,224), interpolation=cv2.INTER_LINEAR))
test_pri_feature_resize = np.reshape(np.array(test_pri_feature_resize, dtype = 'float32'), (-1, 224, 224,1))
print(test_pri_feature_resize.shape)
print('test_pri resize finished')
#print(train_feature[0:2])
 
 
 
batch_size = 32
num_batches = 100
 
keep_prob = tf.placeholder(tf.float32)
X = tf.placeholder(tf.float32, [32, 224, 224, 1])
Y = tf.placeholder(tf.int32)
    
# Creates a convolution layer and appends its parameters to the parameter list.
# input_op: input tensor; name: layer name; kh/kw: kernel height/width; n_out: number of output channels; dh/dw: stride height/width; p: parameter list
def conv_op(input_op, name, kh, kw, n_out, dh, dw, p):
    # number of input channels of input_op
    n_in = input_op.get_shape()[-1].value
    with tf.name_scope(name) as scope:
        # convolution kernel parameters
        kernel = tf.get_variable(scope + "w", shape = [kh, kw, n_in, n_out], dtype = tf.float32, initializer = tf.contrib.layers.xavier_initializer_conv2d())
        # tf.nn.conv2d arguments:
        #   input: a 4-D float tensor [batch, in_height, in_width, in_channels]
        #   filter: the kernel, [filter_height, filter_width, in_channels, out_channels], same dtype as input
        #   strides: a length-4 vector, the stride in each input dimension
        #   padding: "SAME" or "VALID"; "SAME" keeps the border
        # The result is the feature map, still shaped [batch, height, width, channels].
        conv = tf.nn.conv2d(input_op, kernel, (1, dh, dw, 1), padding = "SAME")
        # bias tensor filled with 0.0
        bias_init_val = tf.constant(0.0, shape = [n_out], dtype = tf.float32)
        # wrap as a trainable Variable so the optimizer can update it
        biases = tf.Variable(bias_init_val, trainable = True, name = 'b')
        # add the 1-D bias to the convolution output
        z = tf.nn.bias_add(conv, biases)
        # layer output after ReLU
        activation = tf.nn.relu(z, name = scope)
        # append kernel and biases to the parameter list
        p += [kernel, biases]
        return activation
 
#Fully connected layer
def fc_op(input_op, name, n_out, p):
    # number of input channels
    n_in = input_op.get_shape()[-1].value
    
    with tf.name_scope(name) as scope:
        # FC weights are 2-D, also initialized with xavier_initializer
        kernel = tf.get_variable(scope+"w", shape = [n_in, n_out], dtype = tf.float32, initializer = tf.contrib.layers.xavier_initializer())
        # biases initialized to 0.1
        biases = tf.Variable(tf.constant(0.1, shape = [n_out], dtype = tf.float32), name = 'b')
        # relu_layer computes relu(matmul(input_op, kernel) + biases)
        activation = tf.nn.relu_layer(input_op, kernel, biases, name = scope)
        p += [kernel, biases]
        return activation
 
#Max-pooling layer: takes the maximum within each window
def mpool_op(input_op, name, kh, kw, dh, dw):
    # tf.nn.max_pool(value, ksize, strides, padding, name=None)
    #   value: the input, usually a feature map
    #   ksize: pooling window size; no pooling over batch or channel, so those two are 1
    #   strides: window stride in each dimension
    #   padding: as with convolution
    # Returns a tensor of the same type, still shaped [batch, height, width, channels].
    return tf.nn.max_pool(input_op, ksize = [1, kh, kw, 1], strides = [1, dh, dw, 1], padding = 'SAME', name = name)
 
 
#Build the VGG-16 network structure
def inference_op(input_op, keep_prob):
    p = []
    conv1_1 = conv_op(input_op, name = "conv1_1", kh = 3, kw = 3, n_out = 64, dh = 1, dw = 1, p = p)
    conv1_2 = conv_op(conv1_1, name = "conv1_2", kh = 3, kw = 3, n_out = 64, dh = 1, dw = 1, p = p)
    #each of these poolings halves the output side length, while the channel count is doubled in the next block
    pool1 = mpool_op(conv1_2, name = "pool1", kh = 2, kw = 2, dw = 2, dh = 2)
    
    conv2_1 = conv_op(pool1, name = "conv2_1", kh = 3, kw = 3, n_out = 128, dh = 1, dw = 1, p = p)
    conv2_2 = conv_op(conv2_1, name = "conv2_2", kh = 3, kw = 3, n_out = 128, dh = 1, dw = 1, p = p)
    pool2 = mpool_op(conv2_2, name = "pool2", kh = 2, kw = 2, dw = 2, dh = 2)
    
    conv3_1 = conv_op(pool2, name = "conv3_1", kh = 3, kw = 3, n_out = 256, dh = 1, dw = 1, p = p)
    conv3_2 = conv_op(conv3_1, name = "conv3_2", kh = 3, kw = 3, n_out = 256, dh = 1, dw = 1, p = p)
    conv3_3 = conv_op(conv3_2, name = "conv3_3", kh = 3, kw = 3, n_out = 256, dh = 1, dw = 1, p = p)
    pool3 = mpool_op(conv3_3, name = "pool3", kh = 2, kw = 2, dh = 2, dw = 2)
    
    conv4_1 = conv_op(pool3, name = "conv4_1", kh = 3, kw = 3, n_out = 512, dh = 1, dw = 1, p = p)
    conv4_2 = conv_op(conv4_1, name = "conv4_2", kh = 3, kw = 3, n_out = 512, dh = 1, dw = 1, p = p)
    conv4_3 = conv_op(conv4_2, name = "conv4_3", kh = 3, kw = 3, n_out = 512, dh = 1, dw = 1, p = p)
    pool4 = mpool_op(conv4_3, name = "pool4", kh = 2, kw = 2, dh = 2, dw = 2)
    
    conv5_1 = conv_op(pool4, name = "conv5_1", kh = 3, kw = 3, n_out = 512, dh = 1, dw = 1, p = p)
    conv5_2 = conv_op(conv5_1, name = "conv5_2", kh = 3, kw = 3, n_out = 512, dh = 1, dw = 1, p = p)
    conv5_3 = conv_op(conv5_2, name = "conv5_3", kh = 3, kw = 3, n_out = 512, dh = 1, dw = 1, p = p)
    pool5 = mpool_op(conv5_3, name = "pool5", kh = 2, kw = 2, dh = 2, dw = 2)
    
    shp = pool5.get_shape()
    #flatten each sample into a 1-D vector of length height*width*channels
    flattened_shape = shp[1].value * shp[2].value * shp[3].value
    resh1 = tf.reshape(pool5, [-1, flattened_shape], name = "resh1")
    
    #connect to a fully connected layer with 4096 hidden units
    fc6 = fc_op(resh1, name = "fc6", n_out = 4096, p = p)
    #Dropout is used to prevent or reduce overfitting, and is usually applied to the fully connected layers.
    #It randomly drops a fraction of the neurons in each training step.
    #The keep probability is 0.5 during training and 1.0 at prediction time.
    fc6_drop = tf.nn.dropout(fc6, keep_prob, name = "fc6_drop")
    #fc6_drop = fc6
    fc7 = fc_op(fc6_drop, name = "fc7", n_out = 4096, p = p)
    fc7_drop = tf.nn.dropout(fc7, keep_prob, name = "fc7_drop")
    #fc7_drop = fc7
    fc8 = fc_op(fc7_drop, name = "fc8", n_out = 7, p = p)
    #class output probabilities
    softmax = tf.nn.softmax(fc8)
    #the class with the highest probability
    predictions = tf.argmax(softmax, 1)
    #print('in inference op : softmax', softmax)
    #print('in inference op : prediction',predictions)
    return predictions, softmax
 
 
#More channels increase the representational power: each channel is produced by one convolution kernel, different features respond to different kernels, and multiple channels keep them all.
def train_vgg():
    predictions, softmax = inference_op(X, keep_prob)    
    #
    # Something is definitely wrong here: sparse_softmax_cross_entropy_with_logits expects raw
    # pre-softmax logits, but it is fed the softmax output (and fc8 itself has already passed
    # through a ReLU inside fc_op), so this is not the standard cross-entropy loss.
    #
    cross_entropy = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels = Y, logits = softmax))
    loss = tf.reduce_mean(cross_entropy)
    
    train_op = tf.train.GradientDescentOptimizer(0.006).minimize(loss)
    
    #track the best accuracy reached on each test set
    max_pub = 0
    max_pri = 0
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print('now start train')
        print('the length of train_emotion:', len(train_emotion))
        for i in range(100):
            start = 0
            end = start + batch_size
            step = 0
            while(end < len(train_emotion)):
            #while(end < 640):
                #there may be a problem here
                train_op.run(feed_dict = {X:np.reshape(np.array(train_feature_resize[start:end], dtype = 'float32'), (-1, 224, 224,1)), keep_prob:0.5, Y:train_emotion[start:end]})
                start += batch_size
                end += batch_size
                if step%100 == 0:
                    #print(tf.argmax(softmax, 1).eval())
                    print('round: ', i,'step: ' , step)
                step += 1
                
            print('going to start public prediction')
            print('the length of test_pub_emotion :' ,len(test_pub_emotion))
            start1 = 0
            end1 = start1 + batch_size
            k = 0
            while(end1 < len(test_pub_emotion)):
            #while(end1 < 320):
                predict = sess.run(predictions, feed_dict = {X:test_pub_feature_resize[start1:end1], keep_prob:1})
                #prediction.append(predict.tolist())
                accurate = test_pub_emotion[start1:end1]
                    
                if  end1%512 == 0:
                    print(predict)
                    #print(accurate)
                
                for w in range(len(predict)):
                    if predict[w] == accurate[w]:
                        k += 1     
                            
                start1 += batch_size
                end1 += batch_size 
                    
            accurate_rate = k / len(test_pub_emotion)
            if accurate_rate > max_pub:
                max_pub = accurate_rate
            print('end public prediction')
            print('the public accuracy is : ', accurate_rate)
            print('the max pub_accurate :', max_pub)
            
            print('going to start private prediction')
            print('the length of test_pri_emotion :' ,len(test_pri_emotion))
            start1 = 0
            end1 = start1 + batch_size
            k = 0
            while(end1 < len(test_pri_emotion)):
            #while(end1 < 320):
                predict = sess.run(predictions, feed_dict = {X:test_pri_feature_resize[start1:end1], keep_prob:1})
                #prediction.append(predict.tolist())
                accurate = test_pri_emotion[start1:end1]
                    
                if  end1%512 == 0:
                    print(predict)
                    #print(accurate)
                
                for w in range(len(predict)):
                    if predict[w] == accurate[w]:
                        k += 1     
                            
                start1 += batch_size
                end1 += batch_size 
                    
            accurate_rate = k / len(test_pri_emotion)
            if accurate_rate > max_pri:
                max_pri = accurate_rate
            print('end private prediction')
            print('the private accuracy is : ', accurate_rate)
            print('the max pri_accurate:', max_pri)
                  
train_vgg()

Results:

After 18 epochs the accuracy was around 43.8%, already a sizable improvement over the earlier runs; with more training it should get better still.

The final accuracy did indeed improve. It still falls short of architectures like ResNet, though, and because the parameter count is large, training remains slow.

The arrays here were a few predict outputs I picked at random.

