試水DCGAN(tensorflow平臺實現的face-generation)

I previously took a Udacity deep learning course (taught by a popular Indian YouTuber) that included a simple GAN face-generation project, though it only trained on 28×28 faces. After reading the paper, I dug the project out again and retrained it on 64×64 faces. This counts as my first real project for learning deep learning and TensorFlow, since I may need to apply them in actual work next year.

The GAN generative model

GAN is a generative model proposed by Ian Goodfellow in the 2014 paper Generative Adversarial Nets. The model consists of two entirely different networks, a generator and a discriminator, which are trained adversarially to striking effect; it is currently one of the hottest architectures in deep learning. The generator's task is to produce fake samples that fool the discriminator, while the discriminator's task is to tell fake samples from real ones; the two networks are trained against each other. At the optimum, the samples the generator produces look as if they were drawn from the real data, and the discriminator outputs real or fake with probability 0.5 for any sample.

Although GANs achieve remarkable results, some problems remain unsolved:

  • Hard to train: it is difficult to balance the discriminator and the generator; the discriminator often converges quickly, leaving the generator unable to improve further.
  • Mode collapse: the trained model only reproduces part of the training distribution, so the output samples lack diversity.
  • The loss does not reflect how good the current model actually is.

Let $x$ denote the training data, with distribution $p_{data}$. The discriminator $D(x)$ must tell real samples from fake ones: the more likely a sample comes from the real data, the larger $D(x)$ is.
$z$ is a low-dimensional uniform or Gaussian variable, and the generator $G(z)$ transforms the distribution of $z$ into the real data distribution in order to fool $D$.

Loss function

The network's loss function is:
$$\min_G \max_D V(D,G)=\mathbb{E}_{x\sim p_{data}(x)}[\log D(x)]+\mathbb{E}_{z\sim p_z(z)}[\log(1-D(G(z)))]$$
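A useful fact from the paper: for a fixed generator, the maximizing discriminator has a closed form, obtained by maximizing the integrand of $V(D,G)$ pointwise:

```latex
D^*_G(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}
```

Substituting $D^*_G$ back into $V$ gives $-\log 4 + 2\,\mathrm{JSD}(p_{data}\,\|\,p_g)$, so the generator's objective is minimized exactly when $p_g = p_{data}$, which is also where $D^*_G(x) = 0.5$ for every sample, matching the equilibrium described above.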

DCGAN

DCGAN keeps the original GAN formulation but makes several changes to the network architecture so that training converges faster and more stably.
The changes are:

  • Pooling layers are replaced with strided convolutions, and the fully connected hidden layers are removed.
  • A batchNorm layer follows each convolutional layer.
  • The generator's last layer uses tanh; the discriminator's last layer uses sigmoid.
  • The generator's activation function is ReLU; the discriminator's is Leaky ReLU.

The network structure from the paper is shown below.
generator
When I implemented that structure, the discriminator converged too fast while the generator failed to converge, so I modified it slightly. The code is as follows:

def generator(z, is_train=True):

    with tf.variable_scope("generator", reuse=not is_train):

        # Project z to an 8x8x512 feature map
        x1 = tf.layers.dense(z, 8*8*512)
        x1 = tf.nn.tanh(x1)   # hyperbolic tangent (not the trigonometric tangent)

        x1 = tf.reshape(x1, (-1, 8, 8, 512))
        x1 = tf.layers.batch_normalization(x1, training=is_train)
        x1 = tf.nn.relu(x1)

        # 8x8 -> 16x16
        x2 = tf.layers.conv2d_transpose(x1, 256, 5, strides=2, padding='same')
        x2 = tf.layers.batch_normalization(x2, training=is_train)
        x2 = tf.nn.relu(x2)

        # 16x16 -> 32x32; batchNorm before the activation, as in the other layers
        x3 = tf.layers.conv2d_transpose(x2, 128, 5, strides=2, padding='same')
        x3 = tf.layers.batch_normalization(x3, training=is_train)
        x3 = tf.nn.relu(x3)

        # 32x32 -> 64x64 with 3 channels; tanh keeps the output in [-1, 1]
        logits = tf.layers.conv2d_transpose(x3, 3, 5, strides=2, padding='same')
        out = tf.tanh(logits)

        return out
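The shape progression implied by the three stride-2 transposed convolutions can be sanity-checked with simple arithmetic: with 'same' padding, `tf.layers.conv2d_transpose` produces an output whose spatial size is the input size times the stride. A small sketch:

```python
def conv_transpose_out_size(in_size, stride):
    # With 'same' padding, tf.layers.conv2d_transpose gives
    # output spatial size = input size * stride.
    return in_size * stride

size = 8  # the dense layer is reshaped to 8x8x512
sizes = []
for _ in range(3):  # three stride-2 transposed conv layers
    size = conv_transpose_out_size(size, 2)
    sizes.append(size)
print(sizes)  # [16, 32, 64]
```

So three doublings take the 8×8 seed exactly to the 64×64 target resolution.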

The discriminator mirrors the generator's structure:

def Discriminator(images, reuse=False):

    with tf.variable_scope("discriminator", reuse=reuse):
        alpha = 0.2  # Leaky ReLU slope
        # 64x64 -> 32x32
        x1 = tf.layers.conv2d(images, 128, 5, strides=2, padding='same')
        bn1 = tf.layers.batch_normalization(x1, training=True)
        relu1 = tf.nn.leaky_relu(bn1, alpha)

        # 32x32 -> 16x16
        x2 = tf.layers.conv2d(relu1, 256, 5, strides=2, padding='same')
        bn2 = tf.layers.batch_normalization(x2, training=True)
        relu2 = tf.nn.leaky_relu(bn2, alpha)

        # 16x16 -> 8x8
        x3 = tf.layers.conv2d(relu2, 512, 5, strides=2, padding='same')
        bn3 = tf.layers.batch_normalization(x3, training=True)
        relu3 = tf.nn.leaky_relu(bn3, alpha)

        # Flatten and classify real vs. fake
        flat = tf.reshape(relu3, (-1, 8*8*512))
        logits = tf.layers.dense(flat, 1)
        out = tf.sigmoid(logits)

        return out, logits
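The mirror-image arithmetic can be checked the same way: with 'same' padding, `tf.layers.conv2d` produces an output of spatial size ceil(input / stride), so the three stride-2 convolutions halve 64 down to 8, which is why the flatten reshapes to 8*8*512:

```python
import math

def conv_out_size(in_size, stride):
    # With 'same' padding, tf.layers.conv2d gives
    # output spatial size = ceil(input size / stride).
    return math.ceil(in_size / stride)

size = 64  # input images are 64x64
sizes = []
for _ in range(3):  # three stride-2 conv layers
    size = conv_out_size(size, 2)
    sizes.append(size)
print(sizes)        # [32, 16, 8]
print(8 * 8 * 512)  # 32768, the flattened dimension fed to the dense layer
```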

Loss function and optimization

With the discriminator and generator defined, we need the network's loss function. Following the paper, cross-entropy is used.

def model_loss(input_real, input_z):
    
    g_model = generator(input_z)
    
    d_model_real, d_logits_real = Discriminator(input_real)
    d_model_fake, d_logits_fake = Discriminator(g_model, reuse=True)
    
    d_loss_real = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_logits_real,
            labels=tf.ones_like(d_model_real) * 0.999))  # one-sided label smoothing
    
    d_loss_fake = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_logits_fake, labels=tf.zeros_like(d_model_fake)))
    
    # discriminator loss: sum over the real and fake batches
    d_loss = d_loss_real + d_loss_fake
    
    g_loss = tf.reduce_mean(
             tf.nn.sigmoid_cross_entropy_with_logits(logits=d_logits_fake,
                                                     labels=tf.ones_like(d_model_fake)))
    
    return d_loss, g_loss
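What `tf.nn.sigmoid_cross_entropy_with_logits` actually computes can be checked in pure Python; TensorFlow documents the numerically stable form max(x, 0) - x*z + log(1 + exp(-|x|)) for logit x and label z. A small sketch of how the loss terms above behave:

```python
import math

def sigmoid_xent(logit, label):
    # Numerically stable sigmoid cross-entropy, the same formula
    # tf.nn.sigmoid_cross_entropy_with_logits documents:
    # max(x, 0) - x*z + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * label + math.log(1.0 + math.exp(-abs(logit)))

# D is confident a real image is real (large positive logit): with the
# smoothed 0.999 label, d_loss_real is small but never exactly zero.
print(sigmoid_xent(5.0, 0.999))

# D confidently rejects a fake (large negative logit): g_loss, which uses
# label 1 on the fake logits, is large, pushing G to improve.
print(sigmoid_xent(-5.0, 1.0))
```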

The optimizer is Adam.

def model_opt(d_loss, g_loss, learning_rate, beta1):
    """
    Get optimization operations
    :param d_loss: Discriminator loss Tensor
    :param g_loss: Generator loss Tensor
    :param learning_rate: Learning Rate Placeholder
    :param beta1: The exponential decay rate for the 1st moment in the optimizer
    :return: A tuple of (discriminator training operation, generator training operation)
    """
    # Get the trainable variables, split into G and D parts by scope name
    t_vars = tf.trainable_variables()
    g_vars = [var for var in t_vars if var.name.startswith('generator')]
    d_vars = [var for var in t_vars if var.name.startswith('discriminator')]

    # The UPDATE_OPS dependency ensures the batch-norm moving averages
    # are updated on every training step.
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        d_train_opt = tf.train.AdamOptimizer(learning_rate, beta1=beta1).minimize(d_loss, var_list=d_vars)
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        g_train_opt = tf.train.AdamOptimizer(learning_rate, beta1=beta1).minimize(g_loss, var_list=g_vars)

    return d_train_opt, g_train_opt

Training

One trick, picked up from posts online: for each training step of D, train G twice; supposedly this gives better results.

def train(epoch_count, batch_size, z_dim, learning_rate, beta1, get_batches, data_shape, data_image_mode):
    """
    Train the GAN
    :param epoch_count: Number of epochs
    :param batch_size: Batch Size
    :param z_dim: Z dimension
    :param learning_rate: Learning Rate
    :param beta1: The exponential decay rate for the 1st moment in the optimizer
    :param get_batches: Function to get batches
    :param data_shape: Shape of the data
    :param data_image_mode: The image mode to use for images ("RGB" or "L")
    """
    InputHolder, z_InputHolder, LearningRate = model_inputs(data_shape[1], data_shape[2], data_shape[3], z_dim)
    
    d_loss, g_loss = model_loss(InputHolder, z_InputHolder)
    
    d_train_opt, g_train_opt = model_opt(d_loss, g_loss, LearningRate, beta1)  
        
    with tf.Session() as sess:
        
        writer = tf.summary.FileWriter("logs/", sess.graph)  # the first argument is the log directory

        example_z = np.random.uniform(-1, 1, size=[25, z_dim])  # fixed noise, for monitoring progress
        
    
        g_loss_sum = tf.summary.scalar("g_loss", g_loss)
        d_loss_sum = tf.summary.scalar("d_loss", d_loss)
        
        # initialize variables
        sess.run(tf.global_variables_initializer())
        batchNum = 0
        for epoch_i in range(epoch_count):
            for batch_images in get_batches(batch_size):
               
                # rescale the batch to [-1, 1] to match the generator's tanh output
                batch_images = batch_images * 2

                # Sample random noise for G
                batch_z = np.random.uniform(-1.0, 1.0, size=(batch_size, z_dim))
                #batch_z = np.random.normal(0, 1.0, size=(batch_size, z_dim))

                # Run the optimizers: train D once ...
                _, summary_str = sess.run([d_train_opt, d_loss_sum],
                                          feed_dict={InputHolder: batch_images, z_InputHolder: batch_z,
                                                     LearningRate: learning_rate})
                writer.add_summary(summary_str, batchNum)

                # ... then train G twice
                _, summary_str = sess.run([g_train_opt, g_loss_sum],
                                          feed_dict={InputHolder: batch_images, z_InputHolder: batch_z,
                                                     LearningRate: learning_rate})
                writer.add_summary(summary_str, batchNum)

                _, summary_str = sess.run([g_train_opt, g_loss_sum],
                                          feed_dict={InputHolder: batch_images, z_InputHolder: batch_z,
                                                     LearningRate: learning_rate})
                writer.add_summary(summary_str, batchNum)
    
                batchNum += 1
                if batchNum % 100 == 0:
                    # Generate samples from the fixed noise and save them as an image grid
                    samples = sess.run(generator(z_InputHolder, False), feed_dict={z_InputHolder: example_z})
                    images_grid = helper.images_square_grid(samples, data_image_mode)
                    if not os.path.exists(SaveSample_dir):
                        os.makedirs(SaveSample_dir)

                    strFileName = "Epoch_{}_{}.jpg".format(epoch_i+1, batchNum)
                    strFileName = os.path.join(SaveSample_dir, strFileName)
                    scipy.misc.imsave(strFileName, images_grid)
                    
      
                if batchNum % 10 == 0:
                    # Every 10 batches, evaluate the losses and print them
                    train_loss_d = d_loss.eval({InputHolder: batch_images, z_InputHolder: batch_z,LearningRate: learning_rate})
                    train_loss_g = g_loss.eval({z_InputHolder: batch_z})
                
                    print("Epoch {}/{}...".format(epoch_i+1, epoch_count),
                          "Discriminator Loss: {:.4f}...".format(train_loss_d),
                          "Generator Loss: {:.4f}".format(train_loss_g))

                    
                if batchNum % 500 == 0:
                    # Periodically save a checkpoint, keeping only the latest
                    if not os.path.exists(checkpoint_dir):
                        os.makedirs(checkpoint_dir)

                    saver = tf.train.Saver(max_to_keep=1)
                    saver.save(sess, os.path.join(checkpoint_dir, "DCGAN.model"), global_step=batchNum)
                
                
        show_generator_output(sess, 25, z_InputHolder, data_image_mode)
        return sess

Results

Training is still running. Before even one epoch has finished, some faces are already visible, and more epochs should improve things further; I will update once it completes. Deep learning is quite remarkable, achieving things that traditional algorithms struggle with.

Generated faces

With batchsize = 64, the results after 2600 batches are shown below. There are still obvious flaws, but some outputs are already recognizable as faces. The generated images are 64×64, and the training images come from the aligned CelebA dataset.
[Figure: generated face samples after 2600 batches]
