I previously took a Udacity deep learning course (taught by a well-known Indian YouTuber) that included a simple GAN face-generation project, though it trained on 28×28 faces. After reading the paper, I dug the project out again to retrain it on faces. This is my first serious project for learning deep learning and TensorFlow, since I will probably need to apply them in real projects next year.
The GAN generative model
GAN is a generative model proposed by Ian Goodfellow in 2014 in Generative Adversarial Nets. The model consists of two entirely different networks, a generator and a discriminator, which are trained adversarially to remarkable effect; it is currently one of the hottest architectures in deep learning. The generator's task is to produce fake samples that fool the discriminator, while the discriminator's task is to tell fake samples apart from real ones; the two networks are trained against each other. At the optimum, the samples the generator produces look as if they were drawn from the real data, and the discriminator outputs a probability of 0.5 for any sample, real or fake.
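The equilibrium claim can be made precise. For a fixed generator, maximizing the value function pointwise gives the optimal discriminator (a standard result from the original paper, restated here for reference):

$$D^*(x) = \frac{p_{data}(x)}{p_{data}(x) + p_g(x)}$$

where $p_g$ is the distribution of the generated samples. When training succeeds and $p_g = p_{data}$, this reduces to $D^*(x) = 1/2$, which is exactly the "0.5 for any sample" behavior described above.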
Although GANs achieve remarkable results, some problems remain unsolved:
- They are hard to train: it is difficult to keep the discriminator and generator in balance. The discriminator often converges quickly, after which the generator can no longer get useful updates.
- Mode collapse: the trained model only outputs a subset of the modes in the training data, so sample diversity is too low.
- The loss value does not reflect how good the current model actually is.
Denote the training data by $x$, with distribution $p_{data}(x)$. $D(x)$ is the discriminator, whose job is to recognize real samples: the more likely a sample is to come from the real data, the larger $D(x)$ should be.
$z$ is drawn from a low-dimensional uniform or Gaussian distribution, and the generator $G(z)$ transforms the distribution of $z$ into the real data distribution in order to fool $D$.
Loss function
The network's loss function is the following minimax objective:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]$$
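As a quick numerical sanity check of the equilibrium described earlier (a minimal NumPy sketch, not part of the training code): when the discriminator outputs 0.5 for every sample, the value function evaluates to $-2\log 2 \approx -1.386$, the value at the GAN's global optimum.

```python
import numpy as np

def value_fn(d_real, d_fake):
    # V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

# At the theoretical optimum the discriminator outputs 0.5 everywhere
d_real = np.full(1000, 0.5)
d_fake = np.full(1000, 0.5)
print(value_fn(d_real, d_fake))  # -1.3862943611198906 == -2 * log(2)
```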
DCGAN
DCGAN makes several changes to the original GAN architecture so that the network converges faster and more stably.
The changes are as follows:
- Pooling and fully connected hidden layers are replaced with strided convolutions
- A batch normalization layer follows each convolutional layer
- The last layer of the generator uses tanh; the last layer of the discriminator uses sigmoid
- The generator uses ReLU activations; the discriminator uses Leaky ReLU
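The spatial arithmetic behind the strided layers is worth spelling out. With `padding='same'` in TensorFlow, a stride-2 transposed convolution doubles the spatial size, and a stride-2 convolution halves it (rounding up); that is how the generator below goes 8×8 → 64×64 in three steps while the discriminator mirrors it back down. A small sketch of the rule (the helper names here are mine):

```python
def conv2d_transpose_out_size(size, stride):
    # With padding='same', a transposed conv multiplies spatial size by the stride
    return size * stride

def conv2d_out_size(size, stride):
    # With padding='same', a strided conv divides spatial size by the stride (ceiling)
    return -(-size // stride)

# Generator path in this post: 8 -> 16 -> 32 -> 64
s = 8
for _ in range(3):
    s = conv2d_transpose_out_size(s, 2)
print(s)  # 64

# Discriminator path mirrors it: 64 -> 32 -> 16 -> 8
d = 64
for _ in range(3):
    d = conv2d_out_size(d, 2)
print(d)  # 8
```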
The network structure from the paper is as follows:
While implementing this structure I found that the discriminator converged too quickly while the generator failed to keep up, so I modified it a bit; the code is as follows:
```python
def generator(z, is_train=True):
    with tf.variable_scope("generator", reuse=not is_train):
        x1 = tf.layers.dense(z, 8 * 8 * 512)
        x1 = tf.nn.tanh(x1)  # hyperbolic tangent
        x1 = tf.reshape(x1, (-1, 8, 8, 512))
        x1 = tf.layers.batch_normalization(x1, training=is_train)
        x1 = tf.nn.relu(x1)

        x2 = tf.layers.conv2d_transpose(x1, 256, 5, strides=2, padding='same')
        x2 = tf.layers.batch_normalization(x2, training=is_train)
        x2 = tf.nn.relu(x2)

        x3 = tf.layers.conv2d_transpose(x2, 128, 5, strides=2, padding='same')
        # batch norm before the activation, matching the other layers
        x3 = tf.layers.batch_normalization(x3, training=is_train)
        x3 = tf.nn.relu(x3)

        logits = tf.layers.conv2d_transpose(x3, 3, 5, strides=2, padding='same')
        out = tf.tanh(logits)
        return out
```
The discriminator mirrors the generator:
```python
def Discriminator(images, reuse=False):
    with tf.variable_scope("discriminator", reuse=reuse):
        alpha = 0.2
        x1 = tf.layers.conv2d(images, 128, 5, strides=2, padding='same')
        bn1 = tf.layers.batch_normalization(x1, training=True)
        relu1 = tf.nn.leaky_relu(bn1, alpha)

        x2 = tf.layers.conv2d(relu1, 256, 5, strides=2, padding='same')
        bn2 = tf.layers.batch_normalization(x2, training=True)
        relu2 = tf.nn.leaky_relu(bn2, alpha)

        x3 = tf.layers.conv2d(relu2, 512, 5, strides=2, padding='same')
        bn3 = tf.layers.batch_normalization(x3, training=True)
        relu3 = tf.nn.leaky_relu(bn3, alpha)

        # Flatten and project to a single logit
        flat = tf.reshape(relu3, (-1, 8 * 8 * 512))
        logits = tf.layers.dense(flat, 1)
        out = tf.sigmoid(logits)
        return out, logits
```
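The `tf.nn.leaky_relu` used here differs from plain ReLU by letting a small slope `alpha` through for negative inputs, which keeps the discriminator from zeroing out the gradients the generator depends on. A minimal NumPy equivalent (my own sketch, mirroring the TF op's behavior):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # Positives pass through unchanged; negatives are scaled by alpha
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(x))  # [-0.4 -0.1  0.   1.5]
```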
Loss function and optimization
With the discriminator and generator defined, we need to define the network's loss function; following the paper, cross-entropy is used here.
```python
def model_loss(input_real, input_z):
    g_model = generator(input_z)
    d_model_real, d_logits_real = Discriminator(input_real)
    d_model_fake, d_logits_fake = Discriminator(g_model, reuse=True)

    # Real labels are smoothed to 0.999 (one-sided label smoothing)
    d_loss_real = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_logits_real, labels=tf.ones_like(d_model_real) * 0.999))
    d_loss_fake = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_logits_fake, labels=tf.zeros_like(d_model_fake)))
    # Discriminator loss
    d_loss = d_loss_real + d_loss_fake

    # Generator loss: make D believe the fakes are real
    g_loss = tf.reduce_mean(
        tf.nn.sigmoid_cross_entropy_with_logits(
            logits=d_logits_fake, labels=tf.ones_like(d_model_fake)))
    return d_loss, g_loss
```
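`tf.nn.sigmoid_cross_entropy_with_logits` computes the cross-entropy in a numerically stable form rather than applying `sigmoid` and `log` separately, and the `* 0.999` on the real labels is one-sided label smoothing, which softens the discriminator's targets. A NumPy replication of the stable formula (my sketch, for illustration only):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_xent_with_logits(logits, labels):
    # Stable form used by TF: max(x, 0) - x*z + log(1 + exp(-|x|))
    x, z = logits, labels
    return np.maximum(x, 0) - x * z + np.log1p(np.exp(-np.abs(x)))

logits = np.array([-3.0, 0.0, 2.0, 5.0])
labels = np.full_like(logits, 0.999)  # smoothed "real" labels
stable = sigmoid_xent_with_logits(logits, labels)
# Naive formula for comparison: -z*log(p) - (1-z)*log(1-p)
p = sigmoid(logits)
naive = -labels * np.log(p) - (1 - labels) * np.log(1 - p)
print(np.allclose(stable, naive))  # True
```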
Adam is used as the optimizer.
```python
def model_opt(d_loss, g_loss, learning_rate, beta1):
    """
    Get optimization operations
    :param d_loss: Discriminator loss Tensor
    :param g_loss: Generator loss Tensor
    :param learning_rate: Learning Rate Placeholder
    :param beta1: The exponential decay rate for the 1st moment in the optimizer
    :return: A tuple of (discriminator training operation, generator training operation)
    """
    # Get the trainable variables, split into G and D parts by scope name
    t_vars = tf.trainable_variables()
    g_vars = [var for var in t_vars if var.name.startswith('generator')]
    d_vars = [var for var in t_vars if var.name.startswith('discriminator')]

    # Run the batch-norm update ops before each training step
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        d_train_opt = tf.train.AdamOptimizer(learning_rate, beta1=beta1).minimize(d_loss, var_list=d_vars)
    with tf.control_dependencies(tf.get_collection(tf.GraphKeys.UPDATE_OPS)):
        g_train_opt = tf.train.AdamOptimizer(learning_rate, beta1=beta1).minimize(g_loss, var_list=g_vars)
    return d_train_opt, g_train_opt
```
Training
One trick I picked up online: for each discriminator update, update the generator twice; reportedly this works better.
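The schedule in the training loop below can be summarized abstractly. Here is a tiny sketch of the 1:2 update ratio, with stub step functions standing in for the real `sess.run` calls (all names here are illustrative):

```python
def train_schedule(batches, d_step, g_step, g_updates_per_d=2):
    # One discriminator update followed by g_updates_per_d generator updates
    for batch in batches:
        d_step(batch)
        for _ in range(g_updates_per_d):
            g_step(batch)

d_calls, g_calls = [], []
train_schedule(range(5), d_calls.append, g_calls.append)
print(len(d_calls), len(g_calls))  # 5 10
```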
```python
def train(epoch_count, batch_size, z_dim, learning_rate, beta1, get_batches, data_shape, data_image_mode):
    """
    Train the GAN
    :param epoch_count: Number of epochs
    :param batch_size: Batch Size
    :param z_dim: Z dimension
    :param learning_rate: Learning Rate
    :param beta1: The exponential decay rate for the 1st moment in the optimizer
    :param get_batches: Function to get batches
    :param data_shape: Shape of the data
    :param data_image_mode: The image mode to use for images ("RGB" or "L")
    """
    InputHolder, z_InputHolder, LearningRate = model_inputs(data_shape[1], data_shape[2], data_shape[3], z_dim)
    d_loss, g_loss = model_loss(InputHolder, z_InputHolder)
    d_train_opt, g_train_opt = model_opt(d_loss, g_loss, LearningRate, beta1)

    with tf.Session() as sess:
        writer = tf.summary.FileWriter("logs/", sess.graph)  # first argument is the output directory
        example_z = np.random.uniform(-1, 1, size=[25, z_dim])  # fixed noise, used to monitor progress
        g_loss_sum = tf.summary.scalar("g_loss", g_loss)
        d_loss_sum = tf.summary.scalar("d_loss", d_loss)

        # Initialize variables
        sess.run(tf.global_variables_initializer())
        batchNum = 0
        for epoch_i in range(epoch_count):
            for batch_images in get_batches(batch_size):
                # Scale images from [-0.5, 0.5] to [-1, 1] to match the generator's tanh output
                batch_images = batch_images * 2
                # Sample random noise for G
                batch_z = np.random.uniform(-1.0, 1.0, size=(batch_size, z_dim))

                # One discriminator update...
                _, summary_str = sess.run([d_train_opt, d_loss_sum],
                                          feed_dict={InputHolder: batch_images, z_InputHolder: batch_z,
                                                     LearningRate: learning_rate})
                writer.add_summary(summary_str, batchNum)
                # ...then two generator updates
                for _ in range(2):
                    _, summary_str = sess.run([g_train_opt, g_loss_sum],
                                              feed_dict={InputHolder: batch_images, z_InputHolder: batch_z,
                                                         LearningRate: learning_rate})
                    writer.add_summary(summary_str, batchNum)

                batchNum += 1
                if batchNum % 100 == 0:
                    # Save a grid of samples generated from the fixed noise
                    samples = sess.run(generator(z_InputHolder, False), feed_dict={z_InputHolder: example_z})
                    images_grid = helper.images_square_grid(samples, data_image_mode)
                    if not os.path.exists(SaveSample_dir):
                        os.makedirs(SaveSample_dir)
                    strFileName = "Epoch_{}_{}.jpg".format(epoch_i + 1, batchNum)
                    strFileName = os.path.join(SaveSample_dir, strFileName)
                    scipy.misc.imsave(strFileName, images_grid)
                if batchNum % 10 == 0:
                    # Every 10 batches, evaluate and print the losses
                    train_loss_d = d_loss.eval({InputHolder: batch_images, z_InputHolder: batch_z})
                    train_loss_g = g_loss.eval({z_InputHolder: batch_z})
                    print("Epoch {}/{}...".format(epoch_i + 1, epoch_count),
                          "Discriminator Loss: {:.4f}...".format(train_loss_d),
                          "Generator Loss: {:.4f}".format(train_loss_g))
                if batchNum % 500 == 0:
                    # Checkpoint the model
                    if not os.path.exists(checkpoint_dir):
                        os.makedirs(checkpoint_dir)
                    saver = tf.train.Saver(max_to_keep=1)
                    saver.save(sess, os.path.join(checkpoint_dir, "DCGAN.model"), global_step=batchNum)
        show_generator_output(sess, 25, z_InputHolder, data_image_mode)
        return sess
```
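A note on the `batch_images * 2` line above: the Udacity `helper` presumably yields images already scaled to [-0.5, 0.5], so doubling maps them onto the generator's tanh output range [-1, 1]. If you load raw uint8 images yourself, the equivalent mapping looks like this (a minimal NumPy sketch; the function names are mine):

```python
import numpy as np

def to_tanh_range(img_uint8):
    # Map [0, 255] pixels to the generator's tanh range [-1, 1]
    return img_uint8.astype(np.float32) / 127.5 - 1.0

def from_tanh_range(x):
    # Map tanh output [-1, 1] back to displayable [0, 255]
    return np.rint((x + 1.0) * 127.5).astype(np.uint8)

img = np.array([0, 127, 255], dtype=np.uint8)
x = to_tanh_range(img)
print(from_tanh_range(x))  # round-trips back to the original pixels
```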
Experimental results
The program is still running; the first epoch hasn't even finished and some faces are already visible. Results should improve after more epochs, and I'll update this post when it's done. Deep learning is quite remarkable, achieving things traditional algorithms struggle with.
Face generation results
With batch size 64, after 2600 batches the results look as follows: there are still clear flaws, but some samples are already recognizable as faces. The generated images are 64×64, and the training images come from the aligned CelebA dataset.