[Reading Notes] Generative Adversarial Nets

Original

2019-02-27 19:20

Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial nets[C]//Advances in neural information processing systems. 2014: 2672-2680.
GitHub: https://github.com/goodfeli/adversarial

Abstract

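GAN is a framework for estimating generative models via an adversarial process. We simultaneously train two models: a generative model $G$ that captures the data distribution, and a discriminative model $D$ that judges whether a sample came from the real data or was generated by $G$. Training is a two-player minimax game: $G$ maximizes the probability that $D$ makes a mistake, while $D$ maximizes the probability of classifying correctly. In the space of arbitrary functions $G$ and $D$, a unique solution exists, in which $G$ exactly recovers the training data distribution and $D$ outputs 1/2 everywhere. When $G$ and $D$ are defined by multilayer perceptrons, the whole system can be trained with backpropagation, and no Markov chains or unrolled approximate inference networks are needed during training.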

Introduction

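Before this paper, deep learning had been applied very successfully to discriminative models, but deep generative models had made much less progress, largely because of the intractable probabilistic computations that arise in maximum likelihood estimation. This paper applies neural networks to generative modeling by training a generator against a discriminator, and obtains good results.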

Adversarial nets

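To learn the generator's distribution $p_g$ over data $x$, we define a prior on input noise variables $p_z(z)$ and represent the mapping from the noise $z$ to data space as $G(z;\theta_g)$, where $G$ is a differentiable function represented by a multilayer perceptron with parameters $\theta_g$. Similarly, we define a second multilayer perceptron $D(x;\theta_d)$ that outputs a single scalar: the probability that $x$ came from the real data rather than from $p_g$. We train $D$ to maximize the probability of telling real and generated samples apart, and simultaneously train $G$ to minimize $\log(1-D(G(z)))$.
In other words, $D$ and $G$ play a two-player minimax game with value function $V(G,D)$: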
$\min_G\max_D V(G,D)=E_{x\sim p_{data}(x)}[\log D(x)]+E_{z\sim p_z(z)}[\log(1-D(G(z)))]$
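Figure 1 of the paper illustrates the training process: the green solid line is the generative distribution $p_g$, the blue dashed line is the discriminator's output $D$, and the black dotted line is the real data distribution $p_{data}$.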
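To make the value function concrete, here is a minimal Python sketch (standard library only) that estimates $V(G,D)$ by Monte Carlo sampling. The toy setup is an assumption for illustration, not from the paper: 1-D data from $N(2, 0.5^2)$, noise $z\sim N(0,1)$, and $G$, $D$ passed in as plain Python functions. The helper names `value_fn` and `sigmoid` are hypothetical.

```python
import math
import random


def sigmoid(s: float) -> float:
    """Numerically stable logistic function, used to build a toy D."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def value_fn(G, D, sample_data, sample_noise, m=10_000):
    """Monte Carlo estimate of
    V(G, D) = E_{x~p_data}[log D(x)] + E_{z~p_z}[log(1 - D(G(z)))].
    D must return values strictly inside (0, 1) so both logs are defined."""
    real_term = sum(math.log(D(sample_data())) for _ in range(m)) / m
    fake_term = sum(math.log(1.0 - D(G(sample_noise()))) for _ in range(m)) / m
    return real_term + fake_term


if __name__ == "__main__":
    rng = random.Random(0)

    def sample_data():   # hypothetical p_data = N(2, 0.5^2)
        return rng.gauss(2.0, 0.5)

    def sample_noise():  # hypothetical p_z = N(0, 1)
        return rng.gauss(0.0, 1.0)

    def G(z):            # hypothetical toy generator
        return 0.5 * z + 2.0

    # A discriminator that always answers 1/2 gives V = 2*log(1/2) = -log 4.
    print(value_fn(G, lambda x: 0.5, sample_data, sample_noise))
    print(value_fn(G, lambda x: sigmoid(x - 1.0), sample_data, sample_noise))
```

Passing the constant discriminator $D(x)=1/2$ reproduces the value $-\log 4$ that appears in Theorem 1; for any other $D$ the result is only a sample-based estimate whose accuracy depends on `m`.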

Theoretical Results

The GAN training algorithm used in the paper:

Algorithm 1: Minibatch stochastic gradient descent training of generative adversarial nets. The number of steps to apply to the discriminator, k, is a hyperparameter. We used k=1, the least expensive option, in our experiments.

for number of training iterations do

for k steps do

Sample minibatch of $m$ noise samples $\{z^{(1)},\dots,z^{(m)}\}$ from noise prior $p_z(z)$.

Sample minibatch of $m$ examples $\{x^{(1)},\dots,x^{(m)}\}$ from data generating distribution $p_{data}(x)$.

Update the discriminator by ascending its stochastic gradient $\nabla_{\theta_d}\frac{1}{m}\sum_{i=1}^{m}\left[\log D(x^{(i)})+\log(1-D(G(z^{(i)})))\right]$.

end for

Sample minibatch of $m$ noise samples $\{z^{(1)},\dots,z^{(m)}\}$ from noise prior $p_z(z)$.

Update the generator by descending its stochastic gradient $\nabla_{\theta_g}\frac{1}{m}\sum_{i=1}^{m}\log(1-D(G(z^{(i)})))$.

end for

The gradient-based updates can use any standard gradient-based learning rule. We used momentum in our experiments.
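The loop above can be sketched in plain Python on a 1-D toy problem. Everything concrete here is an assumption for illustration, not the authors' setup: $p_{data}=N(2,0.5^2)$, $p_z=N(0,1)$, a linear generator $G(z)=az+b$, a logistic discriminator $D(x)=\sigma(w_1x+w_2x^2+c)$ (the $x^2$ feature lets $D$ notice a variance mismatch, not just a mean mismatch), hand-derived gradients, and classical momentum with made-up learning-rate and momentum values. The function name `train_gan` is hypothetical.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_gan(n_iters=200, k=1, m=64, lr=0.01, beta=0.5,
              mu_data=2.0, sigma_data=0.5, seed=0):
    """Toy 1-D version of Algorithm 1 (all modelling choices hypothetical).

    p_data = N(mu_data, sigma_data**2), p_z = N(0, 1)
    G(z) = a*z + b,  D(x) = sigmoid(w1*x + w2*x**2 + c)
    """
    rng = random.Random(seed)
    g = {"a": 1.0, "b": 0.0}              # generator parameters theta_g
    d = {"w1": 0.0, "w2": 0.0, "c": 0.0}  # discriminator parameters theta_d
    vg = dict.fromkeys(g, 0.0)            # momentum buffers
    vd = dict.fromkeys(d, 0.0)

    def D(x):
        return sigmoid(d["w1"] * x + d["w2"] * x * x + d["c"])

    for _ in range(n_iters):
        for _ in range(k):
            zs = [rng.gauss(0.0, 1.0) for _ in range(m)]
            xs = [rng.gauss(mu_data, sigma_data) for _ in range(m)]
            grad = dict.fromkeys(d, 0.0)
            for x in xs:                   # d/ds log sigmoid(s) = 1 - sigmoid(s)
                r = 1.0 - D(x)
                grad["w1"] += r * x / m
                grad["w2"] += r * x * x / m
                grad["c"] += r / m
            for z in zs:                   # d/ds log(1 - sigmoid(s)) = -sigmoid(s)
                xf = g["a"] * z + g["b"]
                r = -D(xf)
                grad["w1"] += r * xf / m
                grad["w2"] += r * xf * xf / m
                grad["c"] += r / m
            for key in d:                  # ascend the discriminator's gradient
                vd[key] = beta * vd[key] + grad[key]
                d[key] += lr * vd[key]

        zs = [rng.gauss(0.0, 1.0) for _ in range(m)]
        grad = dict.fromkeys(g, 0.0)
        for z in zs:
            xf = g["a"] * z + g["b"]
            # d/dx log(1 - D(x)) = -D(x) * (w1 + 2*w2*x); chain rule to a and b
            dx = -D(xf) * (d["w1"] + 2.0 * d["w2"] * xf)
            grad["a"] += dx * z / m
            grad["b"] += dx / m
        for key in g:                      # descend the generator's gradient
            vg[key] = beta * vg[key] + grad[key]
            g[key] -= lr * vg[key]

    return g, d


if __name__ == "__main__":
    g_params, d_params = train_gan()
    print("generator:", g_params)
    print("discriminator:", d_params)
```

As in the paper, $k=1$ by default and the generator descends $\log(1-D(G(z)))$. Toy runs like this can oscillate rather than settle, which is one of the instabilities mentioned at the end of these notes.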

Global Optimality of $p_g = p_{data}$

Proposition 1. For $G$ fixed, the optimal discriminator $D$ is $D_G^*(x)=\frac{p_{data}(x)}{p_{data}(x)+p_g(x)}$.
Proof.
$V(G,D)=\int_x p_{data}(x)\log(D(x))\,dx+\int_z p_z(z)\log(1-D(G(z)))\,dz$
$=\int_x p_{data}(x)\log(D(x))+p_g(x)\log(1-D(x))\,dx$

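The second line uses the change of variables $x=G(z)$, under which $z\sim p_z$ induces $x\sim p_g$. For a fixed $x$, the integrand has the form $a\log y+b\log(1-y)$ with $a=p_{data}(x)$, $b=p_g(x)$ and $y=D(x)$. Setting the derivative $\frac{a}{y}-\frac{b}{1-y}$ to zero gives $y=\frac{a}{a+b}$, and since the function is concave in $y$ this is its maximum on $[0,1]$. Maximizing pointwise for every $x$ therefore gives $D_G^*$.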
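The pointwise argument can be checked numerically: for fixed $a=p_{data}(x)$ and $b=p_g(x)$, a brute-force search over $y\in(0,1)$ should land on $a/(a+b)$. A small Python sketch; the function names and the sample $(a,b)$ pairs are hypothetical.

```python
import math


def pointwise_objective(y, a, b):
    # Integrand of V(G, D) at one x: a*log(y) + b*log(1 - y),
    # with a = p_data(x), b = p_g(x), y = D(x).
    return a * math.log(y) + b * math.log(1.0 - y)


def d_star(a, b):
    # Closed-form maximiser from Proposition 1.
    return a / (a + b)


def grid_argmax(a, b, n=100_000):
    # Brute-force search over the midpoints of n equal cells of (0, 1).
    grid = ((i + 0.5) / n for i in range(n))
    return max(grid, key=lambda y: pointwise_objective(y, a, b))


# Hypothetical (a, b) pairs standing in for (p_data(x), p_g(x)).
for a, b in [(0.3, 0.7), (2.0, 1.0), (0.05, 0.9)]:
    print(a, b, grid_argmax(a, b), d_star(a, b))
```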

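Theorem 1. The global minimum of $C(G)$ is achieved if and only if $p_g = p_{data}$, and at that point $C(G)=-\log 4$. Here $C(G)$ is the cost function obtained with the optimal discriminator, $C(G)=\max_D V(G,D)=V(G,D_G^*)$.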
Proof.
If $p_g = p_{data}$, then $D_G^*(x)=\frac12$ everywhere, so $C(G)=\log\frac12+\log\frac12=-\log 4$.
In general:
$C(G)=\max_D V(G,D)$
$=E_{x\sim p_{data}}[\log D^*_G(x)]+E_{z\sim p_z}[\log(1-D^*_G(G(z)))]$
$=E_{x\sim p_{data}}[\log D^*_G(x)]+E_{x\sim p_g}[\log(1-D^*_G(x))]$
$=E_{x\sim p_{data}}\left[\log\frac{p_{data}(x)}{p_{data}(x)+p_g(x)}\right]+E_{x\sim p_g}\left[\log\frac{p_g(x)}{p_{data}(x)+p_g(x)}\right]$
$=-\log 4+KL\left(p_{data}\middle\|\frac{p_{data}+p_g}{2}\right)+KL\left(p_g\middle\|\frac{p_{data}+p_g}{2}\right)=-\log 4+2\cdot JSD(p_{data}\|p_g)$
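The last step adds and subtracts $\log 2$ inside each expectation, which turns each term into a KL divergence to the mixture $\frac{p_{data}+p_g}{2}$ minus $\log 2$. Since the Jensen-Shannon divergence is nonnegative and equals zero only when $p_g = p_{data}$, we get $C(G)\ge-\log 4$ with equality if and only if $p_g=p_{data}$. This completes the proof.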
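The identity $C(G)=-\log 4+2\cdot JSD(p_{data}\|p_g)$ is easy to sanity-check on discrete distributions, where the expectations become finite sums. A Python sketch with hypothetical helper names (`kl`, `jsd`, `c_of_g`) and made-up example distributions:

```python
import math


def kl(p, q):
    # KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def jsd(p, q):
    # Jensen-Shannon divergence via the mixture m = (p + q) / 2.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def c_of_g(p_data, p_g):
    # C(G) = V(G, D*_G) with D*_G(x) = p_data(x) / (p_data(x) + p_g(x)).
    total = 0.0
    for pd, pg in zip(p_data, p_g):
        if pd > 0:
            total += pd * math.log(pd / (pd + pg))  # log D*_G(x)
        if pg > 0:
            total += pg * math.log(pg / (pd + pg))  # log(1 - D*_G(x))
    return total


# Made-up example distributions over four bins.
p_data = [0.1, 0.2, 0.3, 0.4]
p_g = [0.25, 0.25, 0.25, 0.25]
print(c_of_g(p_data, p_g), -math.log(4) + 2 * jsd(p_data, p_g))
print(c_of_g(p_data, p_data), -math.log(4))
```

The disjoint-support case is the other extreme: there $JSD=\log 2$ and $C(G)=0$, the largest value the criterion can take.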

Convergence of Algorithm 1

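Proposition 2. If $G$ and $D$ have enough capacity, and at each step of Algorithm 1 the discriminator is allowed to reach its optimum given $G$ while $p_g$ is updated so as to improve $V(G,D_G^*)$, then $p_g$ converges to $p_{data}$.
Proof. $\sup_D V(G,D)$ is convex in $p_g$ and, by Theorem 1, has a unique global optimum, so sufficiently small updates of $p_g$ converge to $p_{data}$. Note that the argument is about updating $p_g$ directly in function space; in practice we optimize the parameters $\theta_g$ of $G(z;\theta_g)$, for which this guarantee does not strictly hold.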
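Proposition 2 is a statement about updating $p_g$ itself, so it can be illustrated on a discrete toy distribution where $p_g$ is just a probability vector. Differentiating $C(G)$ with respect to $p_g(x)$ gives $\log(1-D_G^*(x))$, i.e. the gradient taken at the optimal discriminator. The sketch below re-solves $D$ exactly at every step and then takes an exponentiated-gradient (mirror descent) step. Mirror descent is a technique chosen here only to keep $p_g$ on the probability simplex; it is not from the paper, and all names and numbers are hypothetical.

```python
import math


def c_of_g(p_data, p_g):
    # C(G) = V(G, D*_G) for discrete distributions (zero-mass terms skipped).
    total = 0.0
    for pd, pg in zip(p_data, p_g):
        if pd > 0:
            total += pd * math.log(pd / (pd + pg))
        if pg > 0:
            total += pg * math.log(pg / (pd + pg))
    return total


def functional_descent(p_data, p_g, step=0.5, n_steps=200):
    """Exponentiated-gradient descent on C(G) directly in the space of p_g.
    Mirror descent is a swapped-in choice, not the paper's procedure.
    Requires every p_g entry to start strictly positive."""
    p_g = list(p_g)
    history = [c_of_g(p_data, p_g)]
    for _ in range(n_steps):
        # dC/dp_g(x) = log(p_g(x) / (p_data(x) + p_g(x))) = log(1 - D*_G(x)),
        # i.e. the discriminator is at its optimum for the current p_g.
        grad = [math.log(pg / (pd + pg)) for pd, pg in zip(p_data, p_g)]
        unnormalised = [pg * math.exp(-step * gx) for pg, gx in zip(p_g, grad)]
        total = sum(unnormalised)
        p_g = [u / total for u in unnormalised]
        history.append(c_of_g(p_data, p_g))
    return p_g, history


# Hypothetical target and starting distributions.
p_data = [0.1, 0.2, 0.3, 0.4]
p_final, history = functional_descent(p_data, [0.25, 0.25, 0.25, 0.25])
print([round(p, 6) for p in p_final], history[0], history[-1], -math.log(4))
```

With `step=1.0` the update collapses to $p_g\leftarrow(p_{data}+p_g)/2$, so convexity of $C$ alone guarantees that the criterion never increases. Real GAN training updates $\theta_g$ instead, which is why this guarantee does not carry over directly.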

Experiments

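The paper trains adversarial nets on MNIST, the Toronto Face Database (TFD) and CIFAR-10, and evaluates them with Gaussian Parzen-window log-likelihood estimates, on which they are competitive with the other generative models compared. I won't list the detailed numbers here, since I don't need the other methods for now.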

Advantages and disadvantages

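The main advantage is the quality of the results. The disadvantages: training is unstable and can fail to converge or collapse (the paper notes that if $G$ is trained too much without updating $D$, it may map too many values of $z$ to the same $x$, which it calls the "Helvetica scenario"); the loss value does not show how training is going, so progress has to be judged by inspecting samples manually; and results are mediocre on discrete data such as NLP.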
