[Reading Notes] Improved Training of Wasserstein GANs

Original

2019-02-27 19:20

Improved Training of Wasserstein GANs

Gulrajani I, Ahmed F, Arjovsky M, et al. Improved training of wasserstein gans[C]//Advances in Neural Information Processing Systems. 2017: 5767-5777.
GitHub: https://github.com/igul222/improved_wgan_training

Abstract

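Although GANs are powerful generative models, their unstable training limits their use. The recently proposed Wasserstein GAN (WGAN) makes GAN training more stable, but it can still produce poor samples or fail to converge. We find that these problems are often caused by using weight clipping to enforce a Lipschitz constraint on the critic (i.e. the discriminator; side note: this term tripped me up, and it took me a long time to realize what it meant). We replace weight clipping with a penalty on the norm of the critic's gradient with respect to its input. Our method performs better than standard WGAN and most other GAN variants.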

Introduction

Generative adversarial networks

Formally, the game between the generator G and the discriminator D is the minimax objective:
$\min_G \max_D \mathbb{E}_{x\sim p_r}[\log D(x)] + \mathbb{E}_{\tilde{x}\sim p_g}[\log(1-D(\tilde{x}))]$

In practice, the generator is instead trained to maximize $\mathbb{E}_{\tilde{x}\sim p_g}[\log D(\tilde{x})]$, because this avoids vanishing gradients when the discriminator saturates.
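To see why, write the discriminator output on a fake sample as $D(\tilde{x})=\sigma(a)$ for a logit $a$. Early in training the discriminator rejects fakes easily, so $a \ll 0$ and $D(\tilde{x})\approx 0$. A minimal sketch in plain Python (the logit values are arbitrary illustrations) comparing the gradient of each generator objective with respect to $a$:

```python
import math

def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(a: float) -> float:
    # d/da log(1 - sigmoid(a)) = -sigmoid(a): the term the generator minimizes in the minimax game
    return -sigmoid(a)

def grad_non_saturating(a: float) -> float:
    # d/da log(sigmoid(a)) = 1 - sigmoid(a): the term the generator maximizes in practice
    return 1.0 - sigmoid(a)

for a in (-10.0, -5.0, 0.0):
    print(f"a={a:6.1f}  D={sigmoid(a):.5f}  "
          f"saturating={grad_saturating(a):.5f}  non-saturating={grad_non_saturating(a):.5f}")
```

When $D(\tilde{x})\approx 0$, the saturating gradient equals $-D(\tilde{x})$ and is essentially zero, while the non-saturating gradient stays close to 1, so the generator still receives a useful learning signal.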

Wasserstein GANs

The WGAN value function is constructed using the Kantorovich-Rubinstein duality to obtain
$\min_G \max_{D\in\mathscr{D}} \mathbb{E}_{x\sim p_r}[D(x)] - \mathbb{E}_{\tilde{x}\sim p_g}[D(\tilde{x})]$

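where $\mathscr{D}$ is the set of 1-Lipschitz functions. To make the critic satisfy a $k$-Lipschitz constraint, WGAN clips its weights to the range $[-c, c]$; the constant $k$ depends on $c$ and on the critic's architecture.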
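A minimal NumPy sketch of how $k$ depends on both $c$ and the architecture, assuming a hypothetical three-layer MLP critic (not the paper's network) with 1-Lipschitz activations such as ReLU. For such a network, the product of the layers' spectral norms upper-bounds the Lipschitz constant, and after clipping each layer's spectral norm is at most its Frobenius norm, which is at most $c\sqrt{\text{rows}\cdot\text{cols}}$:

```python
import numpy as np

def clip_weights(weights, c):
    """WGAN-style weight clipping: force every parameter into [-c, c]."""
    return [np.clip(W, -c, c) for W in weights]

def lipschitz_upper_bound(weights):
    """Product of per-layer spectral norms. For an MLP whose activations are
    1-Lipschitz (e.g. ReLU), this upper-bounds the critic's L2 Lipschitz constant k.
    Biases do not affect the Lipschitz constant, so they are omitted."""
    bound = 1.0
    for W in weights:
        bound *= np.linalg.norm(W, ord=2)  # largest singular value
    return bound

def clipping_bound(shapes, c):
    """Worst case after clipping: ||W||_2 <= ||W||_F <= c * sqrt(rows * cols) per layer."""
    return float(np.prod([c * np.sqrt(rows * cols) for rows, cols in shapes]))

# Hypothetical critic architecture: 2 -> 64 -> 64 -> 1
shapes = [(64, 2), (64, 64), (1, 64)]
rng = np.random.default_rng(0)
weights = [rng.normal(size=s) for s in shapes]

for c in (0.01, 0.1):
    k = lipschitz_upper_bound(clip_weights(weights, c))
    print(f"c={c}: spectral bound on k = {k:.2e}, worst case = {clipping_bound(shapes, c):.2e}")
```

The worst-case bound scales as $c^3$ for this three-layer critic and grows with layer width and depth, which is the sense in which $k$ is set jointly by $c$ and the model structure.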

Difficulties with weight constraints

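As shown in the paper's figure (Figure 1(b)), weight clipping has two effects: first, it pushes the weights toward the two ends of the clipping range; second, it easily causes exploding or vanishing gradients, with the outcome depending on the value of $c$. The first effect arises because the critic must satisfy the Lipschitz constraint, yet its goal is to make the gap between its outputs on real and fake samples as large as possible; after training, the absolute values of the weights therefore concentrate near the maximum allowed value $c$.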
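A toy illustration of the first effect, under strong simplifying assumptions (a hypothetical linear critic $D_w(x)=w^\top x$ and fixed, made-up real and fake means). The critic objective $\mathbb{E}_{x\sim p_r}[w^\top x]-\mathbb{E}_{\tilde{x}\sim p_g}[w^\top \tilde{x}] = w^\top(\mu_r-\mu_g)$ is linear in $w$, so gradient ascent keeps pushing every weight in the same direction until clipping stops it at $\pm c$:

```python
import numpy as np

def train_clipped_linear_critic(mu_real, mu_fake, c=0.01, lr=0.005, steps=100):
    """Gradient ascent on E[w.x_real] - E[w.x_fake] for a linear critic f(x) = w.x,
    clipping every weight to [-c, c] after each step (WGAN-style)."""
    w = np.zeros_like(mu_real, dtype=float)
    grad = mu_real - mu_fake  # gradient of the linear objective w.r.t. w (constant)
    for _ in range(steps):
        w = np.clip(w + lr * grad, -c, c)
    return w

# Hypothetical means of the real and generated data
mu_real = np.array([1.0, -0.5, 0.2, 2.0])
mu_fake = np.array([0.0, 0.5, 0.1, -1.0])
w = train_clipped_linear_critic(mu_real, mu_fake)
print(w)  # every weight ends up at +c or -c
```

Real critics are nonlinear, but the same pressure to widen the real/fake gap drives many weights to the clipping bounds, matching the weight histogram in the paper.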

Gradient penalty

Algorithm 1: WGAN with gradient penalty. We use default values of $\lambda=10$, $n_{\text{critic}}=5$, $\alpha=0.0001$, $\beta_1=0$, $\beta_2=0.9$.
Require: the gradient penalty coefficient $\lambda$, the number of critic iterations per generator iteration $n_{\text{critic}}$, the batch size $m$, Adam hyperparameters $\alpha, \beta_1, \beta_2$.
Require: initial critic parameters $w_0$, initial generator parameters $\theta_0$.

while $\theta$ has not converged do

for $t=1, ..., n_{critic}$ do

for $i = 1, ..., m$ do

Sample real data $x\sim p_r$, latent variable $z\sim p(z)$, a random number $\epsilon\sim U[0, 1]$.

$\tilde{x}\leftarrow G_{\theta}(z)$

$\hat{x}\leftarrow\epsilon x + (1-\epsilon)\tilde{x}$

$L^{(i)}\leftarrow D_w(\tilde{x}) - D_w(x) + \lambda(\|\nabla_{\hat{x}}D_w(\hat{x})\|_2-1)^2$

end for

$w\leftarrow \mathrm{Adam}(\nabla_w\frac{1}{m}\sum_{i=1}^m L^{(i)}, w, \alpha, \beta_1, \beta_2)$

end for

Sample a batch of latent variables $\{z^{(i)}\}^m_{i=1}\sim p(z)$.

$\theta\leftarrow \mathrm{Adam}(\nabla_{\theta}\frac{1}{m}\sum_{i=1}^m -D_w(G_{\theta}(z^{(i)})), \theta, \alpha, \beta_1, \beta_2)$

end while
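The inner loop of Algorithm 1 can be vectorized over the batch. The sketch below is a NumPy illustration under stated assumptions: the critic is a hypothetical one-hidden-layer tanh network (not the paper's architecture), chosen because its input gradient $\nabla_{\hat{x}}D_w(\hat{x})$ has a closed form, and the fake samples are a stand-in for $G_\theta(z)$. It only computes the value $\frac{1}{m}\sum_i L^{(i)}$; the Adam update on $w$ also needs the gradient of the penalty with respect to $w$, which in practice comes from an autodiff framework that differentiates through $\nabla_{\hat{x}}D_w$ (double backpropagation).

```python
import numpy as np

def critic(x, W, b, v):
    """Hypothetical one-hidden-layer critic D_w(x) = v . tanh(W x + b), one output per row of x."""
    return np.tanh(x @ W.T + b) @ v

def critic_input_grad(x, W, b, v):
    """Analytic gradient of D_w with respect to its input, one row per sample:
    grad_x D_w(x) = W^T (v * (1 - tanh(W x + b)^2))."""
    h = np.tanh(x @ W.T + b)          # hidden activations, shape (m, hidden)
    return (v * (1.0 - h ** 2)) @ W   # shape (m, d)

def critic_loss(x_real, x_fake, params, rng, lam=10.0):
    """Mean of the per-sample losses L^(i) from the inner loop of Algorithm 1."""
    m = x_real.shape[0]
    eps = rng.uniform(0.0, 1.0, size=(m, 1))        # epsilon ~ U[0, 1], one per sample
    x_hat = eps * x_real + (1.0 - eps) * x_fake     # points between real and generated samples
    grad_norm = np.linalg.norm(critic_input_grad(x_hat, *params), axis=1)
    per_sample = (critic(x_fake, *params) - critic(x_real, *params)
                  + lam * (grad_norm - 1.0) ** 2)
    return per_sample.mean()

rng = np.random.default_rng(0)
d, hidden, m = 2, 16, 64
params = (rng.normal(size=(hidden, d)),           # W
          rng.normal(size=hidden),                # b
          rng.normal(size=hidden) / hidden)       # v
x_real = rng.normal(loc=2.0, size=(m, d))         # stand-in for samples from p_r
x_fake = rng.normal(loc=0.0, size=(m, d))         # stand-in for G_theta(z)
print(critic_loss(x_real, x_fake, params, rng))
```

Here $\epsilon$ is drawn independently per sample, so each $\hat{x}^{(i)}$ lies at a different position on the segment between its real and generated pair.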

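The key innovation of WGAN-GP is its modified critic loss:
$L = \mathbb{E}_{\tilde{x}\sim p_g}[D_w(\tilde{x})] - \mathbb{E}_{x\sim p_r}[D_w(x)] + \lambda\,\mathbb{E}_{\hat{x}\sim p_{\hat{x}}}[(\|\nabla_{\hat{x}}D_w(\hat{x})\|_2-1)^2]$

where $p_{\hat{x}}$ samples uniformly along straight lines between pairs of points drawn from $p_r$ and $p_g$ (the interpolation step in Algorithm 1).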

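Instead of constraining the weights directly, WGAN-GP penalizes the norm of the critic's gradient with respect to its input, pushing it toward 1 at points sampled between the real and generated data. In effect, WGAN's hard constraint (clipping) becomes a soft one (a penalty).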
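A worked toy case of the soft constraint, under assumptions not in the paper: a linear critic $D_w(x)=w^\top x$, so $\nabla_{\hat{x}}D_w(\hat{x})=w$ everywhere, and a fixed mean difference $d=\mathbb{E}_{p_r}[x]-\mathbb{E}_{p_g}[\tilde{x}]$. The critic loss becomes $L(w)=-w^\top d+\lambda(\|w\|_2-1)^2$. Its minimizer points along $d$, and setting the derivative with respect to $r=\|w\|_2$ to zero ($-\|d\|_2+2\lambda(r-1)=0$) gives $w^\star=\left(1+\frac{\|d\|_2}{2\lambda}\right)\frac{d}{\|d\|_2}$. The gradient norm is pulled close to 1 but not forced onto it, unlike clipping. The sketch checks this with plain gradient descent (step size and iteration count are arbitrary):

```python
import numpy as np

def gp_critic_loss_grad(w, d, lam):
    """Gradient of L(w) = -w.d + lam * (||w|| - 1)^2 for a linear critic."""
    norm = np.linalg.norm(w)
    return -d + 2.0 * lam * (norm - 1.0) * w / norm

d = np.array([3.0, 4.0])   # hypothetical mean difference, ||d|| = 5
lam = 10.0
w = np.array([0.1, 0.1])   # arbitrary nonzero starting point
for _ in range(2000):
    w = w - 0.01 * gp_critic_loss_grad(w, d, lam)

w_star = (1.0 + np.linalg.norm(d) / (2.0 * lam)) * d / np.linalg.norm(d)
print(w, w_star, np.linalg.norm(w))
```

With $\lambda=10$ the optimal gradient norm here is $1.25$: a larger $\lambda$ tightens the constraint toward norm 1, a smaller one loosens it.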

Experiments

Conclusion

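Experimentally, the results are better than those of other GAN methods, but other sources I have read say it is not necessarily better than WGAN. I will run some experiments myself when I have time.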
