Generative Adversarial Networks


1. Generative Adversarial Net (GAN)

The discriminator \(D\) and generator \(G\) play a two-player minimax game with value function \(V(G,D)\):

$$\min_G\max_D V(G,D) = \mathrm{E}_{x\sim p_{data}(x)}[\log D(x)]+\mathrm{E}_{z\sim p_z(z)}[\log(1-D(G(z)))]$$
Sample a minibatch of \(m\) noise samples \(\{z^{(i)}\}_{i=1}^m\) from the noise prior \(p_z(z)\), and a minibatch of \(m\) examples \(\{x^{(i)}\}_{i=1}^m\) from the data-generating distribution \(p_{data}(x)\). The discriminator objective, maximized via stochastic gradient ascent, is given by,
$$\frac{1}{m}\sum^m_{i=1}\bigg[\log D(x^{(i)})+\log \big(1-D(G(z^{(i)}))\big)\bigg]$$
The generator objective, in its non-saturating form (also maximized, since \(\log(1-D(G(z)))\) saturates early in training when \(D\) easily rejects samples), is given by,
$$\frac{1}{m}\sum^m_{i=1}\log D(G(z^{(i)}))$$
Add a small \(\epsilon\) inside each \(\log\) to avoid \(\log 0\).
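A minimal PyTorch sketch of these two minibatch objectives, assuming \(D\) ends in a sigmoid so it outputs probabilities; `D`, `G`, and the batch tensors are placeholders:

```python
import torch

def gan_losses(D, G, x_real, z, eps=1e-8):
    """Minibatch GAN objectives from the formulas above (both maximized)."""
    x_fake = G(z)
    d_real = D(x_real)                  # D(x^(i)), probabilities in (0, 1)
    # Discriminator term: detach so the D update does not reach into G.
    d_fake = D(x_fake.detach())
    loss_D = torch.mean(torch.log(d_real + eps) + torch.log(1.0 - d_fake + eps))
    # Non-saturating generator term: (1/m) * sum log D(G(z^(i)))
    loss_G = torch.mean(torch.log(D(x_fake) + eps))
    return loss_D, loss_G
```

With gradient-based optimizers one ascends these by minimizing their negatives, alternating updates of \(D\) and \(G\).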

2. Deep Convolutional GAN (DCGAN)


3. Least Squares GAN (LSGAN)

The discriminator loss is given by,
$$\min_D V(D) = \frac{1}{2}\mathrm{E}_{x\sim p_{data}(x)}[(D(x)-b)^2]+\frac{1}{2}\mathrm{E}_{z\sim p_z(z)}[(D(G(z))-a)^2]$$
The generator loss is minimized via,
$$\min_G V(G) = \frac{1}{2}\mathrm{E}_{z\sim p_z(z)}[(D(G(z))-c)^2]$$
Here \(a\) is the label for fake data, \(b\) the label for real data, and \(c\) the value \(G\) wants \(D\) to assign to its fakes. Hyperparameter options: 1. \(a=-1, b=1 \text{ and } c=0\) (satisfying the Pearson \(\chi^2\) condition), or 2. \(a=0 \text{ and } b=c=1\) (0-1 binary coding).
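A minimal sketch of both least-squares objectives, assuming \(D\) outputs raw (unbounded) scores since LSGAN drops the sigmoid; the defaults follow option 1 above:

```python
import torch

def lsgan_losses(D, G, x_real, z, a=-1.0, b=1.0, c=0.0):
    """Least-squares objectives with labels a (fake), b (real), c (G's target)."""
    x_fake = G(z)
    d_real = D(x_real)            # raw scores, no sigmoid
    d_fake = D(x_fake.detach())   # block gradients into G for the D step
    # min_D V(D) = (1/2) E[(D(x) - b)^2] + (1/2) E[(D(G(z)) - a)^2]
    loss_D = 0.5 * torch.mean((d_real - b) ** 2) + 0.5 * torch.mean((d_fake - a) ** 2)
    # min_G V(G) = (1/2) E[(D(G(z)) - c)^2]
    loss_G = 0.5 * torch.mean((D(x_fake) - c) ** 2)
    return loss_D, loss_G
```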

4. Context-Conditional GAN (CCGAN)

Let \(\mathbf{m} \in \{0,1\}^d\) denote a binary mask used to drop out a specified portion of an image \(\mathbf{x}\in\mathbb{R}^d\). The generator receives as input \(\mathbf{m}\odot\mathbf{x}\), where \(\odot\) denotes element-wise multiplication. The generator outputs \(\mathbf{x_G}=G(\mathbf{m}\odot\mathbf{x},\mathbf{z})\in \mathbb{R}^d\), and the in-painted image \(\mathbf{x_I}\) is given by,
$$\mathbf{x_I}=(1-\mathbf{m})\odot\mathbf{x_G}+\mathbf{m}\odot\mathbf{x}$$
The CCGAN objective is given by,
$$\min_G\max_D \mathrm{E}_{\mathbf{x}\sim\mathcal{X}}[\log D(\mathbf{x})]+\mathrm{E}_{\mathbf{x}\sim\mathcal{X},\mathbf{m}\sim\mathcal{M}}[\log(1-D(\mathbf{x_I}))]$$
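A minimal sketch of the in-painting composition, assuming a generator that takes the masked image and noise as separate inputs:

```python
import torch

def inpaint(G, x, m, z):
    """Compose the in-painted image x_I from the generator output x_G.

    m is a binary mask (1 = keep pixel, 0 = dropped), broadcastable to x.
    """
    x_G = G(m * x, z)               # generator sees only the unmasked context
    x_I = (1.0 - m) * x_G + m * x   # fill holes with x_G, keep known pixels
    return x_I
```

The composed \(\mathbf{x_I}\), rather than the raw generator output, is what the discriminator scores in the objective above.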

5. Auxiliary Classifier GAN (ACGAN)

In an alternative formulation of the GAN objective, the generator \(G\) takes as input a random noise vector \(z\) and outputs an image \(X_{fake}=G(z)\). The discriminator \(D\) receives as input either a training image or a synthesized image from the generator, and outputs a probability distribution \(P(S|X) = D(X)\) over possible image sources. The discriminator is trained to maximize the log-likelihood it assigns to the correct source:
$$L = E[\log P(S=real|X_{real})]+E[\log P(S=fake|X_{fake})]$$
For ACGAN, every generated sample has a corresponding class label \(c\sim p_c\) in addition to the noise \(z\), and \(G\) uses both to generate images \(X_{fake}=G(c,z)\). The discriminator outputs both a probability distribution over sources and a probability distribution over class labels, \(P(S|X),P(C|X) = D(X)\). The objective function has two parts: the log-likelihood of the correct source, \(L_S\), and the log-likelihood of the correct class, \(L_C\).
$$L_S = E[\log P(S=real|X_{real})]+E[\log P(S=fake|X_{fake})]$$
$$L_C = E[\log P(C=c|X_{real})]+E[\log P(C=c|X_{fake})]$$
\(D\) is trained to maximize \(L_S+L_C\) while \(G\) is trained to maximize \(L_C-L_S\). ACGAN learns a representation for \(z\) that is independent of the class label.
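A sketch of both log-likelihood terms, assuming a two-headed discriminator that returns a real/fake source logit and class logits; `D` and the label tensors are placeholders:

```python
import torch
import torch.nn.functional as F

def acgan_losses(D, x_real, y_real, x_fake, y_fake):
    """L_S and L_C as mean log-likelihoods; D returns (source logit, class logits)."""
    s_real, c_real = D(x_real)
    s_fake, c_fake = D(x_fake)
    # L_S = E[log P(S=real | X_real)] + E[log P(S=fake | X_fake)]
    L_S = F.logsigmoid(s_real).mean() + F.logsigmoid(-s_fake).mean()
    # L_C = E[log P(C=c | X_real)] + E[log P(C=c | X_fake)]; cross_entropy is
    # the mean negative log-likelihood, hence the minus signs.
    L_C = -F.cross_entropy(c_real, y_real) - F.cross_entropy(c_fake, y_fake)
    return L_S, L_C   # D maximizes L_S + L_C; G maximizes L_C - L_S
```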

6. InfoGAN


7. Adversarial Autoencoder


8. Image-to-Image Translation (Pix2Pix)

The objective of a conditional GAN can be expressed as,
$$\mathcal{L}_{cGAN}(G,D) = \mathrm{E}_{x,y}[\log D(x,y)]+\mathrm{E}_{x,z}[\log(1-D(x,G(x,z)))]$$
where the discriminator learns to tell whether the conditioning image \(x\) is paired with its real output \(y\) or with the generated output \(G(x,z)\). An additional L1 loss encourages the output to stay close to the ground truth and produces less blurring than L2,
$$\mathcal{L}_{L1}(G) = \mathrm{E}_{x,y,z}[\|y-G(x,z)\|_1]$$
The final objective is given by,
$$G^* = \arg\min_G\max_D \mathcal{L}_{cGAN}(G,D)+\lambda\mathcal{L}_{L1}(G)$$
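A sketch of the combined objective, assuming \(D\) takes the (condition, image) pair and outputs a probability; `lam` plays the role of \(\lambda\) (the paper uses \(\lambda=100\)):

```python
import torch

def pix2pix_objective(D, G, x, y, z, lam=100.0, eps=1e-8):
    """cGAN term plus weighted L1 term from the formulas above."""
    y_fake = G(x, z)
    # L_cGAN(G, D) = E[log D(x, y)] + E[log(1 - D(x, G(x, z)))]
    L_cGAN = torch.mean(torch.log(D(x, y) + eps)) \
           + torch.mean(torch.log(1.0 - D(x, y_fake) + eps))
    # L_L1(G) = E[ |y - G(x, z)|_1 ]
    L_L1 = torch.mean(torch.abs(y - y_fake))
    return L_cGAN + lam * L_L1   # G minimizes this; D maximizes the cGAN term
```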

9. Cycle-Consistent Adversarial Network (Cycle-GAN)

The model contains two mapping functions (generators) \(G:X\rightarrow Y\) and \(F:Y\rightarrow X\), and associated adversarial discriminators \(D_Y\) and \(D_X\). \(D_Y\) encourages \(G\) to translate \(X\) into outputs indistinguishable from domain \(Y\), and vice versa for \(D_X\) and \(F\). For the mapping function \(G:X\rightarrow Y\) and its discriminator \(D_Y\), the objective is expressed as,
$$\mathcal{L}_{GAN}(G,D_Y,X,Y)=\mathrm{E}_{y\sim p_{data}(y)}[\log D_Y(y)]+\mathrm{E}_{x\sim p_{data}(x)}[\log(1-D_Y(G(x)))]$$
The cycle consistency loss is given by,
$$\mathcal{L}_{cyc}(G,F) = \mathrm{E}_{x\sim p_{data}(x)}[\|F(G(x))-x\|_1]+\mathrm{E}_{y\sim p_{data}(y)}[\|G(F(y))-y\|_1]$$
The full objective is given by,
$$\mathcal{L}(G,F,D_X,D_Y)=\mathcal{L}_{GAN}(G,D_Y,X,Y)+\mathcal{L}_{GAN}(F,D_X,Y,X)+\lambda\mathcal{L}_{cyc}(G,F)$$
The aim is to solve,
$$G^*,F^* = \arg\min_{G,F}\max_{D_X,D_Y}\mathcal{L}(G,F,D_X,D_Y)$$
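A sketch of the cycle consistency term, mirroring the paper's notation (`G`, `F` are the two generators; `x`, `y` are sample batches from the two domains):

```python
import torch

def cycle_consistency_loss(G, F, x, y):
    """L_cyc(G, F): forward cycle x -> G(x) -> F(G(x)) and the backward cycle."""
    forward = torch.mean(torch.abs(F(G(x)) - x))    # E[ |F(G(x)) - x|_1 ]
    backward = torch.mean(torch.abs(G(F(y)) - y))   # E[ |G(F(y)) - y|_1 ]
    return forward + backward
```

This term is added to the two adversarial losses with weight \(\lambda\) to form the full objective above.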
