Generative Adversarial Networks


1. Generative Adversarial Net (GAN)

The discriminator \(D\) and generator \(G\) play a two-player minimax game with value function \(V(G,D)\):

$$\min_G\max_D V(G,D) = \mathrm{E}_{x\sim p_{data}(x)}[\log D(x)]+\mathrm{E}_{z\sim p_z(z)}[\log(1-D(G(z)))]$$
Sample a minibatch of \(m\) noise samples \(\{z^{(i)}\}_{i=1}^m\) from the noise prior \(p_z(z)\), and a minibatch of \(m\) examples \(\{x^{(i)}\}_{i=1}^m\) from the data-generating distribution \(p_{data}(x)\). The discriminator objective, maximized via stochastic gradient ascent, is given by,
$$\frac{1}{m}\sum^m_{i=1}\bigg[\log D(x^{(i)})+\log \big(1-D(G(z^{(i)}))\big)\bigg]$$
The generator objective, in its non-saturating form (also maximized, since \(\log(1-D(G(z)))\) saturates early in training when \(D\) easily rejects samples), is given by,
$$\frac{1}{m}\sum^m_{i=1}\log D(G(z^{(i)}))$$
Add a small \(\epsilon\) inside each \(\log\) to avoid \(\log 0\).
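A minimal PyTorch sketch of these two minibatch objectives, assuming \(D\) ends in a sigmoid so it outputs probabilities; `D`, `G`, and the batch tensors are placeholders:

```python
import torch

def gan_losses(D, G, x_real, z, eps=1e-8):
    """Minibatch GAN objectives from the formulas above (both maximized)."""
    x_fake = G(z)
    d_real = D(x_real)                  # D(x^(i)), probabilities in (0, 1)
    # Discriminator term: detach so the D update does not reach into G.
    d_fake = D(x_fake.detach())
    loss_D = torch.mean(torch.log(d_real + eps) + torch.log(1.0 - d_fake + eps))
    # Non-saturating generator term: (1/m) * sum log D(G(z^(i)))
    loss_G = torch.mean(torch.log(D(x_fake) + eps))
    return loss_D, loss_G
```

With gradient-based optimizers one ascends these by minimizing their negatives, alternating updates of \(D\) and \(G\).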

2. Deep Convolutional GAN (DCGAN)


3. Least Squares GAN (LSGAN)

The discriminator loss is given by,
$$\min_D V(D) = \frac{1}{2}\mathrm{E}_{x\sim p_{data}(x)}[(D(x)-b)^2]+\frac{1}{2}\mathrm{E}_{z\sim p_z(z)}[(D(G(z))-a)^2]$$
The generator loss is minimized via,
$$\min_G V(G) = \frac{1}{2}\mathrm{E}_{z\sim p_z(z)}[(D(G(z))-c)^2]$$
Here \(a\) is the label for fake data, \(b\) the label for real data, and \(c\) the value \(G\) wants \(D\) to assign to its fakes. Hyperparameter options: 1. \(a=-1, b=1 \text{ and } c=0\) (satisfying the Pearson \(\chi^2\) condition), or 2. \(a=0 \text{ and } b=c=1\) (0-1 binary coding).
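A minimal sketch of both least-squares objectives, assuming \(D\) outputs raw (unbounded) scores since LSGAN drops the sigmoid; the defaults follow option 1 above:

```python
import torch

def lsgan_losses(D, G, x_real, z, a=-1.0, b=1.0, c=0.0):
    """Least-squares objectives with labels a (fake), b (real), c (G's target)."""
    x_fake = G(z)
    d_real = D(x_real)            # raw scores, no sigmoid
    d_fake = D(x_fake.detach())   # block gradients into G for the D step
    # min_D V(D) = (1/2) E[(D(x) - b)^2] + (1/2) E[(D(G(z)) - a)^2]
    loss_D = 0.5 * torch.mean((d_real - b) ** 2) + 0.5 * torch.mean((d_fake - a) ** 2)
    # min_G V(G) = (1/2) E[(D(G(z)) - c)^2]
    loss_G = 0.5 * torch.mean((D(x_fake) - c) ** 2)
    return loss_D, loss_G
```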

4. Context-Conditional GAN (CCGAN)

Let \(\mathbf{m} \in \{0,1\}^d\) denote a binary mask used to drop out a specified portion of an image \(\mathbf{x}\in\mathbb{R}^d\). The generator receives as input \(\mathbf{m}\odot\mathbf{x}\), where \(\odot\) denotes element-wise multiplication. The generator outputs \(\mathbf{x_G}=G(\mathbf{m}\odot\mathbf{x},\mathbf{z})\in \mathbb{R}^d\), and the in-painted image \(\mathbf{x_I}\) is given by,
$$\mathbf{x_I}=(1-\mathbf{m})\odot\mathbf{x_G}+\mathbf{m}\odot\mathbf{x}$$
The CCGAN objective is given by,
$$\min_G\max_D \mathrm{E}_{\mathbf{x}\sim\mathcal{X}}[\log D(\mathbf{x})]+\mathrm{E}_{\mathbf{x}\sim\mathcal{X},\mathbf{m}\sim\mathcal{M}}[\log(1-D(\mathbf{x_I}))]$$
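A minimal sketch of the in-painting composition, assuming a generator that takes the masked image and noise as separate inputs:

```python
import torch

def inpaint(G, x, m, z):
    """Compose the in-painted image x_I from the generator output x_G.

    m is a binary mask (1 = keep pixel, 0 = dropped), broadcastable to x.
    """
    x_G = G(m * x, z)               # generator sees only the unmasked context
    x_I = (1.0 - m) * x_G + m * x   # fill holes with x_G, keep known pixels
    return x_I
```

The composed \(\mathbf{x_I}\), rather than the raw generator output, is what the discriminator scores in the objective above.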

5. Auxiliary Classifier GAN (ACGAN)

In an alternative formulation of the GAN objective, the generator \(G\) takes as input a random noise vector \(z\) and outputs an image \(X_{fake}=G(z)\). The discriminator \(D\) receives as input either a training image or a synthesized image from the generator, and outputs a probability distribution \(P(S|X) = D(X)\) over possible image sources. The discriminator is trained to maximize the log-likelihood it assigns to the correct source:
$$L = E[\log P(S=real|X_{real})]+E[\log P(S=fake|X_{fake})]$$
For ACGAN, every generated sample has a corresponding class label \(c\sim p_c\) in addition to the noise \(z\), and \(G\) uses both to generate images \(X_{fake}=G(c,z)\). The discriminator outputs both a probability distribution over sources and a probability distribution over class labels, \(P(S|X),P(C|X) = D(X)\). The objective function has two parts: the log-likelihood of the correct source, \(L_S\), and the log-likelihood of the correct class, \(L_C\).
$$L_S = E[\log P(S=real|X_{real})]+E[\log P(S=fake|X_{fake})]$$
$$L_C = E[\log P(C=c|X_{real})]+E[\log P(C=c|X_{fake})]$$
\(D\) is trained to maximize \(L_S+L_C\) while \(G\) is trained to maximize \(L_C-L_S\). ACGAN learns a representation for \(z\) that is independent of the class label.
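A sketch of both log-likelihood terms, assuming a two-headed discriminator that returns a real/fake source logit and class logits; `D` and the label tensors are placeholders:

```python
import torch
import torch.nn.functional as F

def acgan_losses(D, x_real, y_real, x_fake, y_fake):
    """L_S and L_C as mean log-likelihoods; D returns (source logit, class logits)."""
    s_real, c_real = D(x_real)
    s_fake, c_fake = D(x_fake)
    # L_S = E[log P(S=real | X_real)] + E[log P(S=fake | X_fake)]
    L_S = F.logsigmoid(s_real).mean() + F.logsigmoid(-s_fake).mean()
    # L_C = E[log P(C=c | X_real)] + E[log P(C=c | X_fake)]; cross_entropy is
    # the mean negative log-likelihood, hence the minus signs.
    L_C = -F.cross_entropy(c_real, y_real) - F.cross_entropy(c_fake, y_fake)
    return L_S, L_C   # D maximizes L_S + L_C; G maximizes L_C - L_S
```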

6. InfoGAN


7. Adversarial Autoencoder


8. Image-to-Image Translation (Pix2Pix)

The objective of a conditional GAN can be expressed as,
$$\mathcal{L}_{cGAN}(G,D) = \mathrm{E}_{x,y}[\log D(x,y)]+\mathrm{E}_{x,z}[\log(1-D(x,G(x,z)))]$$
where the discriminator learns to tell whether the conditioning image \(x\) is paired with its real output \(y\) or with the generated output \(G(x,z)\). An additional L1 loss encourages the output to stay close to the ground truth and produces less blurring than L2,
$$\mathcal{L}_{L1}(G) = \mathrm{E}_{x,y,z}[\|y-G(x,z)\|_1]$$
The final objective is given by,
$$G^* = \arg\min_G\max_D \mathcal{L}_{cGAN}(G,D)+\lambda\mathcal{L}_{L1}(G)$$
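A sketch of the combined objective, assuming \(D\) takes the (condition, image) pair and outputs a probability; `lam` plays the role of \(\lambda\) (the paper uses \(\lambda=100\)):

```python
import torch

def pix2pix_objective(D, G, x, y, z, lam=100.0, eps=1e-8):
    """cGAN term plus weighted L1 term from the formulas above."""
    y_fake = G(x, z)
    # L_cGAN(G, D) = E[log D(x, y)] + E[log(1 - D(x, G(x, z)))]
    L_cGAN = torch.mean(torch.log(D(x, y) + eps)) \
           + torch.mean(torch.log(1.0 - D(x, y_fake) + eps))
    # L_L1(G) = E[ |y - G(x, z)|_1 ]
    L_L1 = torch.mean(torch.abs(y - y_fake))
    return L_cGAN + lam * L_L1   # G minimizes this; D maximizes the cGAN term
```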

9. Cycle-Consistent Adversarial Network (Cycle-GAN)

The model contains two mapping functions (generators) \(G:X\rightarrow Y\) and \(F:Y\rightarrow X\), and associated adversarial discriminators \(D_Y\) and \(D_X\). \(D_Y\) encourages \(G\) to translate \(X\) into outputs indistinguishable from domain \(Y\), and vice versa for \(D_X\) and \(F\). For the mapping function \(G:X\rightarrow Y\) and its discriminator \(D_Y\), the objective is expressed as,
$$\mathcal{L}_{GAN}(G,D_Y,X,Y)=\mathrm{E}_{y\sim p_{data}(y)}[\log D_Y(y)]+\mathrm{E}_{x\sim p_{data}(x)}[\log(1-D_Y(G(x)))]$$
The cycle consistency loss is given by,
$$\mathcal{L}_{cyc}(G,F) = \mathrm{E}_{x\sim p_{data}(x)}[\|F(G(x))-x\|_1]+\mathrm{E}_{y\sim p_{data}(y)}[\|G(F(y))-y\|_1]$$
The full objective is given by,
$$\mathcal{L}(G,F,D_X,D_Y)=\mathcal{L}_{GAN}(G,D_Y,X,Y)+\mathcal{L}_{GAN}(F,D_X,Y,X)+\lambda\mathcal{L}_{cyc}(G,F)$$
The aim is to solve,
$$G^*,F^* = \arg\min_{G,F}\max_{D_X,D_Y}\mathcal{L}(G,F,D_X,D_Y)$$
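A sketch of the cycle consistency term, mirroring the paper's notation (`G`, `F` are the two generators; `x`, `y` are sample batches from the two domains):

```python
import torch

def cycle_consistency_loss(G, F, x, y):
    """L_cyc(G, F): forward cycle x -> G(x) -> F(G(x)) and the backward cycle."""
    forward = torch.mean(torch.abs(F(G(x)) - x))    # E[ |F(G(x)) - x|_1 ]
    backward = torch.mean(torch.abs(G(F(y)) - y))   # E[ |G(F(y)) - y|_1 ]
    return forward + backward
```

This term is added to the two adversarial losses with weight \(\lambda\) to form the full objective above.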
