My First Introduction to GANs (Generative Adversarial Networks)
"it’s the coolest idea of Machine Learning in the last 20 years" - Yann LeCun (one of the fathers of Deep Learning)
GANs, or Generative Adversarial Networks, are a kind of neural network architecture composed of two separate deep neural networks competing against each other: the generator and the discriminator.
Their goal is to generate new data points that resemble the data points in the training set.
Here is the original GAN paper by @goodfellow_ian:
https://arxiv.org/pdf/1406.2661.pdf
Let's take a theoretical example: the process of money counterfeiting. In this process, we can imagine two types of agents: a criminal and a cop. Let us look at their competing objectives:
Criminal's Objective: The main objective of the criminal is to come up with increasingly sophisticated ways of counterfeiting money, so that the cop cannot distinguish between counterfeit money and real money.
Cop's Objective: The main objective of the cop is to come up with increasingly sophisticated ways of distinguishing between counterfeit money and real money.
As this process progresses, the cop develops more and more sophisticated technology to detect counterfeit money, and the criminal develops more and more sophisticated technology to produce it. This is the basis of what is called an Adversarial Process.
Idea of GAN:
The basic idea behind GANs is actually very simple. At its core, a GAN consists of two agents with opposing objectives.
This relatively simple setup results in both agents coming up with increasingly complex ways to deceive each other. This kind of situation can be modeled in Game Theory as a minimax game.
"The generator will try to generate fake images that fool the discriminator into thinking that they’re real. And the discriminator will try to distinguish between a real and a generated image as best as it could when an image is fed."
They both get stronger together until the discriminator cannot distinguish between the real and the generated images any more.
Generative Adversarial Networks take advantage of Adversarial Processes to train two Neural Networks that compete with each other until a desirable equilibrium is reached. In this case, we have a Generator Network G(Z), which takes random noise as input and tries to generate data very close to the dataset we have. The other network is the Discriminator Network D(X), which takes both real and generated data as input and tries to discriminate between the two.
At its core, the discriminator performs binary classification and outputs the probability that the input data comes from the real dataset (as opposed to the synthetic, or fake, data).
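The minimax game described above can be written down as the value function from the original paper. Here x is a real sample, z is the random noise fed to the generator, and p_data and p_z are the data and noise distributions (notation borrowed from the paper, not defined elsewhere in this post):

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator tries to push both terms up (high D(x) on real data, low D(G(z)) on fakes), while the generator tries to push the second term down. At the equilibrium described in the paper, the generated distribution matches the data distribution and the best the discriminator can do is output 1/2 everywhere.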
GAN Implementation Using TensorFlow:
By the definition of a GAN, we need two nets. Each could be anything, from a sophisticated net like a convnet to just a two-layer neural net. Let’s keep it simple at first and use a two-layer net for both of them. We’ll use TensorFlow for this purpose.
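import tensorflow as tf  # TensorFlow 1.x API

# Discriminator net: D_W1, D_b1, D_W2, D_b2 are tf.Variable weights and biases
def discriminator(x):
    # Hidden layer with ReLU activation
    D_h1 = tf.nn.relu(tf.matmul(x, D_W1) + D_b1)
    # Output layer: the sigmoid turns the logit into a probability that x is real
    D_logit = tf.matmul(D_h1, D_W2) + D_b2
    return tf.nn.sigmoid(D_logit)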
generator(z) takes a 100-dimensional noise vector and returns a 784-dimensional vector, which is a flattened MNIST image (28x28 = 784 pixels).
discriminator(x) takes MNIST image(s) and returns a scalar that represents the probability that the input is a real MNIST image.
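To make those shapes concrete, here is a minimal sketch of both two-layer nets in plain Python, with no TensorFlow. The hidden size of 128, the random initialization, and the uniform noise range are assumptions made for illustration; only the 100 and 784 dimensions come from the text above.

```python
import math
import random

Z_DIM, H_DIM, X_DIM = 100, 128, 784  # H_DIM = 128 is an assumed hidden size

def dense(inputs, weights, biases):
    """Affine layer; `weights` is a list of rows, one row per input unit."""
    return [sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
            for j in range(len(biases))]

def relu(v):
    return [max(0.0, a) for a in v]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def init(n_in, n_out, rng):
    """Small random weights (an assumed initialization scheme)."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]

rng = random.Random(0)
G_W1, G_b1 = init(Z_DIM, H_DIM, rng), [0.0] * H_DIM
G_W2, G_b2 = init(H_DIM, X_DIM, rng), [0.0] * X_DIM
D_W1, D_b1 = init(X_DIM, H_DIM, rng), [0.0] * H_DIM
D_W2, D_b2 = init(H_DIM, 1, rng), [0.0]

def generator(z):
    """100-dim noise -> 784 pixel intensities in (0, 1)."""
    h = relu(dense(z, G_W1, G_b1))
    return sigmoid(dense(h, G_W2, G_b2))

def discriminator(x):
    """784-dim image -> scalar probability that the image is real."""
    h = relu(dense(x, D_W1, D_b1))
    return sigmoid(dense(h, D_W2, D_b2))[0]

z = [rng.uniform(-1.0, 1.0) for _ in range(Z_DIM)]
fake_image = generator(z)
p_real = discriminator(fake_image)
print(len(fake_image), p_real)
```

With untrained weights the probability is essentially arbitrary; the point is only that the generator's output has exactly the shape the discriminator expects.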
Now, let’s declare the Adversarial Process for training this GAN. Here’s the training algorithm from the paper:
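In short, the paper's algorithm alternates between gradient steps on the discriminator and a gradient step on the generator, each computed on a minibatch. As a rough sketch, the two losses can be written in plain Python as below, operating on lists of discriminator outputs. The function names and sample probabilities are made up for illustration; D_real and D_fake stand for the discriminator's outputs on real and generated samples.

```python
import math

def discriminator_loss(D_real, D_fake):
    """Negative of mean[log D(x) + log(1 - D(G(z)))]."""
    terms = [math.log(r) + math.log(1.0 - f) for r, f in zip(D_real, D_fake)]
    return -sum(terms) / len(terms)

def generator_loss(D_fake):
    """Negative of mean[log D(G(z))]."""
    return -sum(math.log(f) for f in D_fake) / len(D_fake)

# Hypothetical discriminator outputs for a batch of three samples
D_real = [0.9, 0.8, 0.95]   # D is fairly sure the real images are real
D_fake = [0.1, 0.3, 0.05]   # D is fairly sure the generated images are fake

print("D_loss:", discriminator_loss(D_real, D_fake))
print("G_loss:", generator_loss(D_fake))
```

A confident, correct discriminator gets a low D_loss, and in that same situation the generator's loss is high, which is exactly the tension the adversarial setup relies on.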
We use a negative sign for the loss functions because they need to be maximized, whereas TensorFlow’s optimizers can only do minimization.
Also, as per the paper’s suggestion, it’s better to maximize tf.reduce_mean(tf.log(D_fake)) instead of minimizing tf.reduce_mean(tf.log(1. - D_fake)) as in the paper's algorithm, because the latter gives the generator very weak gradients early in training, when the discriminator easily rejects its samples.
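The reason behind that suggestion can be checked with a little calculus. Assuming the discriminator ends in a sigmoid, so that D = sigmoid(logit), the sketch below compares the gradient each generator objective sends back through the discriminator's logit. The function names are made up for illustration.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def grad_saturating(logit):
    """d/d(logit) of log(1 - sigmoid(logit)), the term G minimizes originally."""
    return -sigmoid(logit)

def grad_non_saturating(logit):
    """d/d(logit) of -log(sigmoid(logit)), the alternative loss G minimizes."""
    return -(1.0 - sigmoid(logit))

# Very negative logits mean the discriminator confidently rejects the fake
for logit in (-6.0, -2.0, 0.0, 2.0):
    print(f"logit={logit:+.1f}  D={sigmoid(logit):.4f}  "
          f"saturating={grad_saturating(logit):+.4f}  "
          f"non-saturating={grad_non_saturating(logit):+.4f}")
```

When the discriminator is confident that a sample is fake (D close to 0), the original objective yields a gradient close to zero, while the alternative keeps it close to -1, so the generator still receives a useful learning signal.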
Then we train the two networks in turn, one after the other, each with its own loss function.
# Only update D(X)'s parameters, so var_list = theta_D
D_solver = tf.train.AdamOptimizer().minimize(D_loss, var_list=theta_D)
# Only update G(Z)'s parameters, so var_list = theta_G
G_solver = tf.train.AdamOptimizer().minimize(G_loss, var_list=theta_G)
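To illustrate what restricting each update to its own parameter list means, here is a toy one-dimensional sketch in plain Python. It is not the MNIST model: the generator G(z) = z + b, the discriminator D(x) = sigmoid(w*x + c), the hand-derived gradients, plain gradient descent in place of Adam, and all the numbers are assumptions chosen to keep the example small.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def d_step(theta_d, theta_g, real_batch, noise_batch, lr=0.05):
    """One gradient step on D_loss. Reads theta_g but returns only a new theta_d."""
    w, c = theta_d
    (b,) = theta_g
    n = len(real_batch)
    grad_w = grad_c = 0.0
    for x, z in zip(real_batch, noise_batch):
        fake = z + b                      # toy generator output
        d_real = sigmoid(w * x + c)
        d_fake = sigmoid(w * fake + c)
        # D_loss = -[log D(x) + log(1 - D(G(z)))], differentiated by hand
        grad_w += (-(1.0 - d_real) * x + d_fake * fake) / n
        grad_c += (-(1.0 - d_real) + d_fake) / n
    return [w - lr * grad_w, c - lr * grad_c]

def g_step(theta_d, theta_g, noise_batch, lr=0.05):
    """One gradient step on G_loss. Reads theta_d but returns only a new theta_g."""
    w, c = theta_d
    (b,) = theta_g
    grad_b = 0.0
    for z in noise_batch:
        d_fake = sigmoid(w * (z + b) + c)
        # G_loss = -log D(G(z)), differentiated by hand with respect to b
        grad_b += -(1.0 - d_fake) * w / len(noise_batch)
    return [b - lr * grad_b]

rng = random.Random(0)
theta_D, theta_G = [0.1, 0.0], [0.0]
for step in range(100):
    real = [rng.gauss(4.0, 0.5) for _ in range(16)]    # toy "real" data
    noise = [rng.gauss(0.0, 0.5) for _ in range(16)]
    theta_D = d_step(theta_D, theta_G, real, noise)    # generator untouched
    theta_G = g_step(theta_D, theta_G, noise)          # discriminator untouched
print(theta_D, theta_G)
```

Each step computes its loss using both networks but only changes one of them, which is the role var_list plays in the two solvers above.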