Conditional Inpainting: Generative Adversarial Network pretraining
As part of my IFT6266 course project, I detail here the implementation of a Deep Convolutional Generative Adversarial Network (DCGAN). I hope to obtain a strong model that captures (or at least gets close to) the distribution behind the real images. The goal of this post is to get a model competent enough to be used effectively for reconstructing images.
The objective is to have both a good discriminator and a good generator. If they are good enough, we could use them to figure out the best possible reconstruction of a corrupted image, which is the task of the course project. More details about this approach can be found in this article by Yeh et al.
At this stage though, the focus is just to get a pre-trained Generative Adversarial Network (GAN). However, this task is not to be taken lightly…
Implementation and motivation
As a quick refresher, a GAN is a structure in which two networks have competing goals.
Discriminator – This network takes an image as input and outputs a score that reflects its confidence that the image is real. Its parameters are tuned to give a high score when a real image is input, and a low score when a fake image from the generator is input.
Generator – This network takes random noise as input and outputs an image. Its parameters are tuned to get a high score from the discriminator on fake images that it generates, therefore trying to fool the discriminator.
These two competing goals have given GANs a reputation for being difficult to train. This comes from the fact that the networks are trained against each other, and balancing the discriminator’s and the generator’s training can be tricky. The ultimate goal would be a generator so proficient that the discriminator is left at 50/50 on whether an image is real or fake.
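To make the two competing objectives concrete, here is a minimal sketch of the usual cross-entropy losses. It assumes the discriminator outputs a probability between 0 and 1 and that the generator uses the common "non-saturating" loss; the post does not say which exact losses were used, and the function names are only illustrative.

```python
import math


def discriminator_loss(score_real, score_fake):
    """Cross-entropy loss for the discriminator.

    Both scores are probabilities in (0, 1) that the input is real.
    The loss is low when real images score high and fake images score low.
    """
    return -(math.log(score_real) + math.log(1.0 - score_fake))


def generator_loss(score_fake):
    """Non-saturating generator loss (an assumption, not the post's stated choice).

    The loss is low when the discriminator gives fake images a high score.
    """
    return -math.log(score_fake)


# A confident, correct discriminator has a low loss...
print(discriminator_loss(score_real=0.9, score_fake=0.1))
# ...while the generator it is beating has a high one.
print(generator_loss(score_fake=0.1))
# The 50/50 situation: the discriminator cannot tell real from fake.
print(discriminator_loss(0.5, 0.5), generator_loss(0.5))
```

With these definitions, the 50/50 point corresponds to a discriminator loss of 2 log 2 (about 1.386) and a generator loss of log 2 (about 0.693).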
The Deep Convolutional part of the name simply refers to the structure of the networks, which are convolutional networks. The DCGAN that I implemented was heavily inspired by the original implementation by Radford et al.
If you are interested, you can find my full implementation of this model here.
Training the networks
As briefly mentioned, training can be tricky for these types of networks. After experimenting for a while, I ended up alternating between 30 batches for the discriminator and 10 for the generator. I felt this gave each model enough time to adapt to a good version of its opponent.
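The 30/10 alternation can be sketched as a repeating schedule that decides which network gets updated on each batch. This is only an illustration of the idea: `update_schedule`, `train_epoch`, and the two training callbacks are hypothetical names, not the functions from my implementation.

```python
from itertools import cycle, islice


def update_schedule(d_batches=30, g_batches=10):
    """Endlessly repeat: d_batches discriminator updates, then g_batches generator updates."""
    return cycle(["D"] * d_batches + ["G"] * g_batches)


def train_epoch(batches, train_discriminator, train_generator, schedule):
    """Run one pass over `batches`, updating whichever network the schedule names.

    `batches` comes first in zip() so that no schedule entry is consumed
    once the data runs out; the schedule then carries over to the next epoch.
    """
    for batch, who in zip(batches, schedule):
        if who == "D":
            train_discriminator(batch)
        else:
            train_generator(batch)


# Usage with stand-in training functions that only count their calls.
counts = {"D": 0, "G": 0}


def fake_train_discriminator(batch):
    counts["D"] += 1


def fake_train_generator(batch):
    counts["G"] += 1


schedule = update_schedule()
train_epoch(range(80), fake_train_discriminator, fake_train_generator, schedule)
print(counts)
```

Keeping the schedule as a single iterator shared across epochs means an epoch boundary does not reset the 30/10 pattern.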
One could argue that simply alternating between the two networks at every batch would also end up converging, but for now I have focused on taking multiple steps for each network, which is showing promising results.
The following figure shows the running average (over a window of 100) of the loss for both the discriminator and the generator on the training dataset. The networks were trained over 37 epochs.
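For reference, a running average like the one plotted can be computed with a fixed-size window. The sketch below assumes one loss value per batch and a window of 100 values; the class name is illustrative only.

```python
from collections import deque


class RunningAverage:
    """Mean of the most recent `window` values."""

    def __init__(self, window=100):
        # A deque with maxlen drops the oldest value automatically.
        self.values = deque(maxlen=window)

    def update(self, value):
        """Add a new value and return the current windowed mean."""
        self.values.append(value)
        return sum(self.values) / len(self.values)


# Usage: smooth a short, made-up sequence of losses with a window of 3.
smoother = RunningAverage(window=3)
smoothed = [smoother.update(loss) for loss in [1.0, 2.0, 3.0, 4.0]]
print(smoothed)
```

Until the window fills up, the average is taken over however many values have been seen so far, which is why the first points of such a curve are noisier.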
It can be seen from the figure that while the discriminator’s loss keeps decreasing, the generator’s starts to increase over time. The fact that the discriminator is getting better and better should allow the generator to get better as well, since it now has a more solid opponent to fool. At least, that’s the way I like to understand it.
That is why looking only at the loss over time can give a misleading impression of how well the networks are doing.
Since the ultimate goal is to get good images out of the generator, it is important to track the images that are being produced over the training period. To do so, after each epoch some random images are generated out of pure random noise.
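The per-epoch snapshot can be sketched as follows. The noise dimension (100), the uniform range, and the `generator` callable are all assumptions made for illustration; here the generator is a placeholder function rather than a trained network.

```python
import random


def sample_noise(n_images, noise_dim=100, seed=None):
    """Draw n_images noise vectors, each uniform in [-1, 1] (assumed distribution)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(noise_dim)]
            for _ in range(n_images)]


def snapshot(generator, epoch, n_images=8):
    """Generate a row of images from fresh noise and tag it with the epoch number."""
    return epoch, [generator(z) for z in sample_noise(n_images)]


# Usage with a placeholder "generator" that just sums its noise vector.
epoch, images = snapshot(lambda z: sum(z), epoch=5, n_images=3)
print(epoch, len(images))
```

Reusing the same noise vectors at every epoch (for example by passing a fixed `seed`) is a common variation, since it makes it easier to see how the output for a given input evolves.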
Below are rows of randomly generated images at different epochs.
The first row shows images generated after the first epoch, the second after 5 epochs, the third after 15, the fourth after 30 and the last after 35.
The results look very promising to me. Sure, these images look like the artwork of a madman, but when looking at the images generated after the last epoch you can start to imagine some known objects. Some structure seems to be present in the pictures. In particular, in the image in the fourth row and first column, we can discern people.
This leads me to conclude that a longer training time could lead to more detailed pictures.
Following these results, I will continue to train this model for a longer period and monitor the quality of the images. If the performance is satisfactory, I can start moving towards the next step which is using the generator to reconstruct a corrupted image.