Conditional Inpainting: Generating a colored interior based on the image contour


In this post I detail my first objective for the IFT6266 project: generating colors based on the input (the contour of the image).

Expanding on what has been covered so far in class on generative models (not much): as I understand it, the generative part is achieved simply by producing an output and minimizing its difference from the target output.

Just like in a regular neural net, that difference (the loss) is backpropagated to the parameters, and hopefully the output gets closer to the target as training progresses.
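To make the idea concrete, here is a minimal sketch of "produce an output, measure its difference from the target, and adjust the parameters". It assumes a mean squared error loss (the post does not say which loss is used) and replaces the network with a hypothetical one-parameter model, so it only illustrates the training loop, not the actual model.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def train(xs, ys, lr=0.5, steps=50):
    """Fit y = w * x by gradient descent on the MSE (toy stand-in for a network)."""
    w = 0.0
    losses = []
    for _ in range(steps):
        preds = [w * x for x in xs]
        losses.append(mse(preds, ys))
        # Gradient of the MSE with respect to w: 2/N * sum((w*x - y) * x)
        grad = 2.0 * sum((p - y) * x for p, x, y in zip(preds, xs, ys)) / len(xs)
        w -= lr * grad  # move the parameter against the gradient
    return w, losses


# Hypothetical data generated with w = 2
xs = [0.0, 0.5, 1.0]
ys = [0.0, 1.0, 2.0]
w, losses = train(xs, ys)
```

The loss recorded at each step shrinks as `w` approaches the value that generated the targets, which is the behaviour hoped for from the real network.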

As mentioned in my plan for the project, I will not include any of the captions at this stage.

Dealing with the black square

My initial thought for dealing with the black 32x32 square in the middle of the input image is simply to leave it as is and proceed with regular convolutions. It will be interesting to see whether this performs well and how it affects the training time.

The model should, in theory, be able to learn that the blacked-out portion of the image provides no useful information.
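For reference, the input described above can be sketched as follows. This assumes a 64x64 image (the post only gives the 32x32 size of the square) and uses a single channel stored as plain Python lists to keep the example dependency-free; `mask_center` is a hypothetical helper name.

```python
def mask_center(image, hole=32):
    """Return a copy of a square image (list of rows) with a centered
    hole x hole block set to 0 (black)."""
    size = len(image)
    start = (size - hole) // 2  # top-left corner of the square
    masked = [row[:] for row in image]
    for r in range(start, start + hole):
        for c in range(start, start + hole):
            masked[r][c] = 0
    return masked


# Assumed 64x64 single-channel image, all white
image = [[255] * 64 for _ in range(64)]
contour = mask_center(image)
```

With a 64x64 input, rows and columns 16 to 47 are blacked out and the surrounding border is left untouched.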

Initial configuration

As a first model, with the goal of just outputting something that makes the slightest sense, I considered a CNN encoder followed by a CNN decoder. The structure of the network is as follows (channels, number of pixels, number of pixels):

To summarize the above network: it encodes the contour into a 32 x 1 x 1 representation, then applies two fully connected hidden layers of 256 units. The decoder then starts from a reshaped 64 x 4 x 4 tensor and produces a 3 x 32 x 32 output, which matches our desired output shape.
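The shape bookkeeping can be checked with the standard output-size formulas for convolutions and transposed convolutions. The sketch below is only an illustration: the kernel size, stride, padding, and intermediate channel counts are assumptions, since the post gives only the 64 x 4 x 4 starting shape and the 3 x 32 x 32 output.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a transposed convolution (no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


def decoder_shapes(start=(64, 4, 4), channels=(32, 16, 3),
                   kernel=4, stride=2, padding=1):
    """List the (channels, height, width) shape after each decoder layer.
    The channel counts and kernel settings are hypothetical."""
    shapes = [start]
    size = start[1]
    for ch in channels:
        size = deconv_out(size, kernel, stride, padding)
        shapes.append((ch, size, size))
    return shapes


shapes = decoder_shapes()
```

With these assumed settings each transposed convolution doubles the spatial size (4, 8, 16, 32), and the mirrored convolution halves it on the encoder side.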

A sigmoid is applied to the output of the last transposed convolution layer. This output, which lies between 0 and 1, is then scaled by 255 to be in the appropriate range for generating images.
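As a small sketch of that last step, the function below maps a raw pre-activation value to a pixel intensity. Rounding to an integer is an assumption made for display purposes, and `to_pixel` is a hypothetical helper name.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function, with values between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def to_pixel(logit):
    """Squash a raw network output with a sigmoid, then scale to 0..255."""
    return int(round(255 * sigmoid(logit)))


dark, bright = to_pixel(-1000.0), to_pixel(1000.0)
```

Whatever the last layer produces, the sigmoid guarantees that the scaled value stays within the valid pixel range.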


To evaluate performance, images from the validation set are generated by combining the true contour of the image with the output of the network. Below is the evolution of the predicted images for a random set over the training epochs, with the corresponding true images.
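Combining the true contour with the network output amounts to pasting the predicted patch into the center of the contour image. A minimal sketch, assuming a 64x64 contour, a 32x32 prediction, and a single channel stored as plain lists (`paste_center` is a hypothetical helper name):

```python
def paste_center(contour, patch):
    """Return a copy of the square contour image with the square patch
    written into its center."""
    size, hole = len(contour), len(patch)
    start = (size - hole) // 2
    out = [row[:] for row in contour]
    for r in range(hole):
        # Overwrite one row of the central region with the predicted row
        out[start + r][start:start + hole] = patch[r]
    return out


# Assumed sizes: 64x64 contour (value 10), 32x32 predicted patch (value 7)
contour_img = [[10] * 64 for _ in range(64)]
predicted = [[7] * 32 for _ in range(32)]
full = paste_center(contour_img, predicted)
```

The resulting image can then be displayed next to the true image for a visual comparison.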


N.B. Due to technical issues, I lost the original images and wasn’t able to regenerate or find them. I was, however, able to recover one of the GIFs. (Update: March 29, 2017)

We can see that the model does learn different shades of color and is able to fill the center of the image with the right shade, intensity, and colors.

Next step

Some possible improvements: first, increase the number of features as the encoding progresses, to end up with more inputs to the fully connected layers and the decoder. Second, the last layer shouldn’t be an upsampling operation (thanks to Francis), as it reduces the output resolution to blocks of the same color!
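The block effect is easy to reproduce on a tiny example. The sketch below assumes nearest-neighbor upsampling (the post does not say which kind of upsampling the last layer used); `upsample_nearest` is a hypothetical helper name.

```python
def upsample_nearest(image, factor=2):
    """Nearest-neighbor upsampling: every pixel becomes a factor x factor block."""
    out = []
    for row in image:
        wide = [v for v in row for _ in range(factor)]  # repeat along the width
        out.extend(wide[:] for _ in range(factor))      # repeat along the height
    return out


small = [[1, 2],
         [3, 4]]
big = upsample_nearest(small)
# big == [[1, 1, 2, 2],
#         [1, 1, 2, 2],
#         [3, 3, 4, 4],
#         [3, 3, 4, 4]]
```

Since no learned layer follows the upsampling, each 2x2 block keeps a single value, so the effective resolution of the output is halved.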
