Semantic Segmentation Transfer (3 min read)
This course project was done in the context of the Autonomous Vehicles (Duckietown) course given at UdeM during Fall 2018.
This post only provides an overview of the approach used throughout the project. The final slides used in our class presentation can be found here.
The GitHub repository with all the source code, additional information, and pre-trained models needed to run the model's predictions in the simulator can be found at this link.
Broadly speaking, the goal of my team’s project was to train a segmentation model that would be able to detect lines on the road in a simulated environment and transfer that knowledge to real-world videos.
Duckietown is a simplified world of roads in which many real-world characteristics are abstracted away, allowing researchers and students to filter out noise and focus on meaningful autonomous-vehicle research. The organizers of the Duckietown Project created this repo where, given access to the Duckietown map and a duckiebot, one can download and easily run a baseline in a simplified real-world environment.
As part of the current baseline, the system's line segmentation algorithm detects the lines (white, yellow, red) from the camera feed using traditional computer vision methods, such as thresholding on pixel intensity and color ranges. While this approach works well in a controlled environment, it is very sensitive to lighting conditions and requires fine-tuning to generalize well.
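To make the color-thresholding idea concrete, here is a minimal sketch of that kind of traditional approach. The threshold values below are illustrative placeholders, not the baseline's actual tuned ranges, and the function name is our own:

```python
import numpy as np

def segment_lines_rgb(frame):
    """Naive color-range line segmentation on an RGB frame of shape (H, W, 3),
    dtype uint8. Returns one boolean mask per line color.

    Thresholds are illustrative only; a real pipeline would tune them
    (often in HSV space) and still remain sensitive to lighting.
    """
    r = frame[..., 0].astype(int)
    g = frame[..., 1].astype(int)
    b = frame[..., 2].astype(int)
    white = (r > 180) & (g > 180) & (b > 180)   # bright in all channels
    yellow = (r > 150) & (g > 150) & (b < 100)  # strong red+green, weak blue
    red = (r > 150) & (g < 80) & (b < 80)       # strong red only
    return {"white": white, "yellow": yellow, "red": red}
```

A rule-based detector like this is exactly what breaks under new lighting, which motivates learning the segmentation instead.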
Also available to students and researchers as part of the Duckietown universe is Gym-Duckietown, a simulator of the real-world map with which we can interact and from which we receive the pixel feed.
To enhance the generalization performance of the line segmentation algorithm, we proposed to train a line segmentation model in the simulator by leveraging the effectively infinite data available for training. Indeed, training a pixel-wise segmentation model on real-world images would be highly ineffective due to the large cost of labelling images with the ground-truth lines.
In the simulator, we can control the game engine generating the images, which means we have access to where the actual lines are. This, in turn, allows us to generate a camera feed automatically with the lines perfectly segmented. The goal is therefore to train a segmentation model in that simulated environment and then transfer that knowledge to real-world camera feeds of duckiebots.
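The data-generation loop can be sketched as follows. The simulator class here is a runnable stand-in, since the actual Gym-Duckietown API differs; the point is only that every rendered frame comes with a free, perfectly aligned ground-truth mask:

```python
import numpy as np

class FakeSimulator:
    """Stand-in for the Duckietown simulator. Because we control the game
    engine, each rendered frame comes with an exact line mask for free."""

    def reset(self):
        self.t = 0

    def step(self):
        # Hypothetical 160x120 camera frame and its ground-truth line mask.
        frame = np.random.randint(0, 256, (120, 160, 3), dtype=np.uint8)
        mask = np.zeros((120, 160), dtype=np.uint8)
        mask[100:110, :] = 1  # placeholder: engine knows where lines are drawn
        self.t += 1
        return frame, mask

def generate_dataset(env, n_samples):
    """Collect (frame, mask) pairs; labels cost nothing because the engine
    knows exactly where every line is rendered."""
    env.reset()
    frames, masks = [], []
    for _ in range(n_samples):
        frame, mask = env.step()
        frames.append(frame)
        masks.append(mask)
    return np.stack(frames), np.stack(masks)
```

This is what makes simulator training attractive: the same loop on real footage would require hand-labelling every frame.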
Below is an example of the segmented lines generated directly from the simulator.
For more information on the generated data and the analysis of the resulting dataset used to train the segmentation model, see this section in the repository.
Transferring to real camera feed
To segment the real camera feed, we use a GAN framework in which a generator converts real-world images into the simulator's visual space. The intent is, given a new camera feed, to convert it into the simulator's style so that our segmentation model, which was trained on simulated images, remains effective.
In a sense, we are trying to strip from the real images everything that does not exist in the simulated ones. We hope that this, in turn, allows for good generalization on data unseen during training.
To achieve this, we alternate between training the generator/discriminator pair and the segmentation model.
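The alternating scheme can be sketched as below. The three update functions are placeholders standing in for real network updates (in practice these would be gradient steps on the discriminator, generator, and segmenter); only the control flow reflects the training procedure described above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder update steps; each would be a gradient step in practice
# and here just returns a dummy loss value.
def update_discriminator(real_sim_batch, fake_sim_batch):
    return float(rng.uniform())

def update_generator(real_world_batch):
    return float(rng.uniform())

def update_segmenter(sim_batch, sim_masks):
    return float(rng.uniform())

def train(n_steps, get_sim_batch, get_real_batch, generator):
    """Alternate GAN updates (real -> simulator style) with segmentation
    updates on simulator images, as described in the post."""
    history = []
    for _ in range(n_steps):
        sim_imgs, sim_masks = get_sim_batch()
        real_imgs = get_real_batch()
        fake_sim = generator(real_imgs)  # map real frames into sim style
        d_loss = update_discriminator(sim_imgs, fake_sim)
        g_loss = update_generator(real_imgs)
        s_loss = update_segmenter(sim_imgs, sim_masks)
        history.append((d_loss, g_loss, s_loss))
    return history
```

At inference time, only the generator and segmenter are kept: real frames are first translated into the simulator style, then segmented.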
Below are some sample results, generated on a video the model was not trained on. On the left is the raw camera feed of the duckiebot; in the middle, the feed converted into the simulator space; and on the right, the segmentation predictions on the converted images.
These videos are very interesting as they allow for interpretability of the model’s prediction. We can understand what the segmentation model sees in the simulator space!