Master’s thesis- 1 min
This thesis presents my research during the completion of my Master’s degree at the Université de Montréal at Mila under the supervision of Professor Yoshua Bengio. This work was done with the collaboration of postdoctoral researcher Min Lin at Mila.
This document is structured in such a way to introduce the machine learning basics in the first chapter to be able to follow the work detailed in the subsequent ones. Chapters 2, 3 and 4 detail different experiments to study how useful having a communication channel between deep learning models can be.
I include below the full summary for convenience. The full document can also be found at this link.
As artificial intelligence systems spread to more diverse and larger tasks in many domains, the machine learning algorithms, and in particular the deep learning models and the databases required to train them are getting bigger themselves. Some algorithms do allow for some scaling of large computations by leveraging data parallelism. However, they often require a large amount of data to be exchanged in order to ensure the shared knowledge throughout the compute nodes is accurate.
In this work, the effect of different levels of communications between deep learning models is studied, in particular how it affects performance. The first approach studied looks at decentralizing the numerous computations that are done in parallel in training procedures such as synchronous and asynchronous stochastic gradient descent. In this setting, a simplified communication that consists of exchanging low bandwidth outputs between compute nodes can be beneficial. In the following chapter, the communication protocol is slightly modified to further include training instructions. Indeed, this is studied in a simplified setup where a pre-trained model, analogous to a teacher, can customize a randomly initialized model’s training procedure to accelerate learning. Finally, a communication channel where two deep learning models can exchange a purposefully crafted language is explored while allowing for different ways of optimizing that language.