Sign in

Parle: Parallelizing Stochastic Gradient Descent

Deep learning has been very been very successfully applied in many areas, such as Image Classification and Natural Language Processing. And this encourages more massive dataset to emerge in recent years. To tackle problems utilizing these scale of dataset, distributed and parallel training of deep learning has been researched and proved to have state of art performances.

Distributed and parallel training mainly composed by model parallelization and data parallelization. It is not perfect and instead comes with inevitable drawbacks, for example, communication overhead. Here I’ll introduce a Parle, a new algorithm solution, proposed by Chaudhari et al. in 2017.


Xin Gu

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store