The importance of discretisation drift in deep learning

In this session, Mihaela Rosca discussed the difference between the discrete steps that SGD takes when minimizing a function and the continuous gradient flow. Accounting for this difference, the discretisation drift, yields exciting insights into the implicit regularization that may be taking place during SGD optimization and that may help it reach solutions that generalize so well.
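To make the gap between discrete steps and continuous flow concrete, here is a minimal sketch that is not taken from the talk. It assumes a toy one-dimensional quadratic loss L(theta) = 0.5 * a * theta^2, chosen only because gradient flow has a closed-form solution there, and it uses full-batch gradient descent rather than SGD. The function names and parameter values are illustrative.

```python
import math


def gradient_descent(theta0, a, h, steps):
    """Run `steps` discrete updates theta <- theta - h * grad L(theta)."""
    theta = theta0
    for _ in range(steps):
        grad = a * theta  # gradient of the assumed loss 0.5 * a * theta**2
        theta = theta - h * grad
    return theta


def gradient_flow(theta0, a, t):
    """Exact solution of d(theta)/dt = -a * theta at time t."""
    return theta0 * math.exp(-a * t)


def drift(theta0, a, h, total_time):
    """Gap between discrete and continuous trajectories after the same elapsed time."""
    steps = round(total_time / h)
    discrete = gradient_descent(theta0, a, h, steps)
    continuous = gradient_flow(theta0, a, steps * h)
    return abs(discrete - continuous)


if __name__ == "__main__":
    # Smaller learning rates track the continuous flow more closely.
    for h in (0.5, 0.1, 0.01):
        print(f"h={h}: drift={drift(1.0, 1.0, h, 1.0):.5f}")
```

In this toy setting the gap shrinks as the learning rate h decreases, which is the sense in which the learning rate controls how far the discrete updates depart from the flow. Real networks and minibatch noise are of course far less tidy than this example.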

The presentation can be found here, and it is being updated with new information!