Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

This discussion focused on stochastic collapse in neural networks, based on the NeurIPS 2023 paper “Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks”. Feng Chen, Daniel Kunin and Atsushi Yamamura presented their account of how the implicit biases of SGD simplify a neural network during training. Gradient noise attracts the dynamics towards regions of the loss surface that correspond to simpler subnetworks, which effectively shrinks the structure of the network and, according to the presenters, helps the model generalize better.
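To make the idea of a "simpler subnetwork" concrete, here is a minimal sketch. It is not the authors' code: the toy model f(x) = sum_j a[j] * tanh(w[j] * x), the synthetic data, and all names (`minibatch_grads`, `sgd`, `w_final`, `a_final`) are assumptions chosen for illustration. It shows only the invariance part of the story: a hidden neuron whose incoming and outgoing weights are both zero receives a zero gradient on every minibatch, so SGD never leaves that smaller subnetwork. It does not demonstrate the attraction towards such sets that the paper analyzes.

```python
import math
import random

def minibatch_grads(w, a, batch):
    """Gradients of 0.5 * mean squared error for f(x) = sum_j a[j] * tanh(w[j] * x)."""
    gw = [0.0] * len(w)
    ga = [0.0] * len(a)
    for x, y in batch:
        h = [math.tanh(wj * x) for wj in w]
        err = sum(aj * hj for aj, hj in zip(a, h)) - y
        for j in range(len(w)):
            ga[j] += err * h[j] / len(batch)                          # d loss / d a[j]
            gw[j] += err * a[j] * (1.0 - h[j] ** 2) * x / len(batch)  # d loss / d w[j]
    return gw, ga

def sgd(w, a, data, lr=0.05, steps=500, batch_size=4, seed=0):
    """Plain minibatch SGD; returns the final weights."""
    rng = random.Random(seed)
    w, a = list(w), list(a)
    for _ in range(steps):
        batch = rng.sample(data, batch_size)
        gw, ga = minibatch_grads(w, a, batch)
        w = [wj - lr * g for wj, g in zip(w, gw)]
        a = [aj - lr * g for aj, g in zip(a, ga)]
    return w, a

# Synthetic 1-D regression data (an assumption made for this illustration).
data_rng = random.Random(1)
xs = [data_rng.uniform(-1.0, 1.0) for _ in range(64)]
data = [(x, math.sin(2.0 * x) + 0.1 * data_rng.gauss(0.0, 1.0)) for x in xs]

# The third hidden neuron starts "collapsed": both of its weights are zero.
w_final, a_final = sgd([0.7, -0.4, 0.0], [0.5, 0.3, 0.0], data)
print("incoming weights:", w_final)
print("outgoing weights:", a_final)
```

The third entry of both weight lists stays exactly zero however the minibatches are sampled, while the other two neurons keep training. The network therefore behaves like a two-neuron model for the whole run.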

The presentation can be found here.