SGD with Large Step Sizes Learns Sparse Features
In this meeting, we discussed the simplicity bias of SGD with large step sizes with Maksym Andriushchenko. The presentation was based on the ICML 2023 paper “SGD with Large Step Sizes Learns Sparse Features”. In particular, we discussed the models for which the sparsity of activations can be proven rigorously, and how these results can be extended to real-world models and datasets. The research was put into the perspective of related work showing the simplicity of models obtained with SGD.
You can find the presentation here.
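For readers unfamiliar with the setup, the sketch below is a minimal, self-contained illustration and not the construction from the paper or the talk: it trains a diagonal linear network f(x) = ⟨u ⊙ v, x⟩ with mini-batch SGD on a synthetic sparse regression problem and measures how many of the learned effective weights β = u ⊙ v end up near zero for a small and a larger step size. All choices here (data, initialization scale, step sizes, thresholds) are illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the paper's exact setup):
# a diagonal linear network f(x) = <u * v, x> trained with mini-batch SGD
# on an overparameterized sparse regression task. We measure how many of
# the learned effective weights beta = u * v end up near zero.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: d > n (overparameterized), only 3 coordinates carry signal,
# plus a little label noise so the mini-batch gradients do not vanish exactly.
n, d = 80, 100
beta_star = np.zeros(d)
beta_star[:3] = 0.5          # nonnegative true coefficients
X = rng.standard_normal((n, d))
y = X @ beta_star + 0.1 * rng.standard_normal(n)

def train_diag_net(lr, steps=20_000, batch=20, alpha=0.5):
    """Run mini-batch SGD on the squared loss of the diagonal net beta = u * v."""
    # With this symmetric init u stays equal to v, so beta stays nonnegative,
    # which matches the nonnegative ground-truth coefficients above.
    u = alpha * np.ones(d)
    v = alpha * np.ones(d)
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        Xb, yb = X[idx], y[idx]
        r = Xb @ (u * v) - yb                  # residuals on the mini-batch
        g = 2.0 * (Xb * r[:, None]).mean(0)    # gradient w.r.t. beta = u * v
        u, v = u - lr * g * v, v - lr * g * u  # chain rule through u and v
    return u * v

for lr in (0.005, 0.05):
    beta = train_diag_net(lr)
    sparsity = np.mean(np.abs(beta) < 1e-2)    # fraction of near-zero weights
    print(f"step size {lr}: fraction of near-zero effective weights = {sparsity:.2f}")
```

The diagonal linear network is only used here because it is a standard simple model for studying implicit biases of (S)GD toward sparse solutions; the snippet shows one way to quantify feature sparsity at different step sizes, not a reproduction of the paper's results.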