Predicting grokking long before it happens
This time we discussed grokking, a phenomenon in which a model finds a well-generalizing minimum long after it has reached a minimum of the training loss. It may reflect structural properties of the loss surface, which makes it interesting to investigate. Pascal Jr. Tikeng Notsawo presented results from the paper “Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok”, accepted at the workshop on Neural Scaling Laws: Emergence and Phase Transitions at ICML 2023.
The presentation can be found here.