Understanding edge of stability
In this meeting, together with Xingyu Zhu, we discussed the edge of stability (EoS) phenomenon and how it can be understood in a simplified, minimalist setup (see paper). We first introduced EoS and how it relates to the sharpness of the loss surface, as characterized by the eigenvalues of the loss Hessian. Xingyu then explained the 4D example considered in their recent paper, published at ICLR 2023. The dynamics in this minimalist example lead to two conclusions: (i) there exists a range of initializations from which training with a given learning rate reaches the EoS regime, and (ii) there exists a range of learning rates that lead to the EoS regime. We further discussed possible implications for realistic models and promising future directions.
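To make the link between sharpness and learning rate concrete, here is a minimal sketch of the standard stability argument for gradient descent on a one-dimensional quadratic. It is not the 4D example from the paper; the function `run_gd` and the specific numbers are assumptions chosen for illustration. On f(x) = 0.5 * a * x^2 the curvature a plays the role of the sharpness, and each step multiplies x by (1 - lr * a), so the iterates shrink when a < 2 / lr and blow up when a > 2 / lr.

```python
def run_gd(sharpness, lr, steps=50, x0=1.0):
    """Run gradient descent on f(x) = 0.5 * sharpness * x**2.

    Hypothetical helper for illustration; returns |x| after `steps` updates.
    """
    x = x0
    for _ in range(steps):
        grad = sharpness * x   # f'(x) = sharpness * x
        x = x - lr * grad      # each step multiplies x by (1 - lr * sharpness)
    return abs(x)


lr = 0.1
threshold = 2.0 / lr  # stability threshold for the sharpness, here 20

below = run_gd(sharpness=19.0, lr=lr)  # curvature just below the threshold
above = run_gd(sharpness=21.0, lr=lr)  # curvature just above the threshold

print(f"threshold 2/lr = {threshold}")
print(f"sharpness 19: |x| after 50 steps = {below:.3e}")  # decays toward 0
print(f"sharpness 21: |x| after 50 steps = {above:.3e}")  # grows without bound
```

In the EoS regime, the sharpness observed during training hovers around this 2 / lr threshold instead of staying safely below it, which is why the quadratic picture alone cannot explain the phenomenon and a nonlinear minimal example is useful.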
You can find the presentation from the meeting here.