Jeremy and Laker discussed with us their perspective on the optimizers used in modern deep learning: they can be viewed as the same first-order optimizer applied under different norms, with the choice of norm entering through the dual formulation of the update. This is based on the works “Old Optimizer, New Norm: An Anthology” and “Modular Duality in Deep Learning”. This perspective unifies existing optimizers and makes the selection of a suitable one for a given training run more rigorous.
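To make the "same optimizer, different norm" idea concrete, here is a minimal NumPy sketch (not code from the talk or the papers). It assumes the standard steepest-descent setup: for a gradient matrix G, pick the unit-norm direction T that maximizes the inner product with G, and scale it by the dual norm of G. The function names `steepest_direction` and `steepest_step` and the `sharpness` parameter are hypothetical, chosen only for illustration. Under the Frobenius norm this recovers plain gradient descent, under the elementwise max norm it gives sign descent, and under the spectral norm it gives an orthogonalized update.

```python
import numpy as np

def steepest_direction(grad, norm):
    """Return (T, dual), where T has unit `norm` and maximizes <grad, T>,
    and dual = <grad, T> is the dual norm of grad.
    Assumes grad is a nonzero 2-D array."""
    if norm == "frobenius":
        dual = np.linalg.norm(grad)            # Frobenius norm is self-dual
        return grad / dual, dual
    if norm == "max":
        # Elementwise max norm; its dual is the elementwise l1 norm.
        return np.sign(grad), np.abs(grad).sum()
    if norm == "spectral":
        # Spectral norm; its dual is the nuclear norm (sum of singular values).
        U, s, Vt = np.linalg.svd(grad, full_matrices=False)
        return U @ Vt, s.sum()
    raise ValueError(f"unknown norm: {norm}")

def steepest_step(grad, norm, sharpness=1.0):
    """Minimizer of <grad, dW> + (sharpness / 2) * ||dW||^2 under the chosen norm."""
    T, dual = steepest_direction(grad, norm)
    return -(dual / sharpness) * T

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 3))               # stand-in for a layer's gradient
steps = {name: steepest_step(G, name) for name in ("frobenius", "max", "spectral")}
```

The three updates share one formula and differ only in the norm passed in, which is the sense in which choosing an optimizer becomes choosing a norm.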

The presentation can be found here.