In this meeting, together with Ping-yeh Chiang, we discussed whether SGD and SGD-based methods really have an implicit bias that lets them find minima that generalize well, or whether this is rather a property of the loss surface (see paper). We had a lively discussion about the empirical evidence presented in the paper. In particular, it was noted that running similar experiments on models with residual connections could yield further insights. It would also be useful to understand the precise definition of the volume of solutions that results from the applied methods, as well as how different the obtained solutions are.

You can find the presentation from the meeting here.