Thursday, December 05, 2019

Momentum

# Momentum

2019/12/02
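
A minimal NumPy sketch of the classical momentum update from the Sutskever et al. paper cited below: the velocity accumulates an exponentially decaying sum of past gradients, so steps along persistent descent directions grow while oscillatory components cancel. `grad_fn`, `lr`, and `mu` are illustrative placeholders, not values from the paper.

```python
import numpy as np

def sgd_momentum(grad_fn, theta, lr=0.01, mu=0.9, steps=100):
    """Classical momentum: v <- mu*v - lr*grad(theta); theta <- theta + v."""
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = mu * v - lr * grad_fn(theta)  # velocity: decaying sum of gradients
        theta = theta + v                 # step along the velocity
    return theta

# Example: minimize f(theta) = ||theta||^2, whose gradient is 2*theta.
theta = sgd_momentum(lambda th: 2 * th, np.array([3.0, -2.0]))
```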

-----


// Overview of different Optimizers for neural networks

-----


// An Overview on Optimization Algorithms in Deep Learning 1 - Taihong Xiao

-----


# AdamW
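
A rough sketch of a single AdamW step, following the decoupled form in Loshchilov and Hutter's paper cited below: the weight-decay term is subtracted from the weights directly instead of being folded into the gradient as L2 regularization, which is the "decoupling" in the title. The hyperparameter defaults here are common choices, not the paper's tuned values.

```python
import numpy as np

def adamw_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=1e-2):
    """One AdamW update: Adam moment estimates, with weight decay applied
    directly to theta (decoupled from the adaptive gradient step)."""
    m = beta1 * m + (1 - beta1) * grad       # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2  # second-moment estimate
    m_hat = m / (1 - beta1 ** t)             # bias correction (t >= 1)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * theta)
    return theta, m, v

# Example usage on a quadratic loss f(theta) = ||theta||^2:
theta = np.array([3.0, -2.0])
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 201):  # t starts at 1 so bias correction is well-defined
    theta, m, v = adamw_step(theta, 2 * theta, m, v, t)
```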

-----


# Optimization 
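
As a baseline for the Ruder survey cited below, plain gradient descent, which momentum, Adam, and AdamW all modify; `lr` and the fixed step count are illustrative.

```python
import numpy as np

def gradient_descent(grad_fn, theta, lr=0.1, steps=100):
    """Vanilla gradient descent: step against the gradient at each iterate."""
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta
```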

-----

References

# Momentum
Sutskever, Ilya, et al. "On the importance of initialization and momentum in deep learning." International Conference on Machine Learning (2013).
http://proceedings.mlr.press/v28/sutskever13.pdf

# AdamW
Loshchilov, Ilya, and Frank Hutter. "Decoupled weight decay regularization." arXiv preprint arXiv:1711.05101 (2019).
https://arxiv.org/pdf/1711.05101.pdf

# Optimization 
Ruder, Sebastian. "An overview of gradient descent optimization algorithms." arXiv preprint arXiv:1609.04747 (2016).
https://arxiv.org/pdf/1609.04747.pdf

-----

Overview of different Optimizers for neural networks
https://medium.com/datadriveninvestor/overview-of-different-optimizers-for-neural-networks-e0ed119440c3

An Overview on Optimization Algorithms in Deep Learning 1 - Taihong Xiao
https://prinsphield.github.io/posts/2016/02/overview_opt_alg_deep_learning1/

Why Momentum Really Works
https://distill.pub/2017/momentum/

-----

From SGD to Adam: An Overview of Deep Learning Optimization Algorithms (Part 1) - Zhihu
https://zhuanlan.zhihu.com/p/32626442
