Momentum
2019/12/02
-----
// Overview of different Optimizers for neural networks
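For the post's titular optimizer, a minimal NumPy sketch of classical momentum as in the Sutskever et al. reference at the end; the function name, toy objective, and hyperparameter values are illustrative assumptions, not taken from the linked overview:

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, mu=0.9):
    """One step of SGD with classical momentum (illustrative defaults)."""
    v = mu * v - lr * grad  # velocity: decaying accumulation of past gradients
    return w + v, v         # move the parameters along the velocity

# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient is simply w.
w = np.array([5.0, -3.0])
v = np.zeros_like(w)
for _ in range(100):
    w, v = sgd_momentum_step(w, v, grad=w)
print(w)  # approaches the minimum at the origin
```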
-----
// An Overview on Optimization Algorithms in Deep Learning 1 - Taihong Xiao
-----
# AdamW
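A minimal sketch of the decoupled update from the Loshchilov & Hutter reference below: the weight-decay term is applied to the weights directly instead of being added to the gradient, which is what distinguishes AdamW from Adam with L2 regularization. Function name and default values are illustrative.

```python
import numpy as np

def adamw_step(w, m, v, grad, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=1e-2):
    """One AdamW step: Adam moment estimates plus decoupled weight decay."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias corrections; t starts at 1
    v_hat = v / (1 - beta2 ** t)
    # Decay acts on the weights directly, NOT via the gradient; folding wd * w
    # into grad would instead rescale it by Adam's adaptive denominator.
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v
```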
-----
# Optimization
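As one representative of the adaptive methods surveyed in Ruder's overview (cited below), a sketch of RMSprop, which rescales each step by a running RMS of past gradients; names and defaults here are again illustrative:

```python
import numpy as np

def rmsprop_step(w, s, grad, lr=1e-3, rho=0.9, eps=1e-8):
    """One RMSprop step: divide the gradient by a running RMS of past gradients."""
    s = rho * s + (1 - rho) * grad ** 2            # running mean of squared grads
    return w - lr * grad / (np.sqrt(s) + eps), s   # per-coordinate adaptive step
```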
-----
References
# Momentum
Sutskever, Ilya, et al. "On the importance of initialization and momentum in deep learning." International conference on machine learning. 2013.
http://proceedings.mlr.press/v28/sutskever13.pdf
# AdamW
Loshchilov, Ilya, and Frank Hutter. "Decoupled weight decay regularization." arXiv preprint arXiv:1711.05101 (2019).
https://arxiv.org/pdf/1711.05101.pdf
# Optimization
Ruder, Sebastian. "An overview of gradient descent optimization algorithms." arXiv preprint arXiv:1609.04747 (2016).
https://arxiv.org/pdf/1609.04747.pdf
-----
Overview of different Optimizers for neural networks
https://medium.com/datadriveninvestor/overview-of-different-optimizers-for-neural-networks-e0ed119440c3
An Overview on Optimization Algorithms in Deep Learning 1 - Taihong Xiao
https://prinsphield.github.io/posts/2016/02/overview_opt_alg_deep_learning1/
Why Momentum Really Works
https://distill.pub/2017/momentum/
-----
From SGD to Adam: An Overview of Deep Learning Optimization Algorithms (Part 1) - Zhihu
https://zhuanlan.zhihu.com/p/32626442