2019/01/17
-----
References
# Optimization
Ruder, Sebastian. "An overview of gradient descent optimization algorithms." arXiv preprint arXiv:1609.04747 (2016).
Bottou, Léon, Frank E. Curtis, and Jorge Nocedal. "Optimization methods for large-scale machine learning." SIAM Review 60.2 (2018): 223-311.
https://leon.bottou.org/publications/pdf/tr-optml-2016.pdf
-----
An overview of gradient descent optimization algorithms
https://ruder.io/optimizing-gradient-descent/
How do we decide the optimizer used for training - Part 1 (2017) - Deep Learning Course Forums
https://forums.fast.ai/t/how-do-we-decide-the-optimizer-used-for-training/1829/6
-----
A Comparison of SGD Algorithms – Slinuxer
https://blog.slinuxer.com/2016/09/sgd-comparison
The Gradient Descent Family | 机器不太会学习
http://sakigami-yang.me/2017/12/23/GD-Series/
A Brief Introduction to 6 Derived Optimizers and Their Implementations - 科学空间|Scientific Spaces
https://kexue.fm/archives/7094
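-----
As a quick cheat sheet for the update rules these articles compare, here is a minimal NumPy sketch (a toy example of my own, not taken from any of the links above) of vanilla SGD, SGD with momentum, and Adam, minimizing f(x) = ||x||^2; all hyperparameters are illustrative:

# Minimal sketch of three update rules covered in Ruder's overview.
# Toy objective: f(x) = ||x||^2, whose gradient is 2x.
import numpy as np

def grad(x):
    return 2.0 * x  # gradient of f(x) = x . x

def sgd(x, lr=0.1, steps=100):
    # Vanilla SGD: step against the raw gradient.
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def sgd_momentum(x, lr=0.1, gamma=0.9, steps=100):
    # Momentum: velocity accumulates an exponentially decaying
    # sum of past gradients, smoothing the update direction.
    v = np.zeros_like(x)
    for _ in range(steps):
        v = gamma * v + lr * grad(x)
        x = x - v
    return x

def adam(x, lr=0.1, b1=0.9, b2=0.999, eps=1e-8, steps=100):
    # Adam: running estimates of the first and second moments
    # of the gradient, with bias correction for the zero init.
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    for t in range(1, steps + 1):
        g = grad(x)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)
        v_hat = v / (1 - b2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

x0 = np.array([3.0, -4.0])
for name, opt in [("sgd", sgd), ("momentum", sgd_momentum), ("adam", adam)]:
    print(name, opt(x0.copy()))

In short: momentum smooths the raw gradient over time, while Adam additionally rescales each coordinate by a running estimate of its squared gradient, which is why the linked comparisons treat them as distinct families.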