Optimization
2020/10/12
-----
https://pixabay.com/zh/photos/stopwatch-gears-work-working-time-3699314/
-----
https://www.neuraldesigner.com/blog/5_algorithms_to_train_a_neural_network
-----
https://blog.slinuxer.com/2016/09/sgd-comparison
http://www.stat.cmu.edu/~ryantibs/convexopt-F18/lectures/quasi-newton.pdf
-----
https://en.wikipedia.org/wiki/Quasi-Newton_method
-----
References
◎ Overall framework
5 algorithms to train a neural network
https://www.neuraldesigner.com/blog/5_algorithms_to_train_a_neural_network
◎ 1. SGD
Comparison of SGD algorithms – Slinuxer (in Chinese)
https://blog.slinuxer.com/2016/09/sgd-comparison
An overview of gradient descent optimization algorithms
https://ruder.io/optimizing-gradient-descent/
From SGD to Adam: an overview of deep learning optimization algorithms (part 1) - Zhihu (in Chinese)
https://zhuanlan.zhihu.com/p/32626442
◎ 2. Newton's method and the Gauss-Newton method
Gauss-Newton algorithm for solving non linear least squares explained - YouTube
https://www.youtube.com/watch?v=CjrRFbQwKLA
4.3 Newton's Method
https://jermwatt.github.io/machine_learning_refined/notes/4_Second_order_methods/4_4_Newtons.html
Hessian Matrix vs. Gauss-Newton Hessian Matrix | Semantic Scholar
Newton's method and the Gauss-Newton method | Cheng Wei's Blog (in Chinese)
https://scm_mos.gitlab.io/algorithm/newton-and-gauss-newton/
◎ 3. Conjugate gradient method
Deep Learning Book
https://www.deeplearningbook.org/contents/optimization.html
Blog - Conjugate Gradient 1 | Pattarawat Chormai
https://pat.chormai.org/blog/2020-conjugate-gradient-1
linear algebra - Why is the conjugate direction better than the negative of gradient, when minimizing a function - Mathematics Stack Exchange
◎ 4. Quasi-Newton methods
quasi-newton.pdf
http://www.stat.cmu.edu/~ryantibs/convexopt-F18/lectures/quasi-newton.pdf
Quasi-Newton method - Wikipedia
https://en.wikipedia.org/wiki/Quasi-Newton_method
# Very well structured
Gradient descent, Newton's method, and quasi-Newton methods - Zhihu (in Chinese)
https://zhuanlan.zhihu.com/p/37524275
◎ 5. Levenberg-Marquardt method
Optimization for Least Square Problems
https://zlthinker.github.io/optimization-for-least-square-problem
Levenberg-Marquardt method - Wikipedia, the free encyclopedia (in Chinese)
◎ 6. Natural gradient method
◎ 7. K-FAC
◎ 8. Shampoo