
AI 從頭學(二一):A Glance at Deep Reinforcement Learning


2017/04/06

Preface:

There is a lot of material, so I first want to sort it into an outline that can serve as a roadmap.

-----

Summary:

Deep learning (DL) [1]-[40] + reinforcement learning (RL) [41]-[44] = deep reinforcement learning (DRL) [45]-[53].

For the past four months or so, I have used my off-work hours to teach myself a bit of DL [1]-[20]. Strictly speaking, it has not been entirely self-study: the senior members of our study group have been very generous, and I have received guidance from Jason, Winston, and others along the way, so things have gone fairly smoothly. Recently, Jason started an RL reading group [41] in addition to the original DL one [29], and I also came across new DRL material [20], so I am taking this opportunity to redraw my own reading plan; see Fig. 1.1.

At the start, I looked up some review papers [1]. A post on software tools [2] sparked a series of discussions [3]-[8], and I worked my way through backpropagation (BP) [9], automatic differentiation (AD) [10], LeNet [12], [13], CNN [18], and RNN [19]. All the while, I kept asking myself: what exactly is DL [1], [11], [17], [20]?

Having organized the DRL references, I can say I now have at least a rough picture!


 

Fig. 1.1. Deep reinforcement learning.

-----

Introduction

How should AI be defined? Fig. 1.2 tells us that, going from shallow to deep, we have machine learning (ML), representation learning, and DL. Fig. 1.3 divides the related techniques into deep versus shallow and supervised versus unsupervised. Fig. 1.4 points out that, in addition, there is also reinforcement learning. Combining DL with RL, DRL is currently an important direction of development.

So far I can give DL a rough, admittedly imprecise definition to keep reminding myself of: DL is a continuous, differentiable function realized as a multi-layer network whose input and output are both vectors; the first half of each layer is linear, the second half nonlinear. Writing it as f(x, w, b) = y, the goal of training is to find values of w and b that make the error small over all inputs x. The method is BP: with an input x held fixed, w and b are treated as the variables, and they are adjusted in the direction that decreases the cost function, as given by the negative of its partial derivatives, so that the error function slowly converges.

The paragraph above is written for myself: those who don't know this won't understand it, and those who do don't need to read it.
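To make the definition concrete, below is a minimal numerical sketch of a two-layer network y = f(x, W, b) trained by BP and gradient descent on a squared-error cost. The random data, hidden size, learning rate, and choice of tanh (with a linear output layer) are all illustrative assumptions of mine, not something prescribed by the references.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                          # 100 input vectors
Y = rng.normal(size=(100, 2))                          # 100 target vectors

W1, b1 = 0.1 * rng.normal(size=(3, 8)), np.zeros(8)    # layer 1 parameters
W2, b2 = 0.1 * rng.normal(size=(8, 2)), np.zeros(2)    # layer 2 parameters
lr = 0.1

for step in range(500):
    # Forward pass: each layer is a linear map followed by a nonlinearity
    # (the output layer is kept linear here for simplicity).
    h = np.tanh(X @ W1 + b1)
    Y_hat = h @ W2 + b2
    cost = np.mean(np.sum((Y_hat - Y) ** 2, axis=1))

    # Backward pass (BP): partial derivatives of the cost w.r.t. W and b.
    dY = 2 * (Y_hat - Y) / X.shape[0]
    dW2, db2 = h.T @ dY, dY.sum(axis=0)
    dh = (dY @ W2.T) * (1 - h ** 2)                    # tanh'(z) = 1 - tanh(z)^2
    dW1, db1 = X.T @ dh, dh.sum(axis=0)

    # Adjust W and b along the negative gradient; the cost slowly decreases.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

    if step % 100 == 0:
        print(step, round(cost, 4))
```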

What follows is divided into six parts: A, B, and C cover DL from basics to more advanced topics, D covers RL, and E and F cover DRL; see Fig. 1.1. The numbers are the course order I designed for myself, and this article is organized mainly to explain that order.


Fig. 1.2. AI, p. 9 [29].



Fig. 1.3. Deep learning taxonomy, p. 492 [28].



Fig. 1.4. Machine learning, p. 515 [28].
 
-----

A, 01-07

Part A is the foundational part of DL. Here I particularly want to recommend [28], which has an electronic edition you can download. It is not especially well written and is too brief in places, but the advantage is that DL is just one chapter and RL another, so you do not have to start out by chewing through [29]. Even now that I have some understanding of CNN [18] and RNN [19], I still find the corresponding chapters of [29] hard to read.

On the paper side, [23] is worth a first look, since it is mainly introductory. [21] contains brief introductions to several basic architectures and also rewards repeated reading. After warming up this way, you can then concentrate on LeNet [12].

-----

B, 08-10

An RNN is a unit that can remember. Combine it with an external memory and a controller, and you get architectures such as the Neural Turing Machine (NTM), the Differentiable Neural Computer (DNC), and the Memory-Augmented Neural Network (MANN) [30]-[37].

-----

C, 11-15

This part mainly covers generative models.

Fig. 2.1a indicates that Autoencoders (AE) are followed by Representation Learning; AE also has to be read before Deep Generative Models (DGM).

The Boltzmann machine occupies the first half of Chapter 20, while VAE and GAN are both tucked away in Section 20.10; see Fig. 2.2a.



Fig. 2.1a. Deep Learning, p. 12 [29].



Fig. 2.1b. Autoencoder, p. 501 [28].



Fig. 2.1c. Representation learning, [38].



Fig. 2.2a. Deep generative models, [29].



Fig. 2.2b. Restricted Boltzmann machine, p. 498 [28].

-----

D, 16-23

As for Reinforcement Learning, [41] is not an easy book to chew through, so I first picked out the key points from [28]. To begin with, there are four relatively important techniques: Dynamic Programming (DP), Monte Carlo Methods (MCM), Temporal-Difference (TD) Learning, and the Policy Gradient Method (PGM).

MCM is important, and [29] devotes a chapter to it as well. TD is probably the most important, see Fig. 3.2a, so I also gathered material on Q-learning [43], [44].

From Fig. 3.1a we can see that the Markov Decision Process (MDP) and Function Approximation (FA) are the foundations. FA takes up quite a few chapters of [41], but it is indispensable if you want to understand AlphaGo [50], so there is no escaping the whole of [41].

An MDP, simply put, is the (s, a, r, s') loop of state, action, reward, and next state; see Fig. 3.1b.
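Since Part D revolves around TD learning and the (s, a, r, s') tuple, a minimal tabular Q-learning sketch may help fix the idea. The toy five-state chain environment, the epsilon-greedy exploration, and the hyperparameters are all illustrative assumptions of mine, not taken from [41], [43], or [44].

```python
import numpy as np

# Toy chain MDP: states 0..4, actions 0 = left and 1 = right; reaching the
# rightmost state yields reward 1 and ends the episode.
n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(s, a):
    """One (s, a) -> (r, s') transition of the toy chain."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return r, s_next

for episode in range(200):
    s = 0
    while s != n_states - 1:
        # Epsilon-greedy action selection, breaking ties at random.
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(rng.choice(np.flatnonzero(Q[s] == Q[s].max())))
        r, s_next = step(s, a)
        # TD (Q-learning) update driven by the (s, a, r, s') tuple.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1))  # greedy policy: states 0-3 should prefer action 1 (right)
```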




Fig. 3.1a. RL basis, p. 513 [28].



Fig. 3.1b. Basic RL model, p. 524 [28].



Fig. 3.2a. RL methods, p. 513 [28].



Fig. 3.2b. Q-learning, p. 537 [28].



Fig. 3.2c. Actor-Critic, p. 538 [28].

-----

E, 24-27

[45] is a very detailed introduction to DRL, covering papers published as recently as this year (2017). Atari video games are the focus of the DQN, A3C, and UNREAL papers [46]-[49].

-----
 
F, 28-34

Finally there is AlphaGo [50]. Several Chinese-language articles introduce the techniques it uses [11]: besides CNNs, it relies on PGM [51] and MCTS [52], [53]. Since this PGM builds on function approximation, you have to finish [41] before you can truly understand AlphaGo.
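To give a feel for what a policy gradient method with function approximation looks like, here is a minimal one-step REINFORCE sketch with a linear softmax policy, loosely in the spirit of [51]. The toy states, one-hot features, reward function, and step size are all illustrative assumptions of mine; this is a teaching sketch, not AlphaGo's actual training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 4, 2
theta = np.zeros((n_features, n_actions))    # policy parameters (function approximation)
alpha = 0.05                                 # step size

def features(s):
    """Hypothetical one-hot state features used for function approximation."""
    x = np.zeros(n_features)
    x[s % n_features] = 1.0
    return x

def reward(s, a):
    """Toy reward: action 1 pays off in even states, action 0 in odd states."""
    best = 1 if s % 2 == 0 else 0
    return 1.0 if a == best else 0.0

for episode in range(3000):
    s = int(rng.integers(8))                 # random toy state
    x = features(s)
    logits = x @ theta
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    a = int(rng.choice(n_actions, p=probs))  # sample an action from pi(a | s)
    G = reward(s, a)                         # one-step return

    # REINFORCE update: theta += alpha * G * grad log pi(a | s; theta).
    grad_log_pi = -np.outer(x, probs)
    grad_log_pi[:, a] += x
    theta += alpha * G * grad_log_pi

# Rows with even index should end up preferring action 1, odd rows action 0.
print((np.exp(theta) / np.exp(theta).sum(axis=1, keepdims=True)).round(2))
```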

-----


Conclusion:

This was a quick pass over the whole of DRL, more a map than a travel guide. There is still a long road ahead!

-----


Fig. 4.1a. Policy gradient method [50].



Fig. 4.1b. Policy and value networks [50].



Fig. 4.1c. Monte Carlo tree search (MCTS) [50].



Fig. 4.2. Actor-critic artificial neural network and a hypothetical neural implementation, p. 340 [41].

-----

References

[1] AI從頭學(一):文獻回顧

[2] AI從頭學(二):Popular Deep Learning Software Tools

[3] AI從頭學(三):Popular Deep Learning Hardware Tools

[4] AI從頭學(四):AD and LeNet

[5] AI從頭學(五):AD and Python

[6] AI從頭學(六):The Net

[7] AI從頭學(七):AD and Python from Jason

[8] AI從頭學(八):The Net from Mark

[9] AI從頭學(九):Back Propagation

[10] AI從頭學(一0):Automatic Differentiation

[11] AI從頭學(一一):A Glance at Deep Learning

[12] AI從頭學(一二):LeNet

[13] AI從頭學(一三):LeNet - F6

[14] AI從頭學(一四):Recommender

[15] AI從頭學(一五):Deep Learning,How?

[16] AI從頭學(一六):Deep Learning,What?

[17] AI從頭學(一七):Shallow Learning 

[18] AI從頭學(一八):Convolutional Neural Network

[19] AI從頭學(一九):Recurrent Neural Network

[20] AI從頭學(二0):Deep Learning,Hot 

[21] Wang, Hao, and Dit-Yan Yeung. "Towards Bayesian Deep Learning: A Survey." arXiv preprint arXiv:1604.01662 (2016).

[22] Lacey, Griffin, Graham W. Taylor, and Shawki Areibi. "Deep learning on fpgas: Past, present, and future." arXiv preprint arXiv:1602.04283 (2016).

[23] LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444.

[24] Schmidhuber, Jürgen. "Deep learning in neural networks: An overview." Neural networks 61 (2015): 85-117.

[25] Deng, Li, and Dong Yu. "Deep learning: methods and applications." Foundations and Trends® in Signal Processing 7.3–4 (2014): 197-387.

[26] Bengio, Yoshua, Aaron C. Courville, and Pascal Vincent. "Unsupervised feature learning and deep learning: A review and new perspectives." CoRR, abs/1206.5538 1 (2012).

[27] Bengio, Yoshua. "Learning deep architectures for AI." Foundations and trends® in Machine Learning 2.1 (2009): 1-127.

[28] Gollapudi, Sunila. Practical machine learning. Packt Publishing, 2016.

[29] Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016.
http://www.deeplearningbook.org/

[30] 

[31] 程式編寫程式:泛用人工智慧領域的一顆明珠 - 歌穀穀
http://www.gegugu.com/2017/03/16/9050.html

[32] 深度學習挑戰馮·諾依曼結構_幫趣網
bangqu.com/gpu/blog/5239

[33] 深度學習的新方向 One-shot learning - George's Research Website
http://tzuching1.weebly.com/blog/-one-shot-learning

[34] Olah, Chris, and Shan Carter. "Attention and Augmented Recurrent Neural Networks." Distill 1.9 (2016): e1.
http://distill.pub/2016/augmented-rnns/

[35] Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).

[36] Graves, Alex, et al. "Hybrid computing using a neural network with dynamic external memory." Nature 538.7626 (2016): 471-476.

[37] Santoro, Adam, et al. "One-shot learning with memory-augmented neural networks." arXiv preprint arXiv:1605.06065 (2016).

[38] Bengio, Yoshua, Aaron Courville, and Pascal Vincent. "Representation learning: A review and new perspectives." IEEE transactions on pattern analysis and machine intelligence 35.8 (2013): 1798-1828.

[39] Doersch, Carl. "Tutorial on variational autoencoders." arXiv preprint arXiv:1606.05908 (2016).

[40] Goodfellow, Ian, et al. "Generative adversarial nets." Advances in neural information processing systems. 2014.

[41] Sutton, Richard S., and Andrew G. Barto. Reinforcement learning: An introduction. 2nd ed. draft, 2016.
http://incompleteideas.net/sutton/book/bookdraft2016sep.pdf

[42] Heidrich-Meisner, Verena, et al. "Reinforcement learning in a nutshell." ESANN. 2007.

[43] Watkins, Christopher JCH, and Peter Dayan. "Q-learning." Machine learning 8.3-4 (1992): 279-292.

[44] Poole, David, and Alan Mackworth. Artificial intelligence: Foundations of computational agents, Section 11.3.3: Q-learning.
http://artint.info/html/ArtInt_265.html

[45] Li, Yuxi. "Deep reinforcement learning: An overview." arXiv preprint arXiv:1701.07274 (2017).
 
[46] 深度增強學習前沿算法思想 - 歌穀穀
http://www.gegugu.com/2017/02/17/1360.html

[47] Mnih, Volodymyr, et al. "Human-level control through deep reinforcement learning." Nature 518.7540 (2015): 529-533.
 
[48] Mnih, Volodymyr, et al. "Asynchronous methods for deep reinforcement learning." International Conference on Machine Learning. 2016.

[49] Jaderberg, Max, et al. "Reinforcement learning with unsupervised auxiliary tasks." arXiv preprint arXiv:1611.05397 (2016).

[50] Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.

[51] Sutton, Richard S., et al. "Policy gradient methods for reinforcement learning with function approximation." NIPS. Vol. 99. 1999.

[52] Browne, Cameron B., et al. "A survey of monte carlo tree search methods." IEEE Transactions on Computational Intelligence and AI in games 4.1 (2012): 1-43.

[53] Huang, Shih-Chieh, and Martin Müller. "Investigating the limits of Monte-Carlo tree search methods in computer Go." International Conference on Computers and Games. Springer International Publishing, 2013.