Sunday, February 28, 2021

DRL

2021/02/28

-----


https://pixabay.com/zh/photos/spring-background-flower-yellow-316535/

-----

References

# CSI

Tsai, Yu-Han, Yu-Jie Jheng, and Rua-Huan Tsaih. "The Cramming, Softening and Integrating Learning Algorithm with Parametric ReLu Activation Function for Binary Input/Output Problems." 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019.

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8852023&casa_token=JqH4St9aBBEAAAAA:32RPdY7OqKbV7j9fttvgqo-H59Wh2U6EXq50TAmo2is5H-N_5hsOuEayk3O-PSUkbhqGJmsl


# The Cramming, Softening and Integrating Learning Algorithm for Real Input/Binary Output Problems - NCCU Academic Hub (政大學術集成)

https://ah.nccu.edu.tw/item?item_id=145746&locale=en-US


# Rua-Huan Tsaih (蔡瑞煌) | National Chengchi University, College of Commerce, Department of Management Information Systems

https://mis2.nccu.edu.tw/zh_tw/Faculty/Faculty_01/%E8%94%A1-%E7%91%9E%E7%85%8C-71626392


# [5-Minute Paper] DQN - Zhihu

https://zhuanlan.zhihu.com/p/113120993


# Introduction to OpenAI Gym and an implementation of the Q-learning algorithm

https://blog.techbridge.cc/2017/11/04/openai-gym-intro-and-q-learning/


# Follow-up papers to DQN

Justesen, Niels, et al. "Deep learning for video game playing." IEEE Transactions on Games 12.1 (2019): 1-20.

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8632747&casa_token=ZSYX1uPfY58AAAAA:sun0R0vWWblM8Hj5C5MHPgJTjeJKYRfreWOI50-MDRw375WWbgUaitm3exi60-BHEQAIz3PT&tag=1

-----

Saturday, February 27, 2021

Lin Wen-yueh (林文月)

2021/02/27

-----


https://pixabay.com/zh/photos/tree-cat-silhouette-moon-full-moon-736877/

-----

12:00

-----

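It has been a week since I moved back to my family's old home in Kaohsiung. From now on, this is home.


I cleared out a room where I can work at the computer. As I kept preparing the content of my thesis, my mood eased a little. Lin Wen-yueh's book Nigu (擬古) used to sit on my younger sister's bookshelf. It is still on that bookshelf, but the bookshelf is now mine.


I am trying to start reading again, and Nigu is my first choice. Why, I do not know. Perhaps its historical and experimental character drew me in.


After reading the preface, I felt like looking up the author. She was born on 1933/09/05, so the first three pillars of her Bazi (eight characters) are Guiyou, Gengshen, Jiaxu (癸酉 庚申 甲戌): a Seven Killings structure in which the Direct Seal transforms the Killings, a fairly grand configuration. I guess the hour pillar to be Jiazi (甲子), inferring from her career as a National Taiwan University professor, which suggests a well-placed Direct Seal. Her mother was the eldest daughter of Lien Heng (連橫). I first cast the astrological chart for noon in Taipei, and cannot read much from it yet.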

2021/02/27

-----


04:00

-----

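Judging from Yinshan Zhaji (《飲膳札記》), I infer the hour pillar to be Bingyin (丙寅), with a strong and revealed Eating God. Her admission to the fine arts department of National Taiwan Normal University could also point to Dingmao (丁卯), with the Hurting Officer revealed. That Hurting Officer is not strong, however, while the Eating God is, so the Eating God reading remains more likely. Bingyin clashes with Gengshen, and Dingmao clashes with Guiyou. The astrological chart has the Moon opposite Jupiter.

Yinshan Zhaji is available on Haodoo (好讀).

Eight Days in Hong Kong (香港八日草).

The Pillow Book (枕草子).

I once bought The Pillow Book, probably the Zhou Zuoren translation published by Ecus (木馬文化). It did not leave a deep impression. Lin Wen-yueh's translation, by contrast, is very good.

Hong Kong is an interesting place. For several years before moving back to Kaohsiung, I traveled there in my imagination, mainly along four trails: the MacLehose Trail, the Wilson Trail, the Lantau Trail, and the Hong Kong Trail. When I ran laps on the track at National Tsing Hua University, I imagined running along these trails. Earlier still, there were the Hong Kong TV dramas. Jin Yong and Ni Kuang go without saying.

Japanese literature is also quite interesting. My Japanese, though, is only good enough to follow a little anime, and nothing more.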

2021/02/27

-----



Thursday, February 25, 2021

NAS-RL

2019/12/20

-----

References

# NAS-RL

Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
https://arxiv.org/pdf/1611.01578.pdf 

Paper notes: NAS - Zhihu
https://zhuanlan.zhihu.com/p/47221948

Reinforcement Learning 4: Policy Gradient

2020/04/18

-----


// Evolution of reinforcement learning [7].

-----

References

[1] Deep Reinforcement Learning CS294 Lecture 5: Policy Gradients Introduction - 无所知's blog, CSDN
https://blog.csdn.net/qq_25037903/article/details/84573048

[2] CS 285
http://rail.eecs.berkeley.edu/deeprlcourse/

[3] CS294-112 Fa18 9/5/18 - YouTube
https://m.youtube.com/watch?v=XGmd3wcyDg8&list=PLkFD6_40KJIxJMR-j5A1mkxK26gh_qg37&index=21

[4] Reinforcement Learning Series (13): Policy Gradient Methods - LagrangeSK's blog, CSDN
https://blog.csdn.net/lagrangesk/article/details/82865578

[5] Teaching - David Silver
https://www.davidsilver.uk/teaching/

[6] RL Course by David Silver - Lecture 7: Policy Gradient Methods - YouTube
https://m.youtube.com/watch?v=KHZVXao4qXs

[7] Evolution of Reinforcement Learning - Zhihu
https://zhuanlan.zhihu.com/p/49429128

-----



Reinforcement Learning 3: Lilian Weng

2020/04/17


-----


// [1].

-----


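Figure 1. Fundamentals of reinforcement learning [1].

-----


Figure 2. DP, MC, TD, and PG [1].

-----


Figure 3. SARS [1].

-----


Figure 4. The three elements of reinforcement learning [1].

-----


Figure 5. Reward Function and Policy [1].

-----


Figure 6. Value Function [1].

-----


Figure 7. State Value and Action Value [1].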
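
For reference, the state-value and action-value functions named in the caption above are usually defined as below. The notation (return G_t, discount factor gamma, policy pi) follows the common textbook convention and is an assumption here, since the figure itself is not reproduced.

```latex
\begin{align}
G_t &= \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \\
V_\pi(s) &= \mathbb{E}_\pi\left[\, G_t \mid S_t = s \,\right] \\
Q_\pi(s, a) &= \mathbb{E}_\pi\left[\, G_t \mid S_t = s,\ A_t = a \,\right]
\end{align}
```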

-----


Figure 8. Optimal Value and Policy [1].

-----


Figure 9. Markov Decision Process [1].

-----


Figure 10. Markov Decision Process, illustrated [1].

-----


Figure 11. Bellman Equations [1].

-----


Figure 12. Bellman Expectation Equations, illustrated [1].

-----


Figure 13. Bellman Expectation Equations [1].
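
As a reminder of what the Bellman expectation equations state, here is one common way to write them. The symbols are assumptions chosen for this sketch: pi(a|s) is the policy, R(s, a) the expected immediate reward, P(s'|s, a) the transition probability, and gamma the discount factor. The figure in [1] may use a slightly different notation.

```latex
\begin{align}
V_\pi(s) &= \sum_{a} \pi(a \mid s)\, Q_\pi(s, a) \\
Q_\pi(s, a) &= R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_\pi(s') \\
V_\pi(s) &= \sum_{a} \pi(a \mid s) \Big( R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_\pi(s') \Big)
\end{align}
```

The third line is obtained by substituting the second into the first.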

-----


Figure 14. Bellman Optimality Equations [1].
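
The Bellman optimality equations replace the average over the policy with a maximum over actions. A sketch in assumed notation (R(s, a) is the expected immediate reward, P(s'|s, a) the transition probability, gamma the discount factor, and a star marks the optimal value functions):

```latex
\begin{align}
V_*(s) &= \max_{a} Q_*(s, a) \\
Q_*(s, a) &= R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_*(s') \\
V_*(s) &= \max_{a} \Big( R(s, a) + \gamma \sum_{s'} P(s' \mid s, a)\, V_*(s') \Big)
\end{align}
```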

-----


Figure 15. Policy Evaluation and Policy Improvement [1].

-----


Figure 16. Policy Iteration [1].
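
To make the evaluation and improvement loop concrete, here is a minimal sketch of policy iteration in plain Python. The two-state MDP (`P`, `R`, `GAMMA`) is hypothetical and made up for illustration; it is not taken from [1].

```python
# Hypothetical 2-state, 2-action MDP, used only for illustration.
# P[s][a] is a list of (probability, next_state) pairs; R[s][a] is the reward.
P = {
    0: {0: [(1.0, 0)], 1: [(1.0, 1)]},
    1: {0: [(1.0, 0)], 1: [(1.0, 1)]},
}
R = {0: {0: 0.0, 1: 1.0}, 1: {0: 0.0, 1: 2.0}}
GAMMA = 0.9


def q_value(V, s, a):
    """One-step lookahead: reward plus discounted expected next-state value."""
    return R[s][a] + GAMMA * sum(p * V[s2] for p, s2 in P[s][a])


def policy_evaluation(policy, tol=1e-10):
    """Repeat the Bellman expectation backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            v_new = q_value(V, s, policy[s])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def policy_improvement(V):
    """Act greedily with respect to the current value estimate."""
    return {s: max(P[s], key=lambda a: q_value(V, s, a)) for s in P}


def policy_iteration():
    policy = {s: 0 for s in P}      # arbitrary initial deterministic policy
    while True:
        V = policy_evaluation(policy)
        new_policy = policy_improvement(V)
        if new_policy == policy:    # policy is stable, so stop
            return policy, V
        policy = new_policy


if __name__ == "__main__":
    policy, V = policy_iteration()
    print(policy, V)
```

In this toy problem the loop settles on taking action 1 in both states. Iterative evaluation is used instead of a linear solve only to keep the sketch free of dependencies.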

-----

References

[1] A (Long) Peek into Reinforcement Learning
https://lilianweng.github.io/lil-log/2018/02/19/a-long-peek-into-reinforcement-learning.html

[2] Reinforcement Learning 2: Survey Paper
http://hemingwang.blogspot.com/2020/04/reinforcement-learningsurvey-paper.html

[3] Reinforcement Learning 1: An Introduction
http://hemingwang.blogspot.com/2020/04/reinforcement-learningan-introduction.html


Reinforcement Learning 2: Survey Paper

2020/04/17

-----



// Two dimensions of RL [3].

-----


// Reinforcement Learning [1].

-----


// SARS [1].

-----


// Two dimensions of RL [3].

-----


// Epsilon-greedy [1].
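
A minimal sketch of epsilon-greedy action selection, for readers who want the rule in code. The function name and the tie-breaking choice are assumptions of this sketch, not details taken from [1].

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, otherwise a greedy one.

    q_values is a list of action-value estimates for the current state.
    rng is any object with random(), randrange() and choice(), such as the
    random module itself or a seeded random.Random instance.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))      # explore
    best = max(q_values)
    # Exploit; break ties uniformly among all maximizing actions.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


if __name__ == "__main__":
    rng = random.Random(0)
    print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.1, rng=rng))
```

With `epsilon=0.0` the function is purely greedy, and with `epsilon=1.0` it is uniformly random.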

-----


// Monte Carlo [1].
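
As a rough illustration of the Monte Carlo idea (estimate a state's value by averaging sampled returns), here is a first-visit sketch. The episode format, a pair of state and reward lists of equal length, is an assumption of this sketch.

```python
from collections import defaultdict


def discounted_returns(rewards, gamma):
    """Compute G_t = r_t + gamma * G_{t+1} backwards over one episode."""
    G, out = 0.0, []
    for r in reversed(rewards):
        G = r + gamma * G
        out.append(G)
    return out[::-1]


def first_visit_mc(episodes, gamma):
    """Average the return that follows the first visit to each state.

    episodes is a list of (states, rewards) pairs, where rewards[t] is the
    reward received after leaving states[t].
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for states, rewards in episodes:
        returns = discounted_returns(rewards, gamma)
        seen = set()
        for s, G in zip(states, returns):
            if s not in seen:       # only the first visit counts
                seen.add(s)
                totals[s] += G
                counts[s] += 1
    return {s: totals[s] / counts[s] for s in totals}


if __name__ == "__main__":
    episodes = [(["A", "B", "A"], [1.0, 1.0, 1.0])]
    print(first_visit_mc(episodes, gamma=0.5))
```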

-----


// Sarsa and Q-learning [1].
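
The two algorithms differ only in the target they bootstrap from. A small sketch of that difference follows; the function names and argument layout are hypothetical, chosen for this illustration.

```python
def sarsa_target(reward, gamma, q_next, next_action):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward + gamma * q_next[next_action]


def q_learning_target(reward, gamma, q_next):
    """Off-policy target: bootstrap from the greedy next action."""
    return reward + gamma * max(q_next)


def td_update(q_sa, target, alpha):
    """Move the current estimate a step of size alpha toward the target."""
    return q_sa + alpha * (target - q_sa)


if __name__ == "__main__":
    q_next = [1.0, 3.0]     # action values in the next state
    print(sarsa_target(1.0, 0.9, q_next, next_action=0))
    print(q_learning_target(1.0, 0.9, q_next))
```

When the behavior policy happens to pick the greedy action, the two targets coincide. They differ only on exploratory steps.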

-----


// Actor Critic [3].

-----


// REINFORCEwb [2].
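
Assuming "wb" here abbreviates "with baseline", the following is a minimal sketch of REINFORCE with a baseline. To stay short it uses a hypothetical one-step (bandit) problem with a softmax policy, so each episode is a single action. The arm means, learning rates, and noise level are all made up for illustration and do not come from [2].

```python
import math
import random


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_with_baseline(arm_means, steps=2000, lr=0.1, baseline_lr=0.1, seed=0):
    """Run REINFORCE with a running-average baseline on a toy bandit."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)   # policy parameters (action preferences)
    baseline = 0.0                   # running estimate of the expected return
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        G = arm_means[action] + rng.gauss(0.0, 0.5)   # noisy sampled return
        advantage = G - baseline
        baseline += baseline_lr * advantage
        # For a softmax policy, d log pi(action) / d theta[a]
        # equals 1[a == action] - probs[a].
        for a in range(len(theta)):
            grad_log = (1.0 if a == action else 0.0) - probs[a]
            theta[a] += lr * advantage * grad_log
    return softmax(theta)


if __name__ == "__main__":
    print(reinforce_with_baseline([0.0, 2.0]))
```

Subtracting the baseline leaves the expected gradient unchanged and only reduces its variance, which is the reason for using it.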

-----


// Actor Critic [2].

-----


// Actor Critic [4].

-----


// Sarsa [4].

-----


// Actor Critic [4].

-----

References

[1] System
Nguyen, Ngoc Duy, Thanh Nguyen, and Saeid Nahavandi. "System design perspective for human-level agents using deep reinforcement learning: A survey." IEEE Access 5 (2017): 27091-27102.
https://ieeexplore.ieee.org/ielx7/6287639/7859429/08119919.pdf?tp=&arnumber=8119919&isnumber=7859429&ref=aHR0cHM6Ly9zY2hvbGFyLmdvb2dsZS5jb20udHcvc2Nob2xhcj9obD16aC1UVyZhc19zZHQ9MCUyQzUmcT1zeXN0ZW0rZGVzaWduK2RlZXArcmVpbmZvcmNlbWVudCZvcT1zeXN0ZW0rZGVzaWduK2RlZXArcmU=

[2] Overview
Li, Yuxi. "Deep reinforcement learning: An overview." arXiv preprint arXiv:1701.07274 (2017).
https://arxiv.org/pdf/1701.07274.pdf

[3] Survey
Arulkumaran, Kai, et al. "Deep reinforcement learning: A brief survey." IEEE Signal Processing Magazine 34.6 (2017): 26-38.
https://www.gwern.net/docs/rl/2017-arulkumaran.pdf

[4] Algorithms
Szepesvári, Csaba. "Algorithms for reinforcement learning." Synthesis Lectures on Artificial Intelligence and Machine Learning 4.1 (2010): 1-103.
https://sites.ualberta.ca/~szepesva/papers/RLAlgsInMDPs.pdf

Reinforcement Learning 1: An Introduction

2020/04/17

-----


Demystifying Deep Reinforcement Learning | Computational Neuroscience Lab
https://neuro.cs.ut.ee/demystifying-deep-reinforcement-learning/

-----


Deep Reinforcement Learning

TD (Q-learning): DQN, DDQN, DNA, NAF, C51, QR-DQN, HER, DQfD, Rainbow.
AC (Actor-Critic): A3C (A2C), (DRQN) UNREAL, DPG, DDPG, TD3, SAC, ACKTR.
PG (REINFORCE): TRPO, PPO, PDO, CPO, IPO.

-----


// AlphaGo [1].

-----


// DQN [1].

-----


// NAS-RL [2].

-----


// SARS [1].

-----



Q-Learning : A Maneuver of Mazes - Becoming Human: Artificial Intelligence Magazine
https://becominghuman.ai/q-learning-a-maneuver-of-mazes-885137e957e4

-----


My Journey to Reinforcement Learning — Part 1: Q-Learning with Table
https://towardsdatascience.com/my-journey-to-reinforcement-learning-part-1-q-learning-with-table-35540020bcf9

-----



Introduction to Reinforcement Learning — Deep Reinforcement Learning for Hackers (Part 0)
https://medium.com/@curiousily/getting-your-feet-rewarded-deep-reinforcement-learning-for-hackers-part-0-900ca5bb83e5

-----


Q-learning - Wikipedia
https://en.m.wikipedia.org/wiki/Q-learning

-----


Introduction to Deep Q-Learning for Reinforcement Learning (in Python)
https://www.analyticsvidhya.com/blog/2019/04/introduction-deep-q-learning-python/

-----


// Sarsa [3].

-----


// Q-learning [3].

-----


// Sarsa and Q-learning [4].

-----


Deep-Learning-Papers-Reading-Roadmap/README.md at master · floodsung/Deep-Learning-Papers-Reading-Roadmap · GitHub
https://github.com/floodsung/Deep-Learning-Papers-Reading-Roadmap/blob/master/README.md

-----


// Evolution of reinforcement learning [5].

-----


// Some reinforcement learning algorithms [6], [7].

-----


[1708.07902] Deep Learning for Video Game Playing
https://arxiv.org/abs/1708.07902

-----


[1910.09615] IPO: Interior-point Policy Optimization under Constraints
https://arxiv.org/abs/1910.09615

-----



Implementing Reinforcement Learning with Python | Using TensorFlow and OpenAI Gym
http://books.gotop.com.tw/v_ACD017800

-----


GitHub - Curt-Park/rainbow-is-all-you-need: Rainbow is all you need! A step-by-step tutorial from DQN to Rainbow
https://github.com/Curt-Park/rainbow-is-all-you-need

-----


GitHub - MrSyee/pg-is-all-you-need: Policy Gradient is all you need! A step-by-step tutorial for well-known PG methods.
https://github.com/MrSyee/pg-is-all-you-need

-----

References

[1] Reinforcement Learning
https://www.slideshare.net/mobile/yenlung/reinforcement-learning-90737484

[2] [Paper Reading] Neural Architecture Search with Reinforcement Learning – AMMAI
https://sss050531.wordpress.com/2018/06/09/論文閱讀neural-architecture-search-with-reinforcement-learning/

[3] Reinforcement Learning (7): Q-Learning and Sarsa - Zhihu
https://zhuanlan.zhihu.com/p/46850008

[4] artificial intelligence - What is the difference between Q-learning and SARSA? - Stack Overflow
https://stackoverflow.com/questions/6848828/what-is-the-difference-between-q-learning-and-sarsa

[5] Evolution of Reinforcement Learning - Zhihu
https://zhuanlan.zhihu.com/p/49429128

[6] Reinforcement Learning algorithms — an intuitive overview
https://medium.com/@SmartLabAI/reinforcement-learning-algorithms-an-intuitive-overview-904e2dff5bbc

[7] Part 2: Kinds of RL Algorithms — Spinning Up documentation
https://spinningup.openai.com/en/latest/spinningup/rl_intro2.html