The Star Also Rises: August 2021

Monday, August 30, 2021

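Kaohsiung Mini-Trip (1): Route 88 to the Gaoping River

2021/08/30

Yesterday was Sunday. On a whim that morning I decided to ride my bicycle to Chaozhou, but I turned back when I reached the Gaoping River.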

-----

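The distant cause is that, back when I lived in Hsinchu, I spent a stretch of time riding my road bike in the hills of Hsinchu County. The immediate cause is that a while ago we were at 冠源 (Guanyuan) 「celebrating」 the bowling alley's impending closure, and all that day a classmate was busy in Chaozhou with his orchard, which he plans as a place to relax after he retires from practicing Chinese medicine. Chaozhou is not far from Fengshan: follow Provincial Highway 88 and you are there in no time. What I mainly wanted to confirm was whether City Route 188, which runs beneath the east-west expressway, also goes all the way to Chaozhou.

Autumn is a good season for outdoor exercise: it is cooler than summer and rain is less likely, typhoons aside (as it happened, it rained on Monday). I had also just replaced the bike's inner tubes. So on Sunday morning, on the spur of the moment, I set off. Spur of the moment or not, I still took my phone and wallet and put on my cycling jersey and helmet; the only thing I left behind was a water bottle.

The road is fairly flat, with essentially no ups and downs. Farther south, though, lie the low Fengshan Hills, so the terrain there is somewhat higher. Broadly speaking, once you leave Fengshan you are in the countryside of Daliao Township, now Daliao District. Nearing the Gaoping River, I even saw the long-famous 「大發工業區」 (Dafa Industrial Park) on my right.

At one point along the way, the scooters and other slow vehicles all started riding up onto the east-west expressway. I was puzzled: there shouldn't be any railway tracks to cross here. I stayed off the expressway and kept to the surface road, and a short distance later it dawned on me: I had reached the 「高屏溪」 (Gaoping River). It was 10:40, which was just about right: I could start back and should be home before noon.

The return was the same road in reverse. I took a photo of a road sign as a souvenir: Provincial Highway 29, so perhaps next time I can take a different way back. I got home at 11:40, so the ride took two hours in all; I didn't ride fast, but it still counted as exercise. In principle I could have started doing this as soon as I moved back to Kaohsiung, but until now I preferred just running. The track at nearby Fengxin Senior High School is no match for the one at National Tsing Hua University, but it is decent; unfortunately it was later closed because of the pandemic. I tried running on city asphalt for a while and it never felt right, and running in a mask is awkward too. In the end I compromised: mask on, I head out on a longer bike ride instead.

I wasn't very tired, but my appetite was noticeably better. I ate a big lunch and followed it with a seasonal pomelo. The Mid-Autumn Festival will be here soon!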

-----

Thursday, August 19, 2021

Imitation Learning

2021/08/17

-----

https://pixabay.com/zh/photos/woman-shopping-lifestyle-beautiful-3040029/

-----

Forecasting models for coronavirus disease (COVID-19): a survey of the state-of-the-art

2021/08/31

GR Shinde, AB Kalamkar, PN Mahalle, N Dey… - SN Computer …, 2020 - Springer

「COVID-19 is a pandemic that has affected over 170 countries around the world. The number of infected and deceased patients has been increasing at an alarming rate in almost all the affected nations. Forecasting techniques can be inculcated thereby assisting in designing …」

Cited by 111; 9 versions.

Follow-up reading from the paper 「Composite Monte Carlo decision making under high uncertainty of novel coronavirus epidemic using hybridized deep learning and fuzzy rule induction」.

-----

COVID-19 epidemic analysis using machine learning and deep learning algorithms

2021/08/30

NS Punn, SK Sonbhadra, S Agarwal - MedRxiv, 2020 - medrxiv.org

「The catastrophic outbreak of Severe Acute Respiratory Syndrome-Coronavirus (SARS-CoV-2) also known as COVID-2019 has brought the worldwide threat to the living society. The whole world is putting incredible efforts to fight against the spread of this deadly disease in …」

Cited by 91; 5 versions.

-----

Composite Monte Carlo decision making under high uncertainty of novel coronavirus epidemic using hybridized deep learning and fuzzy rule induction

2021/08/29

SJ Fong, G Li, N Dey, RG Crespo… - Applied soft computing, 2020 - Elsevier

「In the advent of the novel coronavirus epidemic since December 2019, governments and authorities have been struggling to make critical decisions under high uncertainty at their best efforts. In computer science, this represents a typical problem of machine learning over …」

Cited by 113; 16 versions.

-----

Structured Learning

2021/08/28

「For SVM (Support Vector Machine), Deep Learning, and Neural Network models, the input and output are both vectors. In practice, however, inputs and outputs can take forms more complex than a vector: a sequence, a list, a tree, a bounding box, and so on. Structured learning aims to find a function whose input and output are each an object.」

https://ai4dt.wordpress.com/2018/05/23/structure-learning/

-----

Imitation Learning - YouTube

2021/08/27

https://www.youtube.com/watch?v=rOho-2oJFeA

-----

RL — Imitation Learning

2021/08/26

「One of the biggest challenges is collecting expert demonstrations. Unless it has a huge business potential, the attached cost can be prohibitive. But technically, there is another major issue. We can never duplicate things exactly. Error accumulates fast in a trajectory and put us into situations that we never deal with before.」

https://jonathan-hui.medium.com/rl-imitation-learning-ac28116c02fc

-----

The Top 40 Imitation Learning Open Source Projects

2021/08/25

https://awesomeopensource.com/projects/imitation-learning

-----

KAIST-AILab/deeprl_practice_colab: Preparation

2021/08/24

Preparation for Deep Reinforcement Learning using Google Colab - GitHub ... Generative Adversarial Imitation Learning (GAIL) [Ho et al. NIPS 2016].

https://github.com/KAIST-AILab/deeprl_practice_colab

-----

ICML2018 Imitation Learning Tutorial - Google Sites

2021/08/23

「In this tutorial, we aim to present to researchers and industry practitioners a broad overview of imitation learning techniques and recent applications.」

April 22, 2018 · Uploaded by Hoang Le

https://sites.google.com/view/icml2018-imitation-learning/

-----

One-shot imitation from observing humans via domain-adaptive meta-learning

2021/08/22

T Yu, C Finn, A Xie, S Dasari, T Zhang… - arXiv preprint arXiv …, 2018 - arxiv.org

「Humans and animals are capable of learning a new behavior by observing others perform the skill just once. We consider the problem of allowing a robot to do the same--learning from a raw video pixels of a human, even when there is substantial domain shift in the …」

Cited by 200; 7 versions.

-----

One-shot imitation learning

2021/08/21

Y Duan, M Andrychowicz, BC Stadie, J Ho… - arXiv preprint arXiv …, 2017 - arxiv.org

「Imitation learning has been commonly applied to solve different tasks in isolation. This usually requires either careful feature engineering, or a significant number of samples. This is far from what we desire: ideally, robots should be able to learn from very few …」

Cited by 463; 11 versions.

-----

Imitation learning: A survey of learning methods

2021/08/20

A Hussein, MM Gaber, E Elyan, C Jayne - ACM Computing Surveys …, 2017 - dl.acm.org

「Imitation learning techniques aim to mimic human behavior in a given task. An agent (a learning machine) is trained to perform a task from demonstrations by learning a mapping between observations and actions. The idea of teaching by imitation has been around for …」

Cited by 335; 6 versions.

-----

Generative adversarial imitation learning

2021/08/19

J Ho, S Ermon - Advances in neural information processing systems, 2016 - papers.nips.cc

「Consider learning a policy from example expert behavior, without interaction with the expert or access to a reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with …」

Cited by 1393; 13 versions.

-----

A brief overview of Imitation Learning

2021/08/18

「The simplest form of imitation learning is behaviour cloning (BC), which focuses on learning the expert’s policy using supervised learning. An important example of behaviour cloning is ALVINN, a vehicle equipped with sensors, which learned to map the sensor inputs into steering angles and drive autonomously. This project was carried out in 1989 by Dean Pomerleau, and it was also the first application of imitation learning in general.」

「The way behavioural cloning works is quite simple. Given the expert’s demonstrations, we divide these into state-action pairs, we treat these pairs as i.i.d. examples and finally, we apply supervised learning. The loss function can depend on the application. 」

https://smartlabai.medium.com/a-brief-overview-of-imitation-learning-8a8a75c44a9c
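
The behavioural-cloning recipe quoted above (split the demonstrations into state-action pairs, treat them as i.i.d. examples, apply supervised learning) can be sketched in a few lines. This is only a minimal illustration, not the article's code: the linear expert, the two-sensor state, and the squared-error loss are all assumptions made for the example.

```python
import numpy as np

def collect_demonstrations(expert_policy, states):
    """Turn expert behaviour into i.i.d. (state, action) pairs."""
    actions = np.array([expert_policy(s) for s in states])
    return states, actions

def behaviour_cloning(states, actions):
    """Fit a linear policy a = s @ w by least squares (squared-error loss)."""
    w, *_ = np.linalg.lstsq(states, actions, rcond=None)
    return w

# Hypothetical expert: the steering angle is a fixed linear
# function of two sensor readings.
def expert_policy(state):
    return 2.0 * state[0] - 1.0 * state[1]

rng = np.random.default_rng(0)
demo_states = rng.normal(size=(200, 2))
states, actions = collect_demonstrations(expert_policy, demo_states)
w = behaviour_cloning(states, actions)

def cloned_policy(state):
    """The learned policy: plain supervised regression, no reward signal."""
    return state @ w
```

Because the demonstrations here are noise-free and the expert really is linear, the fit recovers the expert exactly. With a real expert, small errors would compound along a trajectory, which is the weakness the RL — Imitation Learning entry above points out.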

-----

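An Introduction to Imitation Learning

2021/08/17

「In traditional reinforcement learning tasks, the optimal policy is usually learned by computing the cumulative reward. In multi-step (sequential) decision problems, however, the learner does not receive rewards frequently, and learning from cumulative reward in this way faces an enormous search space. Imitation learning already handles multi-step decision problems well and has many applications in robotics, NLP, and other fields.」

「Imitation learning means learning from examples provided by a demonstrator, typically the decision data of a human expert. The states are then treated as features and the actions as labels, and a classifier (for discrete actions) or a regressor (for continuous actions) is trained to obtain the optimal policy model. The training objective is to make the state-action trajectory distribution generated by the model match the distribution of the input trajectories, similar to autoencoders and GANs.」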

https://zhuanlan.zhihu.com/p/25688750

-----

Saturday, August 14, 2021

Word2vec (3): Illustrated

2021/07/19

-----

https://pixabay.com/zh/photos/numbers-cipher-calculation-list-16804/

-----

Word2vec

1.1 Skip-Gram Model

1.2 Continuous Bag-of-Word Model

2.1 Hierarchical Softmax

2.2 Negative Sampling

-----

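Source:

https://arxiv.org/abs/1708.02709

Notes:

Once an English word is converted from a one-hot encoding into a vector, vector arithmetic becomes possible. The classic example is King - Man + Woman = Queen. From this 「equation」 we can see that one dimension of the vector encodes gender and another encodes social rank.

Compared with simple one-hot encoding, a word embedding carries much richer meaning once words are represented as vectors.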
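
The arithmetic can be tried on toy vectors. The sketch below is purely illustrative: the two-dimensional embeddings are hand-made so that the first coordinate stands for gender and the second for royalty, whereas real Word2vec vectors have hundreds of learned dimensions and the answer is only the nearest neighbour, not an exact match.

```python
import numpy as np

# Hand-made toy embeddings: [gender, royalty]. Not learned from data.
vectors = {
    "king":  np.array([ 1.0,  1.0]),
    "queen": np.array([-1.0,  1.0]),
    "man":   np.array([ 1.0,  0.0]),
    "woman": np.array([-1.0,  0.0]),
    "apple": np.array([ 0.0, -1.0]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    query = vectors[a] - vectors[b] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(query, candidates[w]))

result = analogy("king", "man", "woman", vectors)
```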

-----

Figure 2: Two-dimensional PCA projection of the 1000-dimensional Skip-gram vectors of countries and their capital cities. The figure illustrates ability of the model to automatically organize concepts and learn implicitly the relationships between them, as during the training we did not provide any supervised information about what a capital city means.

# Word2vec 2.

-----

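# Word2vec 1.

Notes:

w: window. Here the window size is 5. CBOW uses the surrounding words to predict which word should appear in the middle; Skip-gram uses the middle word to predict which words should appear around it.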
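
The difference between the two models is easiest to see in the training examples each one generates. The sketch below is only an illustration: the function names and the sample sentence are made up for this note, and it assumes that a 5-word window means c = 2 words on each side of the center word.

```python
def context_window(tokens, t, c):
    """Words within c positions of index t, excluding the center word itself."""
    left = tokens[max(0, t - c):t]
    right = tokens[t + 1:t + 1 + c]
    return left + right

def cbow_examples(tokens, c):
    """CBOW: (context words) -> center word."""
    return [(context_window(tokens, t, c), tokens[t]) for t in range(len(tokens))]

def skipgram_examples(tokens, c):
    """Skip-gram: center word -> one context word per training pair."""
    return [(tokens[t], w)
            for t in range(len(tokens))
            for w in context_window(tokens, t, c)]

tokens = "the quick brown fox jumps over the lazy dog".split()
cbow = cbow_examples(tokens, c=2)
skipgram = skipgram_examples(tokens, c=2)
```

With `fox` as the center word, CBOW yields a single example whose input is the four neighbours, while Skip-gram yields four separate (center, context) pairs.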

-----

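Skip-gram: the center word predicts the surrounding words

# Word2vec 3.

Notes:

Input layer: taking skip-gram as the example, a V-dimensional one-hot encoding. The weights from the single nonzero input neuron to the hidden layer are exactly that word's word vector.

Hidden layer: the N-dimensional layer between the input and the output.

Output layer: the layer that produces the predicted word probabilities.

V-dim: the dimension of the input layer.

N-dim: the dimension of the hidden layer.

CxV-dim: the dimension of the output layer.

W VxN: a V×N matrix. It maps the V-dimensional input-layer data to the N-dimensional hidden layer.

W' NxV: an N×V matrix. The output is a probability for every word in the vocabulary: the hidden layer is first mapped to V values, and those V values are then passed through a softmax. We want the words in the context to receive as high a probability as possible.

xk: the k-th unit of the input layer; k is the index.

hj: the j-th unit of the hidden layer; j is the index.

y Cj: C stands for context. The target is the word in the middle of the window, and the context is every other word in the window. CBOW predicts the target from the context, while skip-gram predicts the context from the target.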
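
The shapes listed above can be checked with a small forward pass. This is a sketch under assumed toy sizes (V = 5, N = 3) and random weights; the variable names mirror the notes (W, W'), and only one output distribution is computed, since in skip-gram the C context positions share the same W'.

```python
import numpy as np

V, N = 5, 3                          # toy vocabulary size and hidden size (assumed)
rng = np.random.default_rng(0)
W = rng.normal(size=(V, N))          # input -> hidden; row k is the vector of word k
W_prime = rng.normal(size=(N, V))    # hidden -> output

def one_hot(k, size):
    x = np.zeros(size)
    x[k] = 1.0
    return x

def softmax(u):
    e = np.exp(u - u.max())          # subtract the max for numerical stability
    return e / e.sum()

def forward(k):
    """Forward pass for input word index k."""
    x = one_hot(k, V)                # V-dim input
    h = x @ W                        # N-dim hidden layer; identical to W[k]
    u = h @ W_prime                  # V scores, one per vocabulary word
    return h, softmax(u)

h, y = forward(2)
```

Multiplying a one-hot vector by W does nothing more than select a row, which is why the rows of W can be read off directly as word vectors.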

-----

「The training objective of the Skip-gram model is to find word representations that are useful for predicting the surrounding words in a sentence or a document. More formally, given a sequence of training words w1,w2,w3, . . . ,wT , the objective of the Skip-gram model is to maximize the average log probability where c is the size of the training context (which can be a function of the center word wt).」

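# Word2vec 2.

Notes:

T: the length of the sentence or document.

t: the index of a word in the sentence.

j: the index of a word within the window.

c: the size of the training context; c words before and c words after.

p: probability.

wt: the center word.

w(t+j): a surrounding word.

As the paper explains, the purpose of the formula is to let the center word predict its surrounding words: 「The training objective of the Skip-gram model is to find word representations that are useful for predicting the surrounding words in a sentence or a document.」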
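
For reference, the average log probability that the quoted passage refers to (equation 1 of the Word2vec 2 paper) can be written with the symbols defined above:

```latex
\frac{1}{T} \sum_{t=1}^{T} \; \sum_{-c \le j \le c,\; j \ne 0} \log p(w_{t+j} \mid w_t)
```

The outer sum runs over every position t in the text, and the inner sum runs over the 2c surrounding positions, skipping j = 0, which is the center word itself.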

https://www.quora.com/Why-is-word2vec-a-log-linear-model

-----

「The basic Skip-gram formulation defines p(wt+j |wt) using the softmax function:

where vw and v′ w are the “input” and “output” vector representations of w, and W is the number of words in the vocabulary. This formulation is impractical because the cost of computing ∇log p(wO|wI ) is proportional to W, which is often large (10^5–10^7 terms).」

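# Word2vec 2.

Notes:

A phrase from the paper: 「where vw and v′ w are the “input” and “output” vector representations of w」. The input vector representation is the word2vec word vector in the usual sense. The less commonly seen one is the output vector representation: these output vectors are the vectors in the hidden-to-output matrix, and there are as many of them as there are words in the vocabulary.

p: probability.

wO: the word that should be output.

wI: the input word.

W: the size of the vocabulary.

w: an index into the vocabulary.

v'w: the output vector representation (of each word).

v'wO: the output vector representation (of the target word).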
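
For reference, the softmax definition that the quoted passage introduces (equation 2 of the Word2vec 2 paper) reads, in the notation listed above:

```latex
p(w_O \mid w_I) = \frac{\exp\left( {v'_{w_O}}^{\top} v_{w_I} \right)}{\sum_{w=1}^{W} \exp\left( {v'_{w}}^{\top} v_{w_I} \right)}
```

The denominator sums over all W words of the vocabulary, which is why each gradient step costs time proportional to W and why Hierarchical Softmax and Negative Sampling were introduced.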

-----

Notes:

The Skip-gram model predicts the surrounding words from the center word.

https://zhuanlan.zhihu.com/p/27234078

-----

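CBOW: the surrounding words predict the center word

# Word2vec 3.

Notes:

The one-hot vector of every context word is first multiplied by the shared V×N matrix; the resulting vectors are then summed and averaged to form the hidden-layer vector.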

https://blog.csdn.net/WitsMakeMen/article/details/89511764
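
The averaging step can be written out directly. The sizes and the weight values below are arbitrary placeholders chosen so the result is easy to check by hand; only the operation itself (shared matrix, then mean) comes from the note above.

```python
import numpy as np

V, N = 6, 3
# Placeholder shared weight matrix; row i is [3i, 3i+1, 3i+2].
W = np.arange(V * N, dtype=float).reshape(V, N)

def cbow_hidden(context_ids, W):
    """Average the projected one-hot vectors of all context words."""
    one_hots = np.eye(W.shape[0])[context_ids]   # shape (C, V)
    projected = one_hots @ W                      # shape (C, N); same as W[context_ids]
    return projected.mean(axis=0)                 # shape (N,)

h = cbow_hidden([0, 2, 3, 5], W)
```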

-----

Notes:

Huffman coding tries to represent the most frequent words with the fewest bits.

https://www.gatevidyalay.com/huffman-coding-huffman-encoding/

-----

Hierarchical Softmax

# Word2vec 3。

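Notes:

An example binary tree for the hierarchical softmax model. The white units are words in the vocabulary, and the dark units are inner units. The thick line shows an example path from the root to w2. In the example shown, the length of the path is L(w2) = 4. n(w, j) denotes the j-th unit on the path from the root to the word w.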

-----

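Images:

https://zhuanlan.zhihu.com/p/66417229

https://ruder.io/word-embeddings-softmax/

Notes:

Suppose the vocabulary has V words. Originally we want the target word to have probability 1 and every other word probability 0, which is expensive to compute. With Hierarchical Softmax, the tree has V - 1 non-leaf nodes, and we only need the probabilities at the non-leaf nodes along the path to the target word, which greatly reduces the computation. When the path goes right at a node, we maximize the sigmoid value there; when it goes left, we use 1 minus the sigmoid value instead. See the figure above.

p(right | n, c) = sigmoid(h^T v'_n).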

https://www.cnblogs.com/pinard/p/7243513.html

https://zhuanlan.zhihu.com/p/56139075
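
The path-product idea can be sketched on a tiny tree. Everything concrete here is assumed for illustration: a balanced tree over four words, three inner nodes with random vectors, and a random hidden vector h. The only rule taken from the notes is p(right | n, c) = sigmoid(h^T v'_n), with 1 minus that value for a left turn.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical balanced tree over 4 words: node 0 is the root,
# node 1 is its left child, node 2 is its right child.
paths = {
    "w1": [(0, "left"), (1, "left")],
    "w2": [(0, "left"), (1, "right")],
    "w3": [(0, "right"), (2, "left")],
    "w4": [(0, "right"), (2, "right")],
}

def word_probability(word, h, node_vectors):
    """Multiply the left/right probabilities along the path from the root."""
    p = 1.0
    for node, direction in paths[word]:
        p_right = sigmoid(h @ node_vectors[node])
        p *= p_right if direction == "right" else 1.0 - p_right
    return p

rng = np.random.default_rng(0)
h = rng.normal(size=3)                  # hidden-layer vector
node_vectors = rng.normal(size=(3, 3))  # one output vector per inner node (V - 1 = 3)
probs = {w: word_probability(w, h, node_vectors) for w in paths}
```

Each word touches only the inner nodes on its own path, yet the leaf probabilities still sum to 1, so no normalization over the whole vocabulary is needed.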

-----

Negative Sampling

-----

https://tengyuanchang.medium.com/%E8%AE%93%E9%9B%BB%E8%85%A6%E8%81%BD%E6%87%82%E4%BA%BA%E8%A9%B1-%E7%90%86%E8%A7%A3-nlp-%E9%87%8D%E8%A6%81%E6%8A%80%E8%A1%93-word2vec-%E7%9A%84-skip-gram-%E6%A8%A1%E5%9E%8B-73d0239ad698

Notes:

Negative Sampling

Positive sample: (fox, quick). 1 pair.

Negative samples: (fox, word_not_quick). 9999 pairs.

Small datasets: choose 5 to 20 negative samples.

Large datasets: choose 2 to 5 negative samples.
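
A rough sketch of what this means for one training pair follows. Instead of scoring all 9999 negatives, only k of them are drawn and the loss is computed on k + 1 words. The loss below follows the negative-sampling objective of the Word2vec 2 paper; the vocabulary size, vector size, the index used for `quick`, and the uniform sampler are assumptions for illustration (the paper draws negatives from a smoothed unigram distribution rather than uniformly).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(v_center, out_vectors, positive_id, negative_ids):
    """-log sigma(v'_pos . v) - sum over negatives of log sigma(-v'_neg . v)."""
    pos_score = out_vectors[positive_id] @ v_center
    neg_scores = out_vectors[negative_ids] @ v_center
    return -np.log(sigmoid(pos_score)) - np.sum(np.log(sigmoid(-neg_scores)))

def sample_negatives(rng, vocab_size, positive_id, k):
    """Draw k distinct word ids other than the positive one (uniform, for simplicity)."""
    candidates = np.array([i for i in range(vocab_size) if i != positive_id])
    return rng.choice(candidates, size=k, replace=False)

rng = np.random.default_rng(0)
V, N, k = 10_000, 8, 5                      # assumed sizes; k = 5 negatives
out_vectors = rng.normal(scale=0.1, size=(V, N))
v_fox = rng.normal(scale=0.1, size=N)       # input vector of the center word "fox"
quick_id = 42                               # hypothetical index of "quick"
negatives = sample_negatives(rng, V, quick_id, k)
loss = negative_sampling_loss(v_fox, out_vectors, quick_id, negatives)
```

Only the k + 1 output vectors involved in the loss receive a gradient, which is where the saving over the full softmax comes from.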

-----

References

# NNLM. Cited by 7185.

Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of machine learning research 3.Feb (2003): 1137-1155.

http://www-labs.iro.umontreal.ca/~felipe/IFT6010-Automne2011/resources/tp3/bengio03a.pdf

# Word2vec 1. Cited by 18991.

Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).

https://arxiv.org/pdf/1301.3781.pdf

# Word2vec 2. Cited by 23990.

Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.

https://papers.nips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf

# Word2vec 3. Cited by 645.

Rong, Xin. "word2vec parameter learning explained." arXiv preprint arXiv:1411.2738 (2014).

https://arxiv.org/pdf/1411.2738.pdf

# C&W v1. Cited by 5099.

Collobert, Ronan, and Jason Weston. "A unified architecture for natural language processing: Deep neural networks with multitask learning." Proceedings of the 25th international conference on Machine learning. 2008.

http://www.cs.columbia.edu/~smaskey/CS6998-Fall2012/supportmaterial/colbert_dbn_nlp.pdf

# C&W v2. Cited by 6841. This paper explains why it was necessary to move on from Word2vec and develop Paragraph2vec.

Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of machine learning research 12 (2011): 2493-2537.

https://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf

Word2Vec Tutorial - The Skip-Gram Model · Chris McCormick

http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/

Understanding Word2Vec's Skip-Gram Model - Zhihu

https://zhuanlan.zhihu.com/p/27234078

-----

[4] GloVe

Pennington, Jeffrey, Richard Socher, and Christopher Manning. "Glove: Global vectors for word representation." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.

https://www.aclweb.org/anthology/D14-1162

[5] fastText v1

Joulin, Armand, et al. "Bag of tricks for efficient text classification." arXiv preprint arXiv:1607.01759 (2016).

https://arxiv.org/pdf/1607.01759.pdf

[6] fastText v2

Bojanowski, Piotr, et al. "Enriching word vectors with subword information." Transactions of the Association for Computational Linguistics 5 (2017): 135-146.

https://www.mitpressjournals.org/doi/pdfplus/10.1162/tacl_a_00051

[7] WordRank

Ji, Shihao, et al. "Wordrank: Learning word embeddings via robust ranking." arXiv preprint arXiv:1506.02761 (2015).

https://arxiv.org/pdf/1506.02761.pdf

-----

Friday, August 13, 2021

Word2vec (2): Overview

Word2vec Overview

2020/12/27

-----

https://pixabay.com/zh/photos/coffee-grinds-filter-2616923/

-----

◎ Abstract

-----

◎ Introduction

-----

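Which problems (weaknesses) in earlier work does this paper set out to solve?

-----

# NNLM.

NNLM produces good word representations, but its training complexity is rather high.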

https://www.jiqizhixin.com/graph/technologies/c61ba3b9-40e2-4864-a941-9adc19e6792e

-----

◎ Method

-----

The solution?

-----

# Word2vec 1.

-----

The specifics?

http://hemingwang.blogspot.com/2021/07/word2vecillustrated.html

-----

◎ Result

-----

Results of this paper.

-----

◎ Discussion

-----

Comparison of this paper with other papers (results or methods).

-----

Comparison of results.

-----

Comparison of methods.

-----

◎ Conclusion

-----

◎ Future Work

-----

Subsequent research in related areas.

-----

Subsequent research in extended areas.

-----

◎ References

-----

# NNLM. Cited by 7185.

Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of machine learning research 3.Feb (2003): 1137-1155.

http://www-labs.iro.umontreal.ca/~felipe/IFT6010-Automne2011/resources/tp3/bengio03a.pdf

# Word2vec 1. Cited by 18991.

Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).

https://arxiv.org/pdf/1301.3781.pdf

# Word2vec 2. Cited by 23990.

Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.

https://papers.nips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf

# Word2vec 3. Cited by 645.

Rong, Xin. "word2vec parameter learning explained." arXiv preprint arXiv:1411.2738 (2014).

https://arxiv.org/pdf/1411.2738.pdf

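# C&W v1. Cited by 5099.

Collobert, Ronan, and Jason Weston. "A unified architecture for natural language processing: Deep neural networks with multitask learning." Proceedings of the 25th international conference on Machine learning. 2008.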

http://www.cs.columbia.edu/~smaskey/CS6998-Fall2012/supportmaterial/colbert_dbn_nlp.pdf

# C&W v2. Cited by 6841. This paper explains why it was necessary to move on from Word2vec and develop Paragraph2vec.

Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of machine learning research 12 (2011): 2493-2537.

https://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf

-----

The Star Also Rises: Word2vec

https://hemingwang.blogspot.com/2019/04/word2vec.html

-----

06C_Word2vec

06C_Word2vec

2020/07/18

-----

1. Word2vec Family

Fig. Word2vec (image source).

-----

2. Outline

https://hemingwang.blogspot.com/2020/07/06cword2vec.html

06C_Word2vec

◎ Word2vec v1: CBOW and Skip-gram.
◎ Word2vec v2: Hierarchical Softmax and Negative Sampling.
◎ Word2vec v3: Simplified Word2vec v1 and v2.

◎ LSA: Co-occurrence Matrix + SVD.
◎ GloVe: Word2vec + LSA.
◎ fastText v1: CBOW and w(t) to Label.
◎ fastText v2: Skip-gram and Word to Subword.
◎ WordRank: Word Embedding to Word Ranking.

-----

3. Word2vec

https://medium.com/@tengyuanchang/%E8%AE%93%E9%9B%BB%E8%85%A6%E8%81%BD%E6%87%82%E4%BA%BA%E8%A9%B1-%E7%90%86%E8%A7%A3-nlp-%E9%87%8D%E8%A6%81%E6%8A%80%E8%A1%93-word2vec-%E7%9A%84-skip-gram-%E6%A8%A1%E5%9E%8B-73d0239ad698

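Notes:

Word2vec is the representative word-embedding algorithm. It comprises two models: CBOW (Continuous Bag-of-Words) and Skip-gram. CBOW uses the surrounding words to predict the middle word, much like a cloze test in English class. Skip-gram uses the middle word to predict the surrounding words. Both methods yield word vectors.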

-----

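4. King - Man + Woman = Queen

https://arxiv.org/abs/1708.02709

Notes:

Once an English word is converted from a one-hot encoding into a vector, vector arithmetic becomes possible. The classic example is King - Man + Woman = Queen. From this 「equation」 we can see that one dimension of the vector encodes gender and another encodes social rank.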

-----

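5. Regression Model

https://www.deeplearningbook.org/

// Page 119.

Notes:

Before getting into Word2vec, let us first review regression analysis. Why? Because Word2vec can look very complicated the first time you meet it, so we begin with a simple example that everyone already knows well. The LeNet model in the next figure is really a very complicated regression model, and Word2vec, in turn, can be seen as a greatly simplified neural network of the same kind, without the convolution layers. The regression model is the one people know best; LeNet is perhaps the second.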

-----

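6. CNN Model

http://hemingwang.blogspot.com/2018/02/deep-learninglenet-bp.html

Notes:

LeNet is perhaps the most familiar CNN model in deep learning. LeNet is far more complicated than Word2vec, but since most people have already mastered LeNet before studying Word2vec, we use LeNet as a stepping stone. Put simply, Word2vec has only three layers: an input layer, a hidden layer, and an output layer. The connections from the input layer to the hidden layer, before any activation function is applied, can be viewed as a matrix transformation. Combined with the one-hot encoded input, each row of that matrix becomes the word vector of one word.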

-----

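7. Back Propagation

http://hemingwang.blogspot.com/2018/02/deep-learninglenet-bp.html

Notes:

Likewise, once the model, the dataset of input-output pairs, and the loss function have been fixed, Word2vec also uses Back Propagation to learn the word vectors, which are simply the weights of the neural network.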

-----

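8. CBOW

https://lilianweng.github.io/lil-log/2017/10/15/learning-word-embedding.html

Notes:

The orange part, the connections between the input layer and the hidden layer, is really a matrix, and its values are the word vectors we want to learn. The green part is the matrix transformation from the hidden layer to the output layer, which predicts the probability of the surrounding words (or of the next word). In Word2vec this part is not used as word vectors, but in the QKV (Query, Key, Value) decomposition of ConvS2S or the Transformer, the green part plays the role of the Query. Put simply, the Key is the one-hot encoding, i.e. the input layer; the Value is the meaning of the word, i.e. the word vector, the orange part; and the Query is the probability distribution over the next word, the green part.

This relationship between Word2vec and QKV occurred to me suddenly while I was writing this passage. I think the interpretation is close to correct, but I have not yet verified it.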

-----

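9. Skip-gram

https://lilianweng.github.io/lil-log/2017/10/15/learning-word-embedding.html

Notes:

This is the Skip-gram model. It is hard to grasp at first sight because it is drawn as matrices rather than as a neural network. Skip-gram was briefly introduced above. The light-blue part is the key: the one-hot encoding at the input layer becomes the word vector at the hidden layer, which is then mapped to probabilities at the output layer. The x at the input layer and the y at the output layer together represent one record of the training dataset, a single word pair.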

-----

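10. QKV

https://medium.com/@joealato/attention-in-nlp-734c6fa9d983

Notes:

From Attention to Key-Value to QKV. Taking Word2vec as the example, the Query corresponds to the matrix between the hidden layer and the output layer, the Key corresponds to the one-hot encoding at the input layer, and the Value corresponds to the matrix between the input layer and the hidden layer.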

-----

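11. Skip-gram Model

https://zhuanlan.zhihu.com/p/27234078

Notes:

This figure is probably the classic Skip-gram illustration on the web. Although the connections from the input layer to the hidden layer are simplified, the hidden layer to the output layer is labelled very clearly, especially the output.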

-----

12. Skip Gram Data

https://zhuanlan.zhihu.com/p/27234078

Notes:

Take the fourth row on the left of the figure as an example. Assuming the sliding window is 5 words long, the four words around fox are quick, brown, jumps, and over.

-----

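13. Skip-gram Result

https://zhuanlan.zhihu.com/p/27234078

Notes:

Training yields a table of word vectors. Because the input is one-hot encoded, multiplying by it simply picks out the corresponding word vector.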

-----

14. Weight Matrix of Word2vec

https://mc.ai/deep-nlp-word-vectors-with-word2vec/

Notes:

The figure is clearer than any description in words.

-----

15. Softmax

https://pojenlai.wordpress.com/2016/02/27/tensorflow%E8%AA%B2%E7%A8%8B%E7%AD%86%E8%A8%98-softmax%E5%AF%A6%E4%BD%9C/

Notes:

Before moving on to hierarchical softmax, let us first look at softmax.

Again, the formula is clearer than a description in words.
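
For readers without the figure, the standard softmax formula is written out here. The symbols are notation chosen for this note: z is the vector of K scores and the output is a probability for each of the K classes.

```latex
\mathrm{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K
```

Every output is positive and the K outputs sum to 1. In Word2vec, K is the vocabulary size, so the denominator is the expensive part.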

-----

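16. Huffman Coding

https://hemingwang.blogspot.com/2020/08/huffman-coding.html

Notes:

Huffman coding tries to represent the most frequent words with the fewest bits. For the procedure, see the link above.

Algorithm:

1. Sort the words by frequency, from lowest to highest.
2. Merge the two lowest-frequency entries into a tree: add the two frequencies to obtain a new frequency, then return to step 1. When only two frequencies remain, merging them produces the final Huffman tree.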
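
The two steps above can be sketched with a priority queue, which keeps the entries sorted so that 「return to step 1」 becomes a pop from the heap. The word frequencies are invented for the example, and the sketch assumes the words are plain strings, because tuples are used to mark internal nodes.

```python
import heapq
from itertools import count

def huffman_codes(frequencies):
    """Build a Huffman tree from {word: frequency} and return {word: bit string}."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(freq, next(tie), word) for word, freq in frequencies.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # lowest frequency
        f2, _, right = heapq.heappop(heap)   # second lowest
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))
    _, _, tree = heap[0]

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                # leaf: a word
            codes[node] = prefix or "0"
    walk(tree, "")
    return codes

codes = huffman_codes({"the": 50, "fox": 20, "quick": 15, "brown": 10, "lazy": 5})
```

The most frequent word ends up with the shortest code, and no code is a prefix of another, which is what lets Hierarchical Softmax reach frequent words in very few steps.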

-----

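17. Hierarchical Softmax 1

https://zhuanlan.zhihu.com/p/66417229

Notes:

「The original Word2Vec uses softmax to obtain the final probability distribution over the vocabulary. A vocabulary often contains millions of words, so computing a softmax probability for every word in the output is very expensive. One remedy is Hierarchical Softmax. Instead of computing every word's probability directly as the original softmax does, Hierarchical Softmax uses a binary tree to obtain each word's probability. The type of binary tree shown to work best is the Huffman tree.」

「A Huffman tree has V-1 internal nodes and V leaf nodes, and the leaves correspond one-to-one to the V words of the vocabulary. A Huffman tree is first built from the word frequencies: the more frequent a word, the shorter its Huffman code and the closer it sits to the root. The original Word2Vec architecture changes accordingly: the hidden layer connects directly to every non-leaf node of the Huffman tree, as shown in the figure below (equivalent to an output layer with only V-1 neurons). A binary probability is then computed at every non-leaf node (that is, a Sigmoid activation). This is the probability of a random walk from the current node, and one may freely designate it as the probability of walking left or of walking right. The path from the root to the target word is unique, and multiplying the walk probabilities of the non-leaf nodes along it gives the final probability of the target word.」

This way, the probability of the target word is obtained by computing only as many output nodes as the depth of the tree. The depth of a Huffman tree is roughly log V, so the computational complexity drops to O(log V). In addition, high-frequency words lie very close to the root and therefore need even fewer computations, which is another advantage of using a Huffman tree.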

-----

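18. Hierarchical Softmax 2

https://ruder.io/word-embeddings-softmax/

Notes:

In the original softmax, the output layer gives the probabilities of V words. In Hierarchical Softmax, the output layer is instead the V - 1 internal nodes of the Huffman tree.

Taking the CBOW example above, the input is 「the dog and the」 and the prediction is 「cat」. Instead of updating all 10,000 original output units, we update only three nodes, 1, 2, and 5, so that their decisions are left, right, and right respectively and the output is cat.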

-----

19. Hierarchical Softmax 3

https://sunjackson.github.io/2017/08/01/fb7b83894c233646897598c40c328c23/

http://building-babylon.net/2017/08/01/hierarchical-softmax/

https://zhuanlan.zhihu.com/p/53425736

Notes:

This is the commonly used example figure; it really is hard to understand directly from the picture.

-----

20. Negative Sampling

https://python5566.wordpress.com/2018/03/17/nlp-%E7%AD%86%E8%A8%98-negative-sampling/comment-page-1/

http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/

https://zhuanlan.zhihu.com/p/53425736

-----

21. Negative Sampling

https://medium.com/@tengyuanchang/%E8%AE%93%E9%9B%BB%E8%85%A6%E8%81%BD%E6%87%82%E4%BA%BA%E8%A9%B1-%E7%90%86%E8%A7%A3-nlp-%E9%87%8D%E8%A6%81%E6%8A%80%E8%A1%93-word2vec-%E7%9A%84-skip-gram-%E6%A8%A1%E5%9E%8B-73d0239ad698

Notes:

Negative Sampling

Positive sample: (fox, quick). 1 pair.
Negative samples: (fox, word_not_quick). 9999 pairs.

Small datasets: choose 5 to 20 negative samples.
Large datasets: choose 2 to 5 negative samples.

-----

22. TF-IDF - term frequency–inverse document frequency

https://medium.com/nanonets/topic-modeling-with-lsa-psla-lda-and-lda2vec-555ff65b0b05

-----

23. LSA1

https://www.analyticsvidhya.com/blog/2018/10/stepwise-guide-topic-modeling-latent-semantic-analysis/

-----

24. LSA2

https://www.analyticsvidhya.com/blog/2018/10/stepwise-guide-topic-modeling-latent-semantic-analysis/

-----

25. GloVe

https://towardsdatascience.com/word-embeddings-for-nlp-5b72991e01d4

-----

26. GloVe in a Picture

https://dudeperf3ct.github.io/lstm/gru/nlp/2019/01/28/Force-of-LSTM-and-GRU/

-----

27. GloVe Loss

https://medium.com/@jonathan_hui/nlp-word-embedding-glove-5e7f523999f6

-----

28. GloVe Alpha

Fig. Weighting Function.

-----

29. fastText

https://www.jiqizhixin.com/articles/2020-07-03-14

-----

30. fastText v1

https://www.twblogs.net/a/5ba122282b71771a4da89d89

-----

31. fastText v2

https://blog.csdn.net/u012931582/article/details/83818374

-----

32. WordRank

https://leovan.me/cn/2018/10/word-embeddings/

-----

33. NNLMs

https://www.jiqizhixin.com/graph/technologies/c61ba3b9-40e2-4864-a941-9adc19e6792e

-----

References

[1] Word2vec v1

Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).

https://arxiv.org/pdf/1301.3781.pdf

[2] Word2vec v2

Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.

https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

[3] Word2vec v3

Rong, Xin. "word2vec parameter learning explained." arXiv preprint arXiv:1411.2738 (2014).

https://arxiv.org/pdf/1411.2738.pdf

[4] GloVe

Pennington, Jeffrey, Richard Socher, and Christopher Manning. "Glove: Global vectors for word representation." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.

https://www.aclweb.org/anthology/D14-1162

[5] fastText v1

Joulin, Armand, et al. "Bag of tricks for efficient text classification." arXiv preprint arXiv:1607.01759 (2016).

https://arxiv.org/pdf/1607.01759.pdf

[6] fastText v2
Bojanowski, Piotr, et al. "Enriching word vectors with subword information." Transactions of the Association for Computational Linguistics 5 (2017): 135-146.
https://www.mitpressjournals.org/doi/pdfplus/10.1162/tacl_a_00051

[7] WordRank
Ji, Shihao, et al. "Wordrank: Learning word embeddings via robust ranking." arXiv preprint arXiv:1506.02761 (2015).
https://arxiv.org/pdf/1506.02761.pdf
