Monday, September 23, 2019

AI 從頭學(三二):VGGNet


2019/09/03

Preface:



-----

Summary:






-----

Outline

1.1. Top 5 Accuracy
1.2. Evolution of CNN

2.1. Structure of VGG
2.2. Conv3
2.3. Pooling

3.1. Single Scale Evaluation
3.2. Multi-Scale Evaluation
3.3. Multi-crop Evaluation
3.4. ConvNet Fusion

4.1. Deep
4.2. Conv1
4.3. LRN

-----

1.1. Top 5 Accuracy

-----


Fig. 1.1. Top 5 Accuracy [6]。

-----

The main job of a convolutional neural network is to classify images, and model accuracy is usually measured as Top-5 or Top-1. Under Top-5, a prediction counts as correct when the true class appears among the model's five highest-ranked classes; under Top-1, only the single highest-ranked class counts.
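As a concrete illustration (not taken from any paper), Top-k accuracy can be computed directly from class scores. The sketch below uses made-up scores for six classes; a sample counts as correct when its true label ranks within the top k:

```python
def top_k_correct(scores, label, k=5):
    """True if `label` is among the k highest-scoring classes."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return label in ranked[:k]

def top_k_accuracy(batch_scores, labels, k=5):
    """Fraction of samples whose true label appears in the top-k predictions."""
    hits = sum(top_k_correct(s, y, k) for s, y in zip(batch_scores, labels))
    return hits / len(labels)

scores = [0.05, 0.50, 0.20, 0.10, 0.08, 0.07]   # made-up scores for 6 classes
print(top_k_correct(scores, 2, k=5))  # True: class 2 ranks 2nd
print(top_k_correct(scores, 0, k=1))  # False: class 1 ranks 1st
```

Top-1 is simply the k=1 special case of the same computation.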

-----

1.2. Evolution of CNN

-----


Fig. 1.2a. Evolution of CNN [3]。

-----

As Fig. 1.2a shows, the overall trend in CNN development is that networks grow deeper while error rates shrink. AlexNet (2012) had 8 layers and a 16.4% error rate. ZFNet (2013), a fine-tuned version of AlexNet, reached 11.7%. VGGNet (2014) had 19 layers and 7.3%, and GoogLeNet, from the same year, had 22 layers and 6.7%. By 2015 the 152-layer ResNet reached 3.57%, surpassing human experts in one stroke.

-----


Fig. 1.2b. Evolution of CNN [9]。

-----


Fig. 1.2c. Evolution of CNN [10]。

-----

The figures above trace the history of CNN development.

1998: LeNet established the mature CNN architecture.
2012: AlexNet scaled up LeNet and trained on GPUs, becoming the first CNN to excel on a large image dataset.
2013: NIN introduced the 1x1 convolution (conv1).
2014: GoogLeNet (Inception V1) built on conv1 to deepen the network successfully.
2014: VGGNet replaced each conv5 with two stacked conv3 layers, also deepening the network successfully.
2015: ResNet used identity shortcut connections, a pass-through idea related to LSTM gating, to push networks to extreme depth.

The success of CNNs in classification also drove the popularity of applications such as Object Detection and Semantic Segmentation; see Figs. 1.2b and 1.2c.
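The VGG point above — two stacked conv3 layers cover the same 5x5 receptive field as one conv5 while using fewer weights — can be checked with a quick back-of-the-envelope sketch (the channel count 256 is just an example, not a value from the paper):

```python
def conv_params(k, c_in, c_out):
    """Weight count of a k x k convolution layer (bias ignored)."""
    return k * k * c_in * c_out

def stacked_receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions: r = 1 + sum(k - 1)."""
    return 1 + sum(k - 1 for k in kernel_sizes)

c = 256
print(stacked_receptive_field([3, 3]))   # 5, same as a single conv5
print(conv_params(5, c, c))              # 1638400
print(conv_params(3, c, c) * 2)          # 1179648, ~28% fewer weights
```

The stacked version also inserts an extra non-linearity between the two conv3 layers, which VGG cites as a further benefit.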

-----


Fig. 1.2d. LeNet [4]。

-----


Fig. 1.2e. AlexNet [4]。

-----


Fig. 1.2f. VGGNet [4]。

-----


-----

2.1. Structure of VGG

-----


Fig. 2.1. VGGNet Architecture [1]。

-----

2.2. Conv3

-----


Fig. 2.2. Conv3 [5]。

-----

2.3. Pooling

-----


Fig. 2.3. Pooling [16]。

-----

3.1. Single Scale Evaluation

-----


Fig. 3.1a. Single Scale [5]。

-----

-----


Fig. 3.1b. ConvNet performance at a single test scale [1]。

-----


Fig. 3.1c. ConvNet performance at a single test scale [5]。

-----


-----

3.2. Multi-Scale Evaluation

-----


Fig. 3.2a. Multi-Scale [16]。

-----


Fig. 3.2b. ConvNet performance at multiple test scales [1]。

-----


Fig. 3.2c. ConvNet performance at multiple test scales [5]。

-----

3.3. Multi-crop Evaluation

-----


Fig. 3.3a. Multi-crop [16]。

-----


Fig. 3.3b. Dense [11]。

-----


Fig. 3.3c. Dense [5]。

-----


Fig. 3.3d. OverFeat [13]。

-----


Fig. 3.3e. ConvNet evaluation techniques comparison [1]。

-----


Fig. 3.3f. ConvNet evaluation techniques comparison [5]。

-----

3.4. ConvNet Fusion

-----


Fig. 3.4a. Multiple ConvNet fusion results [1]。

-----


Fig. 3.4b. Multiple ConvNet fusion results [5]。

-----

4.1. Deep

-----


Fig. 4.1a. Conv3 [12]。

-----


Fig. 4.1b. VGGNet [3]。

-----

-----

-----

4.2. Conv1

-----


Fig. 4.2. Conv1 [15]。

-----

4.3. LRN 

-----


Fig. 4.3. LRN [7]。

-----

Conclusion:

When reading the paper, beyond understanding its architecture, also pay attention to the training method it describes, namely weight decay and momentum.
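As a hedged sketch of those two training ingredients: the update rule below is a generic SGD-with-momentum formulation, not code from the paper; the constants 0.9 and 5e-4 follow the momentum and weight-decay values reported in [1].

```python
def sgd_step(w, grad, velocity, lr=0.01, momentum=0.9, weight_decay=5e-4):
    """One SGD update with momentum; weight decay adds wd * w to the gradient,
    which is equivalent to an L2 penalty on the weights."""
    g = grad + weight_decay * w
    v = momentum * velocity - lr * g
    return w + v, v

# minimize f(w) = 0.5 * w^2, whose gradient is w itself
w, v = 1.0, 0.0
for _ in range(100):
    w, v = sgd_step(w, w, v)
print(w)   # |w| shrinks toward the minimum at 0
```

In PyTorch the same behavior comes from `torch.optim.SGD(params, lr=0.01, momentum=0.9, weight_decay=5e-4)`.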

-----

References

◎ Papers

[1] VGGNet
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
https://arxiv.org/pdf/1409.1556.pdf

[2] PreVGGNet
Ciresan, Dan C., et al. "Flexible, high performance convolutional neural networks for image classification." IJCAI Proceedings-International Joint Conference on Artificial Intelligence. Vol. 22. No. 1. 2011.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.481.4406&rep=rep1&type=pdf

-----

◎ English references

# survey
# 3.9K claps
[3] CNN Architectures  LeNet, AlexNet, VGG, GoogLeNet, ResNet and more …
https://medium.com/@sidereal/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5

# survey
# 1.1K claps
[4] Illustrated  10 CNN Architectures - Towards Data Science
https://towardsdatascience.com/illustrated-10-cnn-architectures-95d78ace614d

# VGGNet
# 246 claps
[5] Review  VGGNet — 1st Runner-Up (Image Classification), Winner (Localization) in ILSVRC 2014
https://medium.com/coinmonks/paper-review-of-vggnet-1st-runner-up-of-ilsvlc-2014-image-classification-d02355543a11

# VGGNet
[6] Convolutional neural networks on the iPhone with VGGNet
http://machinethink.net/blog/convolutional-neural-networks-on-the-iphone-with-vggnet/

# LRN
[7] What is local response normalization  - Quora
https://www.quora.com/What-is-local-response-normalization

# LRN
# 135 claps
[8] Difference between Local Response Normalization and Batch Normalization
https://towardsdatascience.com/difference-between-local-response-normalization-and-batch-normalization-272308c034ac

-----

◎ Simplified Chinese references

# survey
[9] 深度学习之四大经典CNN技术浅析 _ 硬创公开课 _ 雷锋网
https://www.leiphone.com/news/201702/dgpHuriVJHTPqqtT.html

# survey
[10] GitHub - weslynn_AlphaTree-graphic-deep-neural-network  将深度神经网络中的一些模型 进行统一的图示,便于大家对模型的理解
https://github.com/weslynn/AlphaTree-graphic-deep-neural-network

# VGGNet 
[11] 大话CNN经典模型:VGGNet - 雪饼的个人空间 - OSCHINA
https://my.oschina.net/u/876354/blog/1634322

# VGGNet
[12] 【论文阅读】—— VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION _ Nameless rookie
http://vincentho.name/2018/11/29/%E3%80%90%E8%AE%BA%E6%96%87%E9%98%85%E8%AF%BB%E3%80%91%E2%80%94%E2%80%94-VERY-DEEP-CONVOLUTIONAL-NETWORKS-FOR-LARGE-SCALE-IMAGE-RECOGNITION/

# OverFeat
[13] OverFeat Integrated Recognition, Localization and Detection using Convolutional Networks - baobei0112的专栏 - CSDN博客
https://blog.csdn.net/baobei0112/article/details/47775647

# PreVGGNet
[14] 深度学习论文理解3:Flexible, high performance convolutional neural networks for image classification - whiteinblue的专栏 - CSDN博客
https://blog.csdn.net/whiteinblue/article/details/43149363

# Conv1
[15] CNN网络中的 1 x 1 卷积是什么? - AI小作坊 的博客 - CSDN博客
https://blog.csdn.net/zhangjunhit/article/details/55101559

-----
 
◎ Traditional Chinese references

# VGGNet
[16] VGG_深度學習_原理 – JT – Medium
https://medium.com/@danjtchen/vgg-%E6%B7%B1%E5%BA%A6%E5%AD%B8%E7%BF%92-%E5%8E%9F%E7%90%86-d31d0aa13d88

# PreVGGNet
[17] [Pytorch Taipei] Paper  Flexible, high performance convolutional neural networks for image classification
https://medium.com/@ChrisChou0426/pytorch-taipei-paper-flexible-high-performance-convolutional-neural-networks-for-image-4153f9495113 

-----

◎ Code implementations

# PyTorch
[18] torchvision.models.vgg — PyTorch master documentation
https://pytorch.org/docs/stable/_modules/torchvision/models/vgg.html

# PyTorch
[19] vision_vgg.py at master · pytorch_vision · GitHub
https://github.com/pytorch/vision/blob/master/torchvision/models/vgg.py  

# PyTorch
[20] 简单易懂Pytorch实战实例VGG深度网络 - 心之所向 - CSDN博客
https://blog.csdn.net/qq_16234613/article/details/79818370

AI 從頭學(三三):ResNet


2017/08/03

-----

前言:

-----



-----

-----




-----


-----

Summary:




-----



-----




-----


-----



-----


-----



-----




-----

-----

References

◎ Papers

[1] The Power of Depth
Eldan, Ronen, and Ohad Shamir. "The power of depth for feedforward neural networks." Conference on Learning Theory. 2016.
http://proceedings.mlr.press/v49/eldan16.pdf

[2] ResNet
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf

[3] ResNet v2
He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.05027.pdf 

[4] Degeneracy
Orhan, A. Emin, and Xaq Pitkow. "Skip connections eliminate singularities." arXiv preprint arXiv:1701.09175 (2017).
https://arxiv.org/pdf/1701.09175.pdf

-----

◎ English references

[5] Understanding and Implementing Architectures of ResNet and ResNeXt for state-of-the-art Image…
https://medium.com/@14prakash/understanding-and-implementing-architectures-of-resnet-and-resnext-for-state-of-the-art-image-cf51669e1624 

[6] Residual blocks — Building blocks of ResNet – Towards Data Science
https://towardsdatascience.com/residual-blocks-building-blocks-of-resnet-fd90ca15d6ec

[7] An Overview of ResNet and its Variants – Towards Data Science
https://towardsdatascience.com/an-overview-of-resnet-and-its-variants-5281e2f56035 

[8] Why is it hard to train deep neural networks  Degeneracy, not vanishing gradients, is the key _ Severely Theoretical
https://severelytheoretical.wordpress.com/2018/01/01/why-is-it-hard-to-train-deep-neural-networks-degeneracy-not-vanishing-gradients-is-the-key/

-----

◎ Simplified Chinese references

[9] 你必须要知道CNN模型:ResNet - 知乎
https://zhuanlan.zhihu.com/p/31852747

# model degradation
[10] 【模型解读】resnet中的残差连接,你确定真的看懂了? - 知乎
https://zhuanlan.zhihu.com/p/42833949 

-----

◎ Traditional Chinese references





-----

◎ Code implementations



NLP(一):LSTM


2019/09/06

Notes:

Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) [1]-[3] are both used to process time-series signals such as audio, speech, and language [4], [5]. Because RNNs suffer from vanishing and exploding gradients, LSTM was developed to replace them. Owing to an inherent limitation (they cannot be parallelized on GPUs), the NLP models originally built on LSTM and GRU, such as Seq2seq and Attention, eventually abandoned the RNN family in favor of the self-attention-based Transformer, with excellent results [8]-[10]. Even so, newer RNN variants such as MGU and SRU are still being proposed [11].
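To make the gating idea concrete, here is a toy scalar LSTM cell in plain Python. The weights are invented for illustration, and this is a sketch rather than a substitute for a real `nn.LSTM`:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h, c, W):
    """One LSTM step for scalar input/state; W maps each gate to its
    (input weight, hidden weight, bias). The cell state c is updated
    additively, which is what lets gradients survive over long spans
    compared with a plain RNN's repeated matrix multiplications."""
    f = sigmoid(W['f'][0]*x + W['f'][1]*h + W['f'][2])    # forget gate
    i = sigmoid(W['i'][0]*x + W['i'][1]*h + W['i'][2])    # input gate
    o = sigmoid(W['o'][0]*x + W['o'][1]*h + W['o'][2])    # output gate
    g = math.tanh(W['g'][0]*x + W['g'][1]*h + W['g'][2])  # candidate
    c = f * c + i * g
    h = o * math.tanh(c)
    return h, c

W = {k: (0.5, 0.5, 0.0) for k in 'fiog'}   # arbitrary toy weights
h = c = 0.0
for x in [1.0, -1.0, 0.5]:                  # run a short sequence
    h, c = lstm_cell(x, h, c, W)
print(h, c)
```

The sequential dependence of each step on the previous `h` and `c` is exactly why this loop cannot be parallelized across time steps on a GPU.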

-----



Fig. 1. RNN, [1].

-----


Fig. 2. LSTM, [1].

-----

References

[1] Understanding LSTM Networks -- colah's blog
http://colah.github.io/posts/2015-08-Understanding-LSTMs/ 

[2] The Unreasonable Effectiveness of Recurrent Neural Networks
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
 
# 10.8K claps
[3] The fall of RNN _ LSTM – Towards Data Science
https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0

[4] Written Memories  Understanding, Deriving and Extending the LSTM - R2RT
https://r2rt.com/written-memories-understanding-deriving-and-extending-the-lstm.html

# 10.1K claps
[] Illustrated Guide to LSTM’s and GRU’s  A step by step explanation
https://towardsdatascience.com/illustrated-guide-to-lstms-and-gru-s-a-step-by-step-explanation-44e9eb85bf21
 
# 7.8K claps
[] Understanding LSTM and its diagrams - ML Review - Medium
https://medium.com/mlreview/understanding-lstm-and-its-diagrams-37e2f46f1714

# 386 claps
[] The magic of LSTM neural networks - DataThings - Medium
https://medium.com/datathings/the-magic-of-lstm-neural-networks-6775e8b540cd
 
# 247 claps
[] Recurrent Neural Networks and LSTM explained - purnasai gudikandula - Medium
https://medium.com/@purnasaigudikandula/recurrent-neural-networks-and-lstm-explained-7f51c7f6bbb9

# 113 claps
[] A deeper understanding of NNets (Part 3) — LSTM and GRU
https://medium.com/@godricglow/a-deeper-understanding-of-nnets-part-3-lstm-and-gru-e557468acb04

# 58 claps
[] Basic understanding of LSTM - Good Audience
https://blog.goodaudience.com/basic-understanding-of-lstm-539f3b013f1e

-----

# vanishing / exploding gradients
[] 三次简化一张图:一招理解LSTM_GRU门控机制 _ 机器之心
https://www.jiqizhixin.com/articles/2018-12-18-12

# vanishing / exploding gradients
[] 长短期记忆(LSTM)-tensorflow代码实现 - Jason160918的博客 - CSDN博客
https://blog.csdn.net/Jason160918/article/details/78295423

[] 周志华等提出 RNN 可解释性方法,看看 RNN 内部都干了些什么 _ 机器之心
https://www.jiqizhixin.com/articles/110404 

-----

[] 遞歸神經網路和長短期記憶模型 RNN & LSTM · 資料科學・機器・人
https://brohrer.mcknote.com/zh-Hant/how_machine_learning_works/how_rnns_lstm_work.html

# 593 claps
[] 淺談遞歸神經網路 (RNN) 與長短期記憶模型 (LSTM) - TengYuan Chang - Medium
https://medium.com/@tengyuanchang/%E6%B7%BA%E8%AB%87%E9%81%9E%E6%AD%B8%E7%A5%9E%E7%B6%93%E7%B6%B2%E8%B7%AF-rnn-%E8%88%87%E9%95%B7%E7%9F%AD%E6%9C%9F%E8%A8%98%E6%86%B6%E6%A8%A1%E5%9E%8B-lstm-300cbe5efcc3

# 405 claps
[] 速記AI課程-深度學習入門(二) - Gimi Kao - Medium
https://medium.com/@baubibi/%E9%80%9F%E8%A8%98ai%E8%AA%B2%E7%A8%8B-%E6%B7%B1%E5%BA%A6%E5%AD%B8%E7%BF%92%E5%85%A5%E9%96%80-%E4%BA%8C-954b0e473d7f



-----

[] 深入淺出 Deep Learning(三):RNN (LSTM)
http://hemingwang.blogspot.com/2018/02/airnnlstmin-120-mins.html

[] AI從頭學(一九):Recurrent Neural Network
http://hemingwang.blogspot.com/2017/03/airecurrent-neural-network.html 

-----

◎ Code implementations

[] Sequence Models and Long-Short Term Memory Networks — PyTorch Tutorials 1.2.0 documentation
https://pytorch.org/tutorials/beginner/nlp/sequence_models_tutorial.html

Predict Stock Prices Using RNN  Part 1
https://lilianweng.github.io/lil-log/2017/07/08/predict-stock-prices-using-RNN-part-1.html

Predict Stock Prices Using RNN  Part 2
https://lilianweng.github.io/lil-log/2017/07/22/predict-stock-prices-using-RNN-part-2.html

# 230 claps
[] [Keras] 利用Keras建構LSTM模型,以Stock Prediction 為例 1 - PJ Wang - Medium
https://medium.com/@daniel820710/%E5%88%A9%E7%94%A8keras%E5%BB%BA%E6%A7%8Blstm%E6%A8%A1%E5%9E%8B-%E4%BB%A5stock-prediction-%E7%82%BA%E4%BE%8B-1-67456e0a0b

# 38 claps
[] LSTM_深度學習_股價預測 - Data Scientists Playground - Medium
https://medium.com/data-scientists-playground/lstm-%E6%B7%B1%E5%BA%A6%E5%AD%B8%E7%BF%92-%E8%82%A1%E5%83%B9%E9%A0%90%E6%B8%AC-cd72af64413a

NLP(二):Seq2seq


2019/01/02

Under construction...

-----

Notes:

-----



-----

-----




Fig. 1. seq2seq [1].


Notation:

# basic
// advanced

-----

# Hidden state
「For vanilla recurrent neural networks and GRU’s, the output is the hidden state. When you see both an output and a hidden state represented as two different variables, usually the output is something that ran through some sort of activation like softmax for classification.」
//  Difference between output and hidden state in RNN   MLQuestions
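A minimal sketch of that point: in a vanilla RNN the per-step output and the new hidden state are literally the same value (the scalar weights below are chosen arbitrarily for illustration):

```python
import math

def rnn_step(x, h, w_x=0.6, w_h=0.4, b=0.0):
    """Vanilla RNN step: the new hidden state IS the step's output."""
    return math.tanh(w_x * x + w_h * h + b)

h = 0.0
outputs = []
for x in [1.0, 0.5, -0.3]:
    h = rnn_step(x, h)
    outputs.append(h)          # output at each step == hidden state
print(outputs[-1] == h)        # True
```

A separate "output" variable only appears once you run the hidden state through an extra layer, e.g. a softmax for classification, as the quote notes.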

-----


"Feed in an English sentence and get back a French one: you have written a translation system.
Feed in a question and get back a reply: you have built a chatbot.
Feed in an article and get back a summary: you have a summarization system.
Feed in a few keywords and get back a short poem: you have become a poet."
// 從零開始的 Sequence to Sequence _ 雷德麥的藏書閣

Sequence to Sequence consists of two RNNs, an Encoder and a Decoder. The way it works is much like human thinking: when we hear a sentence, we first absorb and understand it, then compose a reply based on what we understood. Sequence to Sequence models exactly this process.
// 從零開始的 Sequence to Sequence _ 雷德麥的藏書閣

Seq2seq's flexible architecture has also let the model be applied to many different tasks, such as chatbots and Google Inbox's Auto-Reply. Given any paired corpus (questions and answers, emails and replies, images and captions), you can feed the data into the model and train a seq2seq system.
// 教電腦寫作:AI球評——Seq2seq模型應用筆記(PyTorch + Python3) – Yi-Hsiang Kao – Medium

In many earlier models we feed in a feature matrix in which each sample is one row, so every sample, from the first to the last, has the same feature dimension. For translation, though, would we force every sentence to have the same number of words? Fixed lengths are unrealistic for language, and that is why the seq2seq structure was proposed.
// seq2seq学习笔记 - 大学之道,在明明德 - CSDN博客

With our knowledge of RNN/LSTM, we can see that the Decoder's equations in Seq2seq come from the same mold as an RNN's; the difference is that the Decoder adds a C (Fig. 6), the context vector (or thought vector). The context vector can be thought of as a single vector holding all the information of the input sentence, namely the Encoder's last hidden state. In short, the Encoder compresses the input sentence into a fixed-length context vector that fully represents it, and the Decoder then generates the output sentence from the information in that context vector (Fig. 7).
//

Why use an attention model? The attention model helps with machine translation's poor performance on long sentences. This newer architecture creates a context vector for every word of the input sentence, rather than a single context vector derived from the final hidden state. For an input sentence of N words, N context vectors are produced, and the benefit is that each context vector can be decoded more effectively.
//

-----

References

◎ Papers

[1] Seq2seq - using LSTM
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf 

[2] RNN Encoder-Decoder 1 - using GRU
Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
https://arxiv.org/pdf/1406.1078.pdf

[3] RNN Encoder Decoder 2 - using GRU
Cho, Kyunghyun, et al. "On the properties of neural machine translation: Encoder-decoder approaches." arXiv preprint arXiv:1409.1259 (2014).
https://arxiv.org/pdf/1409.1259.pdf

-----

◎ English references

# 921 claps
Seq2Seq model in TensorFlow - Towards Data Science
https://towardsdatascience.com/seq2seq-model-in-tensorflow-ec0c557e560f

# 694 claps
Word Level English to Marathi Neural Machine Translation using Encoder-Decoder Model
https://towardsdatascience.com/word-level-english-to-marathi-neural-machine-translation-using-seq2seq-encoder-decoder-lstm-model-1a913f2dc4a7

# 77 claps
Introduction to RNNs, Sequence to Sequence Language Translation and Attention
https://towardsdatascience.com/introduction-to-rnns-sequence-to-sequence-language-translation-and-attention-fc43ef2cc3fd

# 33 claps
Tuned version of seq2seq tutorial - Towards Data Science
https://towardsdatascience.com/tuned-version-of-seq2seq-tutorial-ddb64db46e2

Difference between output and hidden state in RNN   MLQuestions
https://www.reddit.com/r/MLQuestions/comments/9hpkc4/difference_between_output_and_hidden_state_in_rnn/



-----

◎ Simplified Chinese references

[22] seq2seq学习笔记 - 大学之道,在明明德 - CSDN博客
https://blog.csdn.net/Jerr__y/article/details/53749693

-----

◎ Traditional Chinese references

[19] 從零開始的 Sequence to Sequence _ 雷德麥的藏書閣
http://zake7749.github.io/2017/09/28/Sequence-to-Sequence-tutorial/

[20] 教電腦寫作:AI球評——Seq2seq模型應用筆記(PyTorch + Python3) – Yi-Hsiang Kao – Medium
https://medium.com/@gau820827/%E6%95%99%E9%9B%BB%E8%85%A6%E5%AF%AB%E4%BD%9C-ai%E7%90%83%E8%A9%95-seq2seq%E6%A8%A1%E5%9E%8B%E6%87%89%E7%94%A8%E7%AD%86%E8%A8%98-pytorch-python3-31e853573dd0 

[21] Pytorch Seq2Seq 篇
https://fgc.stpi.narl.org.tw/activity/videoDetail/4b1141305df38a7c015e194f22f8015b

Seq2seq pay Attention to Self Attention  Part 1(中文版)
https://medium.com/@bgg/seq2seq-pay-attention-to-self-attention-part-1-%E4%B8%AD%E6%96%87%E7%89%88-2714bbd92727

-----

◎ Code implementations

Deploying a Seq2Seq Model with the Hybrid Frontend — PyTorch Tutorials 1.0.0.dev20181228 documentation
https://pytorch.org/tutorials/beginner/deploy_seq2seq_hybrid_frontend_tutorial.html 

[23] Seq2Seq – Long – Medium
https://medium.com/@Aj.Cheng/seq2seq-18a0730d1d77

[24] Seq2Seq model in TensorFlow – Towards Data Science
https://towardsdatascience.com/seq2seq-model-in-tensorflow-ec0c557e560f 

[25] A ten-minute introduction to sequence-to-sequence learning in Keras
https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html

[26] Write a Sequence to Sequence (seq2seq) Model — Chainer 5.0.0 documentation
https://docs.chainer.org/en/stable/examples/seq2seq.html

NLP(三):Attention


2019/01/18

-----


-----



-----



-----








-----


Fig. 2. Attention [2].

-----

References

◎ Papers

[1] Attention - using GRU
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
https://arxiv.org/pdf/1409.0473.pdf

[2] Global Attention - using LSTM
Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
https://arxiv.org/pdf/1508.04025.pdf 

[3] Visual Attention
Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." International conference on machine learning. 2015.
http://proceedings.mlr.press/v37/xuc15.pdf

-----

◎ English references

# 1.4K claps
Attn  Illustrated Attention - Towards Data Science
https://towardsdatascience.com/attn-illustrated-attention-5ec4ad276ee3

# 1.3K claps
A Brief Overview of Attention Mechanism - SyncedReview - Medium
https://medium.com/syncedreview/a-brief-overview-of-attention-mechanism-13c578ba9129

# 799 claps
Intuitive Understanding of Attention Mechanism in Deep Learning
https://towardsdatascience.com/intuitive-understanding-of-attention-mechanism-in-deep-learning-6c9482aecf4f

# 680 claps
Attention in NLP – Kate Loginova – Medium
https://medium.com/@joealato/attention-in-nlp-734c6fa9d983

# 126 claps
Understanding Attention Mechanism - Shashank Yadav - Medium
https://medium.com/@shashank7.iitd/understanding-attention-mechanism-35ff53fc328e

Attention and Memory in Deep Learning and NLP – WildML
http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/

-----

◎ Japanese references

深層学習による自然言語処理 - RNN, LSTM, ニューラル機械翻訳の理論 - ディープラーニングブログ
http://deeplearning.hatenablog.com/entry/neural_machine_translation_theory

PyTorch 1.1 Tutorials   テキスト   Sequence to Sequence ネットワークと Attention で翻訳 – PyTorch
http://torch.classcat.com/2019/07/20/pytorch-1-1-tutorials-text-seq2seq-translation/

-----

◎ Simplified Chinese references

自然语言处理中的Attention Model:是什么及为什么 - 张俊林的博客 - CSDN博客
https://blog.csdn.net/malefactor/article/details/50550211

目前主流的attention方法都有哪些? - 知乎
https://www.zhihu.com/question/68482809/answer/264632289

# 110 claps
自然语言处理中注意力机制综述 - 知乎
https://zhuanlan.zhihu.com/p/54491016 

# 12 claps
一篇了解NLP中的注意力机制 - 知乎
https://zhuanlan.zhihu.com/p/59837917

注意力机制(Attention Mechanism)在自然语言处理中的应用 - Soul Joy Hub - CSDN博客
https://blog.csdn.net/u011239443/article/details/80418489

【NLP】Attention Model(注意力模型)学习总结 - 郭耀华 - 博客园
https://www.cnblogs.com/guoyaohua/p/9429924.html
 
-----

◎ Traditional Chinese references

# 486 claps
[1] Seq2seq pay Attention to Self Attention  Part 1(中文版)
https://medium.com/@bgg/seq2seq-pay-attention-to-self-attention-part-1-%E4%B8%AD%E6%96%87%E7%89%88-2714bbd92727

-----

◎ Code implementations

[4] Translation with a Sequence to Sequence Network and Attention — PyTorch Tutorials 1.0.0.dev20181228 documentation
https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html

NLP(四):ConvS2S


2019/04/02

-----


Fig. 1. ConvS2S [1]。

-----



-----




-----



-----


-----




-----



-----



-----

References

◎ Papers

# ConvS2S
[1] Gehring, Jonas, et al. "Convolutional sequence to sequence learning." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
https://arxiv.org/pdf/1705.03122.pdf

# GLU
[2] Dauphin, Yann N., et al. "Language modeling with gated convolutional networks." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
https://arxiv.org/pdf/1612.08083.pdf

-----

◎ English references

# slides by the paper's authors
2_1-Yarats
https://aiukraine.com/wp-content/uploads/2017/10/2_1-Yarats.pdf

# survey
Understanding incremental decoding in fairseq – Telesens
http://www.telesens.co/2019/04/21/understanding-incremental-decoding-in-fairseq/

Seq2Seq model using Convolutional Neural Network – Gautam Karmakar – Medium
https://medium.com/@gautam.karmakar/summary-seq2seq-model-using-convolutional-neural-network-b1eb100fb4c4

-----

◎ Japanese references

論文解説 Convolutional Sequence to Sequence Learning (ConvS2S) - ディープラーニングブログ
http://deeplearning.hatenablog.com/entry/convs2s 

fairseq – PyTorch
http://torch.classcat.com/category/fairseq/

-----

◎ Korean references

ConvS2S  Convolutional Sequence to Sequence Learning
https://reniew.github.io/44/

Convolutional Sequence to Sequence Learning
https://norman3.github.io/papers/docs/fairseq.html

Convolutional Sequence to Sequence Learning
http://jeonseoungseon.blogspot.com/2017/06/convolutional-sequence-to-sequence.html

-----

◎ Simplified Chinese references

Convolutional Sequence to Sequence Learning _ Apathy
https://dzapathy.github.io/2019/02/14/fairseq/

从《Convolutional Sequence to Sequence Learning》到《Attention Is All You Need》 - 知乎
https://zhuanlan.zhihu.com/p/27464080

《Convolutional Sequence to Sequence Learning》阅读笔记 - 知乎
https://zhuanlan.zhihu.com/p/26918935

(模型汇总-7)基于CNN的Seq2Seq模型-Convolutional Sequence ... - 简书
https://www.jianshu.com/p/0f2396f0d98f

机器翻译模型之Fairseq:《Convolutional Sequence to Sequence Learning》 - 技术成长笔记 - CSDN博客
https://blog.csdn.net/u012931582/article/details/83719158

如何使用fairseq复现Transformer NMT _ Weekly Review
http://www.linzehui.me/2019/01/28/%E7%A2%8E%E7%89%87%E7%9F%A5%E8%AF%86/%E5%A6%82%E4%BD%95%E4%BD%BF%E7%94%A8fairseq%E5%A4%8D%E7%8E%B0Transformer%20NMT/

如何评价 Facebook 新推出的 CNN 机器翻译项目 Fairseq  - 知乎
https://www.zhihu.com/question/59645329

论文笔记(1) ConvS2S Convolutional Seq to Seq Learning - 知乎
https://zhuanlan.zhihu.com/p/60524073

-----

◎ Traditional Chinese references

# introduction
淺談Facebook最新CNN神經翻譯機 Convolutional Sequence to Sequence Learning _ Learning by Hacking
https://data-sci.info/2017/05/11/%E6%B7%BA%E8%AB%87facebook%E6%9C%80%E6%96%B0cnn%E7%A5%9E%E7%B6%93%E7%BF%BB%E8%AD%AF%E6%A9%9F/

論文筆記 Convolutional Sequence to Sequence Learning _ Y.C. Tseng’s Site
https://ycts.github.io/weeklypapers/convSeq2seq/ 

# Word2vec
類神經網路 -- word2vec (part 1   Overview) « MARK CHANG'S BLOG
http://cpmarkchang.logdown.com/posts/773062-neural-network-word2vec-part-1-overview

-----

◎ Code implementations

convs2s — OpenSeq2Seq 0.2 documentation
https://nvidia.github.io/OpenSeq2Seq/html/api-docs/parts.convs2s.html

GitHub - hongweizeng_cnn-seq2seq
https://github.com/hongweizeng/cnn-seq2seq

XSum_XSum-ConvS2S at master · EdinburghNLP_XSum · GitHub
https://github.com/EdinburghNLP/XSum/tree/master/XSum-ConvS2S

GitHub - BruceChaun_NMT  Neural Machine Translation with RNN_ConvS2S_Transoformer
https://github.com/BruceChaun/NMT

GitHub - tobyyouup_conv_seq2seq  A tensorflow implementation of Fairseq Convolutional Sequence to Sequence Learning(Gehring et al. 2017)
https://github.com/tobyyouup/conv_seq2seq

GitHub - pytorch_fairseq  Facebook AI Research Sequence-to-Sequence Toolkit written in Python
https://github.com/pytorch/fairseq 

GitHub - facebookresearch_fairseq  Facebook AI Research Sequence-to-Sequence Toolkit
https://github.com/facebookresearch/fairseq