Thursday, November 19, 2020

What are the main points of Attention?


2020/10/27

-----



Part One: Basic Learning and Paper Comprehension (70%).


◎ 1. What can we learn from this paper (what problem does it solve)? (macro view)

◎ 1. What can we learn from this paper (what problem, left unsolved by earlier papers, does it solve)? (micro view)



# A Technical History of Machine Translation: Machine Translation Series (1) - Tencent Cloud Community

https://cloud.tencent.com/developer/news/16139

-----


Figure 1 of the paper: the Seq2seq architecture [2].

-----

◎ A. The problem and its cause.

◎ 1.a.1: Before this paper, how far had this field already progressed?

-----


Attention 2, from the same period as Attention 1.

-----

◎ 1.a.1: How far had the field progressed shortly after this paper?



GNMT, a deep multi-layer LSTM version of Attention (with residual connections).

-----

◎ 1.a.2: What bottleneck had the most recent research (immediately before this paper) run into?



# Seq2seq [2].

-----

◎ 1.a.3: What is the root cause of the issue?



https://medium.com/@joealato/attention-in-nlp-734c6fa9d983

-----

◎ B. The solution.

◎ 1.b.1: What approach do the authors take to solve it?



# Attention [3].

-----

◎ 1.b.2: How is the problem solved in detail?

-----


Fig. 2. An illustration of the attention mechanism (RNNSearch) proposed by [Bahdanau, 2014]. Instead of converting the entire input sequence into a single context vector, we create a separate context vector for each output (target) word. These vectors consist of the weighted sums of encoder’s hidden states.
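
A minimal NumPy sketch of what this caption describes (the toy shapes and names such as encoder_states are my own, not from the paper): each decoder step gets its own context vector, formed as a weighted sum of the encoder's hidden states.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy encoder output: Tx = 4 source positions, hidden size 3.
encoder_states = np.random.randn(4, 3)  # annotations h_1 ... h_Tx

def context_vector(decoder_state, encoder_states):
    # Alignment scores; a dot product stands in here for the small
    # feed-forward network a(s_{i-1}, h_j) that RNNSearch actually learns.
    scores = encoder_states @ decoder_state   # shape (Tx,)
    weights = softmax(scores)                 # attention weights, sum to 1
    return weights @ encoder_states           # weighted sum of the h_j

s = np.random.randn(3)                 # current decoder state s_{i-1}
c = context_vector(s, encoder_states)  # a separate c_i for each output word
print(c.shape)                         # (3,)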

-----

http://hemingwang.blogspot.com/2019/01/attention.html

-----

◎ 1.b.3: (optional) Do the authors explain their reasoning?



The context vector ci depends on a sequence of annotations (h1, ... hTx ) to which an encoder maps the input sentence. Each annotation hi contains information about the whole input sequence with a strong focus on the parts surrounding the i-th word of the input sequence. We explain in detail how the annotations are computed in the next section.
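
Spelled out in the paper's notation, the passage above is:

c_i = \sum_{j=1}^{T_x} \alpha_{ij} h_j, \qquad
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k=1}^{T_x} \exp(e_{ik})}, \qquad
e_{ij} = a(s_{i-1}, h_j)

where a is a learned alignment model that scores how well the inputs around position j match the output at position i.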

-----

- Or is it discussed by later researchers?



Sequence to sequence modeling has been synonymous with recurrent neural network based encoder-decoder architectures (Sutskever et al., 2014; Bahdanau et al., 2014). The encoder RNN processes an input sequence x = (x1, . . . , xm) of m elements and returns state representations z = (z1, . . . , zm). The decoder RNN takes z and generates the output sequence y = (y1, . . . , yn) left to right, one element at a time. To generate output yi+1, the decoder computes a new hidden state hi+1 based on the previous state hi, an embedding gi of the previous target language word yi, as well as a conditional input ci derived from the encoder output z. Based on this generic formulation, various encoder-decoder architectures have been proposed, which differ mainly in the conditional input and the type of RNN.
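
A toy sketch of that generic decoder step (my own illustration, not code from any of the cited papers); a plain tanh cell stands in for whichever RNN variant a given architecture uses:

import numpy as np

rng = np.random.default_rng(0)
d = 8  # toy hidden/embedding size

# Hypothetical parameters; a real system learns these.
W_h, W_g, W_c = (0.1 * rng.standard_normal((d, d)) for _ in range(3))

def decoder_step(h_i, g_i, c_i):
    # New state h_{i+1} from the previous state h_i, the embedding g_i
    # of the previous target word y_i, and the conditional input c_i
    # derived from the encoder output z.
    return np.tanh(W_h @ h_i + W_g @ g_i + W_c @ c_i)

h = rng.standard_normal(d)   # previous decoder state h_i
g = rng.standard_normal(d)   # embedding of previous target word y_i
c = rng.standard_normal(d)   # conditional input from the encoder
h = decoder_step(h, g, c)    # h_{i+1}, used to generate y_{i+1}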

ConvS2S includes such a discussion; the Transformer does not.

-----


# Attention [3].

-----

◎ C. Performance evaluation.

◎ 1.c.1: How does the resulting performance compare?



-----

◎ 1.c.2: Does this method still have limitations, and what are they?


◎ 1.c.3: (optional) What are the authors' expectations for follow-up work?

-----



One of the challenges left for the future is to better handle unknown, or rare words. This will be required for the model to be more widely used and to match the performance of current state-of-the-art machine translation systems in all contexts.

-----

- What follow-up developments came from other researchers?


# GNMT

-----



Part Two: Follow-up Developments and Extended Applications (Illustrated with Example Papers) (30%)

◎ 2. To which vertical domains (application areas) can it be applied?



# HAN.

-----

◎ 3. Where does the value of this paper lie (how can it be extended across domains)?


# SAT.

-----


# ST.

-----

◎ 4. How could it be improved (follow-up research)?



# ConvS2S [4].

-----

References

◎ Primary papers

[1] LSTM. Cited 39,743 times.

Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.676.4320&rep=rep1&type=pdf


[2] Seq2seq. Cited 12,676 times.

Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.

http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf


[3] Attention 1 - Using GRU. Cited 14,895 times.

Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).

https://arxiv.org/pdf/1409.0473.pdf


[4] ConvS2S. Cited 1,772 times.

Gehring, Jonas, et al. "Convolutional sequence to sequence learning." arXiv preprint arXiv:1705.03122 (2017).

https://arxiv.org/pdf/1705.03122.pdf


[5] Transformer. Cited 13,554 times.

Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.

https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

-----

◎ Related papers

-----

[] Attention 2 - Using LSTM. Cited 4,688 times.

Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).

https://arxiv.org/pdf/1508.04025.pdf


[] GNMT. Cited 3,391 times.

Wu, Yonghui, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv preprint arXiv:1609.08144 (2016).

https://arxiv.org/pdf/1609.08144.pdf

-----

# HAN. In-domain application. Cited 2,596 times.

Yang, Zichao, et al. "Hierarchical attention networks for document classification." Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies. 2016.

https://www.aclweb.org/anthology/N16-1174.pdf

-----

Out-of-domain applications

[] SAT. Visual Attention 1. Cited 6,040 times.

Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." International conference on machine learning. 2015.

http://proceedings.mlr.press/v37/xuc15.pdf

-----

[] ST. Visual Attention 2. Cited 4,059 times.

Vinyals, Oriol, et al. "Show and tell: A neural image caption generator." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.

https://openaccess.thecvf.com/content_cvpr_2015/papers/Vinyals_Show_and_Tell_2015_CVPR_paper.pdf

-----

GAP (NIN)

CAM

Grad-CAM

Grad-CAM++

Score-CAM


◎ English

Attention in NLP | by Kate Loginova | Medium

https://medium.com/@joealato/attention-in-nlp-734c6fa9d983

-----

◎ Traditional Chinese

OmniXRI: [AI HUB Column] How to Use Visualization Tools to Reveal the Secrets Behind Neural Networks (Part 1)

https://omnixri.blogspot.com/2020/10/ai-hub_16.html

OmniXRI: [AI HUB Column] How to Use Visualization Tools to Reveal the Secrets Behind Neural Networks (Part 2)

https://omnixri.blogspot.com/2020/09/ai-hub_20.html


The Star Also Rises: NLP (3): Attention

http://hemingwang.blogspot.com/2019/01/attention.html

-----
