Seq2seq (3): Illustrated
2021/08/13
-----
https://pixabay.com/zh/photos/stock-trading-monitor-business-1863880/
-----
Figure 1: Our model reads an input sentence “ABC” and produces “WXYZ” as the output sentence. The model stops making predictions after outputting the end-of-sentence token. Note that the LSTM reads the input sentence in reverse, because doing so introduces many short term dependencies in the data that make the optimization problem much easier.
# Seq2seq 1
Notes:
The key point: "the LSTM reads the input sentence in reverse, because doing so introduces many short-term dependencies in the data."
-----
The Recurrent Neural Network (RNN) [31, 28] is a natural generalization of feedforward neural networks to sequences. Given a sequence of inputs (x_1, ..., x_T), a standard RNN computes a sequence of outputs (y_1, ..., y_T) by iterating the following equation:
h_t = \mathrm{sigm}(W^{hx} x_t + W^{hh} h_{t-1})
y_t = W^{yh} h_t
The RNN can easily map sequences to sequences whenever the alignment between the inputs and the outputs is known ahead of time. However, it is not clear how to apply an RNN to problems whose input and output sequences have different lengths with complicated and non-monotonic relationships.
Notes:
This paragraph points out that a plain RNN cannot handle input and output sequences of different lengths whose relationship is complicated and non-monotonic.
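A minimal NumPy sketch of the recurrence above (the sizes and weight names are illustrative, not from the paper):

```python
import numpy as np

def sigm(x):
    return 1.0 / (1.0 + np.exp(-x))

# Illustrative dimensions: input 4, hidden 3, output 2.
rng = np.random.default_rng(0)
W_hx = rng.normal(size=(3, 4))   # input-to-hidden
W_hh = rng.normal(size=(3, 3))   # hidden-to-hidden
W_yh = rng.normal(size=(2, 3))   # hidden-to-output

def rnn_forward(xs):
    """Iterate h_t = sigm(W^hx x_t + W^hh h_{t-1}), y_t = W^yh h_t."""
    h = np.zeros(3)
    ys = []
    for x in xs:
        h = sigm(W_hx @ x + W_hh @ h)
        ys.append(W_yh @ h)
    return ys

ys = rnn_forward([rng.normal(size=4) for _ in range(5)])  # 5 time steps in, 5 outputs out
```

Note how the output sequence necessarily has the same length as the input sequence, which is exactly the limitation described above.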
-----
Notes:
v is the fixed-length vector into which the input sentence is compressed; the decoder combines v with the outputs y produced so far to predict the next target word.
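Written out as in the Seq2seq 1 paper, the decoder factorizes the probability of the output sequence conditioned on v:

p(y_1, \ldots, y_{T'} \mid x_1, \ldots, x_T) = \prod_{t=1}^{T'} p(y_t \mid v, y_1, \ldots, y_{t-1})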
-----
Notes:
Training maximizes the objective function (written out below the link).
The two S symbols are not the same: one denotes a source sentence, the other the training set.
https://marssu.coderbridge.io/2020/11/21/sequence-to-sequence-model/
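For reference, the training objective and decoding rule from the paper, where \mathcal{S} is the training set, S a source sentence, and T its correct translation:

\frac{1}{|\mathcal{S}|} \sum_{(T, S) \in \mathcal{S}} \log p(T \mid S)

\hat{T} = \arg\max_{T} p(T \mid S)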
-----
3.3 Reversing the Source Sentences
Notes:
Assume prediction accuracy falls as the distance between a source word and its corresponding target word grows. The average source-to-target distance is the same whether or not the source sentence is reversed, yet translating from the reversed source works better, because reversal creates many more short-term dependencies (the first source words end up very close to the first target words).
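A tiny sketch (a made-up three-word example, not from the paper) showing that reversal keeps the average source-to-target distance unchanged while creating some very short dependencies:

```python
# Positions in the concatenated sequence "source words then target words", 1-indexed.
# Source words a, b, c correspond to target words A, B, C respectively.
def distances(source_pos, target_pos):
    return [t - s for s, t in zip(source_pos, target_pos)]

normal  = distances([1, 2, 3], [4, 5, 6])   # positions of a, b, c when fed as "a b c"
flipped = distances([3, 2, 1], [4, 5, 6])   # positions of a, b, c when fed reversed, "c b a"

print(normal, sum(normal) / 3)    # [3, 3, 3], average 3
print(flipped, sum(flipped) / 3)  # [1, 3, 5], average 3, but "a" now sits right next to "A"
```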
-----
beam search
Notes:
When B = 1, pick the single most probable word at each step.
When B = 2, keep the two most probable words, then expand each of the two branches with its most probable continuations; in the end, the hypothesis with the higher total probability is taken as the result. A minimal code sketch follows the links below.
https://blog.csdn.net/guolindonggld/article/details/79938567
https://blog.csdn.net/dupei/article/details/104837244
https://hackernoon.com/beam-search-a-search-strategy-5d92fb7817f
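A minimal beam-search sketch, assuming a hypothetical scoring function `log_prob_next(prefix)` that returns a dict mapping each candidate next token to its log-probability:

```python
def beam_search(log_prob_next, beam_size=2, max_len=10, eos="<eos>"):
    """Keep the beam_size best partial hypotheses, expand each, re-rank, repeat."""
    beams = [([], 0.0)]                           # (token list, accumulated log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == eos:      # finished hypotheses are carried over as-is
                candidates.append((tokens, score))
                continue
            for token, logp in log_prob_next(tokens).items():
                candidates.append((tokens + [token], score + logp))
        # Re-rank all expansions and keep only the top beam_size of them.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(t and t[-1] == eos for t, _ in beams):
            break
    return beams[0]                               # highest-scoring hypothesis

# beam_size=1 is greedy decoding (B = 1 above); beam_size=2 keeps two branches (B = 2 above).
```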
-----
# BLEU
-----
Bilingual Evaluation Understudy. In theatre, an "understudy" is the substitute performer who can step into a leading role.
1. Modified n-gram precision (Pn): count how many of the candidate sentence's n-grams appear in the reference, but clip each n-gram's count at the number of times it occurs in the reference. For example, if the candidate contains "the" 7 times and the reference only 2 times, only 2 of them count as correct, not 7.
2. BP (brevity penalty): short candidates find it easier to score high, so a penalty term is added (the full formula is sketched below).
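In symbols (following the BLEU paper), with candidate length c, reference length r, and uniform weights w_n = 1/N (N = 4):

\mathrm{BP} = \begin{cases} 1 & \text{if } c > r \\ e^{1 - r/c} & \text{if } c \le r \end{cases}

\mathrm{BLEU} = \mathrm{BP} \cdot \exp\Big( \sum_{n=1}^{N} w_n \log p_n \Big)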
-----
-----
BLEU calculation example
candidate: Going to play basketball this afternoon ? (7 tokens including punctuation)
reference: Going to play basketball in the afternoon ? (8 tokens including punctuation)
P1 = 6/7 = 0.857... (Going, to, play, basketball, afternoon, ?)
P2 = 4/6 = 0.666... (Going to, to play, play basketball, afternoon ?)
P3 = 2/5 = 0.4 (Going to play, to play basketball)
P4 = 1/4 = 0.25 (Going to play basketball)
r = 8 (reference length)
c = 7 (candidate length)
BLEU ≈ 0.423
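A short sketch that reproduces the number above from the modified precisions and the brevity penalty:

```python
import math

p = [6/7, 4/6, 2/5, 1/4]   # modified 1- to 4-gram precisions from the example
c, r = 7, 8                # candidate and reference lengths
bp = 1.0 if c > r else math.exp(1 - r / c)
bleu = bp * math.exp(sum(0.25 * math.log(pn) for pn in p))
print(bleu)                # ≈ 0.4238, i.e. the 0.423 above
```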
-----
https://www.cnblogs.com/by-dream/p/7679284.html
https://blog.csdn.net/qq_42067550/article/details/105957469
https://tw.answers.yahoo.com/question/index?qid=20080619000016KK03830
-----
Table 1: The performance of the LSTM on WMT’14 English to French test set (ntst14). Note that an ensemble of 5 LSTMs with a beam of size 2 is cheaper than a single LSTM with a beam of size 12.
Table 2: Methods that use neural networks together with an SMT system on the WMT’14 English to French test set (ntst14).
Notes:
SMT = statistical machine translation.
--
ensemble: train several models with different initial parameter settings, then combine their outputs by voting or averaging (a minimal sketch follows the link below).
https://medium.com/allenyummy-note/nlp-seq2seq-2014-7c9c5a9841db
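One common way to combine such an ensemble at decoding time is to average the models' next-token distributions; a minimal sketch assuming a hypothetical `next_token_probs(prefix)` method on each model:

```python
import numpy as np

def ensemble_next_token_probs(models, prefix):
    """Average the next-token probability vectors of several independently trained models;
    the averaged distribution is then fed to greedy or beam-search decoding."""
    return np.mean([m.next_token_probs(prefix) for m in models], axis=0)
```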
-----
Figure 2: The figure shows a 2-dimensional PCA projection of the LSTM hidden states that are obtained after processing the phrases in the figures. The phrases are clustered by meaning, which in these examples is primarily a function of word order, which would be difficult to capture with a bag-of-words model. Notice that both clusters have similar internal structure.
Notes:
Where the LSTM improves on a bag-of-words model is that word order (internal structure) can be distinguished. A plotting sketch follows the link below.
Left: the encoder is sensitive to word order and word relations.
Right: the encoder is not very sensitive to active vs. passive voice (?).
https://medium.com/allenyummy-note/nlp-seq2seq-2014-7c9c5a9841db
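A sketch of how such a plot can be produced (not the paper's code), assuming `states` is an (n_phrases, hidden_dim) matrix of the encoder's final hidden states:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

states = np.random.randn(10, 1000)              # stand-in for real LSTM hidden states
xy = PCA(n_components=2).fit_transform(states)  # project to 2 dimensions
plt.scatter(xy[:, 0], xy[:, 1])
plt.show()
```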
-----
Table 3: A few examples of long translations produced by the LSTM alongside the ground truth translations. The reader can verify that the translations are sensible using Google translate.
Notes:
A few examples of translations of long sentences.
-----
Figure 3: The left plot shows the performance of our system as a function of sentence length, where the x-axis corresponds to the test sentences sorted by their length and is marked by the actual sequence lengths. There is no degradation on sentences with less than 35 words; there is only a minor degradation on the longest sentences. The right plot shows the LSTM’s performance on sentences with progressively more rare words, where the x-axis corresponds to the test sentences sorted by their “average word frequency rank”.
Notes:
Left: translation quality holds up with sentence length for most sentences; only the very longest degrade slightly. Right: the rarer a sentence's words, the worse the translation.
-----
# Seq2seq 2
Notes:
The difference from Seq2seq 1 is that Seq2seq 2 additionally conditions every output step on the context vector c. In addition, Seq2seq 2 uses a simplified LSTM, namely the GRU.
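For reference, the gated hidden unit introduced in the Seq2seq 2 paper (what is now called the GRU), with reset gate r, update gate z, and candidate state \tilde{h}:

r_t = \sigma(W_r x_t + U_r h_{t-1})
z_t = \sigma(W_z x_t + U_z h_{t-1})
\tilde{h}_t = \tanh(W x_t + U (r_t \odot h_{t-1}))
h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t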
-----
# Seq2seq 2
Notes:
h_t draws on the information in h_{t-1}, y_{t-1}, and c.
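In the paper's notation, both the decoder state and the output distribution condition on the previous output y_{t-1} and the context vector c:

h_t = f(h_{t-1}, y_{t-1}, c)
P(y_t \mid y_{t-1}, \ldots, y_1, c) = g(h_t, y_{t-1}, c)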
-----
References
# RCTM. Cited 1137 times.
Kalchbrenner, Nal, and Phil Blunsom. "Recurrent continuous translation models." Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013.
https://www.aclweb.org/anthology/D13-1176.pdf
# Seq2seq 1. Cited 12676 times.
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf
# Seq2seq 2. Cited 11284 times.
Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
https://arxiv.org/pdf/1406.1078.pdf
# BLEU
Papineni, Kishore, et al. "BLEU: a method for automatic evaluation of machine translation." Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 2002.
https://www.aclweb.org/anthology/P02-1040.pdf
-----