The Star Also Rises: BERT (3): Illustrated

2021/09/01

-----

https://pixabay.com/zh/photos/fantasy-light-mood-sky-beautiful-2861107/

-----

# BERT

Notes:

"The Token Embeddings layer will convert each wordpiece token into a 768-dimensional vector representation."

In other words, the Token Embeddings layer maps each WordPiece token to a 768-dimensional vector (768 is the hidden size of BERT-base).

"The Segment Embeddings layer only has 2 vector representations. The first vector (index 0) is assigned to all tokens that belong to input 1 while the last vector (index 1) is assigned to all tokens that belong to input 2."

In other words, the Segment Embeddings layer holds only two vectors: index 0 for every token of the first input and index 1 for every token of the second input.

"BERT was designed to process input sequences of up to length 512. The authors incorporated the sequential nature of the input sequences by having BERT learn a vector representation for each position. This means that the Position Embeddings layer is a lookup table of size (512, 768)."

In other words, the Position Embeddings layer is a (512, 768) lookup table: BERT handles input sequences of up to 512 tokens and encodes token order by learning one vector per position.

https://medium.com/@_init_/why-bert-has-3-embedding-layers-and-their-implementation-details-9c261108e28a
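The three lookup tables described above can be sketched in plain Python. This is a minimal illustration, not BERT's actual implementation: the toy vocabulary, the random initialisation, and the names `make_table` and `embed` are assumptions made for the example. The element-wise sum of the three vectors follows the input representation described in the BERT paper.

```python
import random

HIDDEN = 768         # vector size quoted above (hidden size of BERT-base)
MAX_POSITIONS = 512  # longest input sequence BERT was designed for
NUM_SEGMENTS = 2     # index 0 for input 1, index 1 for input 2

rng = random.Random(0)


def make_table(rows, cols):
    """Return a randomly initialised lookup table of shape (rows, cols)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy vocabulary; the real WordPiece vocabulary is far larger.
vocab = {"[CLS]": 0, "[SEP]": 1, "i": 2, "like": 3,
         "straw": 4, "##berries": 5, "me": 6, "too": 7}

token_table = make_table(len(vocab), HIDDEN)
segment_table = make_table(NUM_SEGMENTS, HIDDEN)
position_table = make_table(MAX_POSITIONS, HIDDEN)


def embed(tokens, segment_ids):
    """Sum the token, segment, and position vectors for each input token."""
    if len(tokens) > MAX_POSITIONS:
        raise ValueError("input is longer than the position table")
    output = []
    for position, (token, segment) in enumerate(zip(tokens, segment_ids)):
        tok_vec = token_table[vocab[token]]
        seg_vec = segment_table[segment]
        pos_vec = position_table[position]
        output.append([t + s + p for t, s, p in zip(tok_vec, seg_vec, pos_vec)])
    return output


tokens = ["[CLS]", "i", "like", "straw", "##berries", "[SEP]", "me", "too", "[SEP]"]
segment_ids = [0, 0, 0, 0, 0, 0, 1, 1, 1]
vectors = embed(tokens, segment_ids)
print(len(vectors), len(vectors[0]))  # 9 768
```

Each of the nine input tokens ends up with one 768-dimensional vector, regardless of which segment or position it came from.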

-----

# Transformer

-----

# BERT

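Notes:

"GPT is short for 'Generative Pre-Training'; as the name suggests, it refers to generative pre-training. GPT also uses a two-stage process: the first stage pre-trains with a language model, and the second stage solves downstream tasks through fine-tuning."

"GPT's pre-training is in fact similar to ELMo's, with two main differences. First, the feature extractor is a Transformer rather than a two-layer bidirectional LSTM; as mentioned above, its feature-extraction ability is stronger than that of a two-layer bidirectional LSTM. Second, although GPT's pre-training still uses language modeling as the objective task, it uses a unidirectional language model: where ELMo draws on the context on both sides of a word, GPT predicts the word using only the context before it (Context-before)."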

https://blog.csdn.net/qq_35883464/article/details/100173045
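The difference between a unidirectional and a bidirectional context can be made concrete with a small sketch. The helper names `visible_context` and `causal_mask` are illustrative, not taken from GPT or ELMo code; the lower-triangular mask is the usual way a Transformer decoder is restricted to preceding positions.

```python
def visible_context(tokens, position, unidirectional):
    """Return the tokens a model may use when predicting tokens[position]."""
    before = tokens[:position]
    after = tokens[position + 1:]
    return before if unidirectional else before + after


def causal_mask(length):
    """Lower-triangular mask: row i may attend to columns 0..i only."""
    return [[1 if col <= row else 0 for col in range(length)]
            for row in range(length)]


tokens = ["the", "cat", "sat", "on", "the", "mat"]

# GPT-style: only the context before "sat" is available.
print(visible_context(tokens, 2, unidirectional=True))

# Bidirectional: the context on both sides of "sat" is available.
print(visible_context(tokens, 2, unidirectional=False))

# In the mask, row i is the position doing the attending; 1 means "visible".
for row in causal_mask(4):
    print(row)
```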

-----

# GPT

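Notes:

"GPT uses a two-stage process: the first stage pre-trains with a language model, and the second stage solves downstream tasks through fine-tuning."

Four fine-tuning tasks:

1. "For classification, little needs to change: just add a start symbol and an end symbol."

2. "For sentence-relation tasks such as entailment, just add a delimiter between the two sentences."

3. "For text similarity, swap the order of the two sentences to produce two inputs; this tells the model that sentence order does not matter."

4. "For multiple choice, use multiple inputs, each one concatenating the passage with one answer option."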

https://blog.csdn.net/qq_35883464/article/details/100173045
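The four input transformations listed above can be sketched as small functions that return the token sequences fed to the model. The spellings `<s>`, `$`, and `<e>` for the start, delimiter, and end symbols and all function names are assumptions made for illustration; each function returns a list of sequences so that the single-input and multi-input cases look the same.

```python
# Hypothetical spellings for the start, delimiter, and end symbols.
START, DELIM, END = "<s>", "$", "<e>"


def classification_input(text):
    """Task 1: wrap the text in a start and an end symbol."""
    return [[START] + text + [END]]


def entailment_input(premise, hypothesis):
    """Task 2: join the two sentences with a delimiter."""
    return [[START] + premise + [DELIM] + hypothesis + [END]]


def similarity_input(sentence_a, sentence_b):
    """Task 3: build both orderings, since sentence order should not matter."""
    return [
        [START] + sentence_a + [DELIM] + sentence_b + [END],
        [START] + sentence_b + [DELIM] + sentence_a + [END],
    ]


def multiple_choice_input(passage, options):
    """Task 4: one sequence per answer option, each prefixed by the passage."""
    return [[START] + passage + [DELIM] + option + [END] for option in options]


print(classification_input(["great", "movie"]))
print(entailment_input(["it", "rains"], ["it", "is", "wet"]))
print(similarity_input(["hello"], ["hi"]))
print(multiple_choice_input(["sky", "is"], [["blue"], ["green"]]))
```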

-----

# BERT

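Notes:

1. Cloze (fill in the blank): Masked Language Model (MLM). Output: a probability distribution over the vocabulary.

2. Next Sentence Prediction (NSP): whether the second sentence follows the first sentence in the original document. Output: yes or no.

[CLS] stands for classification and is used here for the binary decision. [SEP] stands for separation and marks where the sentences are split.

The two pre-training tasks are trained at the same time.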

https://leemeng.tw/attack_on_bert_transfer_learning_in_nlp.html
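A single training example that carries both objectives at once might be built as in the sketch below. The function name and the dictionary layout are assumptions for illustration. The 15% masking rate comes from the BERT paper, but this sketch always substitutes `[MASK]`, which is a simplification of the paper's replacement scheme.

```python
import random


def make_pretraining_example(sent_a, sent_b, is_next, rng, mask_prob=0.15):
    """Build one example with MLM targets and an NSP label."""
    tokens = ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"]
    masked = list(tokens)
    mlm_labels = {}  # position -> original token the model must recover
    for i, token in enumerate(tokens):
        if token in ("[CLS]", "[SEP]"):
            continue  # special symbols are never masked
        if rng.random() < mask_prob:
            mlm_labels[i] = token
            masked[i] = "[MASK]"
    return {
        "tokens": masked,
        "mlm_labels": mlm_labels,
        "nsp_label": "yes" if is_next else "no",
    }


rng = random.Random(0)
example = make_pretraining_example(
    ["the", "man", "went", "to", "the", "store"],
    ["he", "bought", "a", "gallon", "of", "milk"],
    is_next=True,
    rng=rng,
)
print(example["tokens"])
print(example["mlm_labels"])
print(example["nsp_label"])
```

Because both targets live in the same example, one forward pass can be scored on the masked positions and on the yes/no label together.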

-----

# BERT

Notes:

A Sentence-pair classification tasks

B Single-sentence classification tasks

C Question answering tasks

D Single-sentence tagging tasks

https://leemeng.tw/attack_on_bert_transfer_learning_in_nlp.html

-----

# BERT

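Notes:

A Sentence-pair classification tasks

Input: [CLS] (classification), sentence 1, [SEP] (separate), sentence 2.

Output: a class label.

"[CLS] is the special symbol for classification output, and [SEP] is the special symbol to separate non-consecutive token sequences." (# BERT)

That is, the output at [CLS] is what the classifier reads, and [SEP] marks the boundary between token sequences that are not contiguous.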

https://hackmd.io/@shaoeChen/Bky0Cnx7L

MNLI (Multi-Genre Natural Language Inference)

"Multi-Genre Natural Language Inference is a large-scale, crowdsourced entailment classification task (Williams et al., 2018). Given a pair of sentences, the goal is to predict whether the second sentence is an entailment, contradiction, or neutral with respect to the first one." (# BERT)

In short, given a sentence pair, the model predicts whether the second sentence is entailed by, contradicts, or is neutral toward the first.

"Entailment is used in propositional logic and predicate logic to describe a relationship between two sentences or sets of sentences, and is generally denoted by the symbol ⇒."

https://zh.wikipedia.org/zh-tw/%E8%95%B4%E6%B6%B5

This is a three-class classification task.

QQP

QNLI

STS-B

MRPC

RTE

SWAG

https://zhuanlan.zhihu.com/p/102208639
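The input layout for sentence pairs and the three-way MNLI output can be sketched as follows. This is only an illustration: the 4-dimensional [CLS] vector and the weights of the classification head are made-up numbers, and the names `build_pair_input` and `classify` are hypothetical. The trailing [SEP] after sentence 2 follows the convention in the BERT paper.

```python
import math

LABELS = ["entailment", "contradiction", "neutral"]


def build_pair_input(sentence_1, sentence_2):
    """Lay out [CLS] sentence 1 [SEP] sentence 2 [SEP] with segment ids."""
    tokens = ["[CLS]"] + sentence_1 + ["[SEP]"] + sentence_2 + ["[SEP]"]
    segment_ids = [0] * (len(sentence_1) + 2) + [1] * (len(sentence_2) + 1)
    return tokens, segment_ids


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(cls_vector, weights, bias):
    """Apply a linear layer and softmax to the [CLS] output vector."""
    logits = [sum(w * x for w, x in zip(row, cls_vector)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    return LABELS[probs.index(max(probs))], probs


tokens, segment_ids = build_pair_input(["a", "man", "sleeps"],
                                       ["a", "person", "rests"])

# Made-up [CLS] vector and head parameters, for illustration only.
cls_vector = [0.5, -1.0, 0.25, 2.0]
weights = [[1.0, 0.0, 0.0, 1.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 0.0]]
bias = [0.0, 0.0, 0.0]

label, probs = classify(cls_vector, weights, bias)
print(tokens)
print(segment_ids)
print(label)  # entailment
```

The same head works for the other pair tasks listed above by changing the number of output labels.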

-----

# BERT

Notes:

B Single-sentence classification tasks

Input: [CLS] (classification), a single sentence.

Output: a class label.

https://hackmd.io/@shaoeChen/Bky0Cnx7L

SST-2

"SST-2 The Stanford Sentiment Treebank is a binary single-sentence classification task consisting of sentences extracted from movie reviews with human annotations of their sentiment (Socher et al., 2013)." (# BERT)

That is, SST-2 is binary sentiment classification over single sentences taken from movie reviews and labeled by human annotators.

CoLA (Corpus of Linguistic Acceptability)

"CoLA The Corpus of Linguistic Acceptability is a binary single-sentence classification task, where the goal is to predict whether an English sentence is linguistically “acceptable” or not (Warstadt et al., 2018)." (# BERT)

That is, CoLA asks whether a single English sentence is linguistically acceptable, again as a binary decision.

https://zhuanlan.zhihu.com/p/102208639

-----

# BERT

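Notes:

C Question answering tasks

1. Input: [CLS] (classification), the question, [SEP] (separate), the document that contains the answer.

2. After passing through BERT, each token yields an output embedding vector.

3. The model learns two vectors, v_s and v_e, with the same dimension as the output embeddings.

4. Take the dot product of v_s with each output embedding of the document to get one scalar per token, then apply softmax.

5. Take the dot product of v_e with each output embedding of the document to get one scalar per token, then apply softmax.

6. If s = 2 and e = 3, the answer is "d2 d3", the document tokens from position 2 through position 3.

7. If s = 3 and e = 2, the start falls after the end, so the model answers that the question has no answer.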

https://hackmd.io/@shaoeChen/Bky0Cnx7L

SQuAD

https://zhuanlan.zhihu.com/p/102208639
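Steps 3 to 7 can be sketched in a few lines of plain Python. The 2-dimensional document vectors and the values of v_s and v_e below are made-up numbers chosen so the result is easy to follow, and `predict_span` is a hypothetical name. Indices in the code are 0-based, so the printed positions are converted to the 1-based s and e used in the steps.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predict_span(doc_tokens, doc_vectors, v_s, v_e):
    """Pick the most likely start and end; return None when start > end."""
    start_probs = softmax([dot(v_s, h) for h in doc_vectors])
    end_probs = softmax([dot(v_e, h) for h in doc_vectors])
    s = start_probs.index(max(start_probs))
    e = end_probs.index(max(end_probs))
    if s > e:
        return s + 1, e + 1, None  # no answer
    return s + 1, e + 1, doc_tokens[s:e + 1]


doc_tokens = ["d1", "d2", "d3", "d4"]
# Made-up output embeddings for the four document tokens.
doc_vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
v_s = [1.0, 0.0]
v_e = [0.0, 1.0]

print(predict_span(doc_tokens, doc_vectors, v_s, v_e))  # (2, 3, ['d2', 'd3'])
print(predict_span(doc_tokens, doc_vectors, v_e, v_s))  # (3, 2, None)
```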

-----

# BERT

Notes:

D Single-sentence tagging tasks

Input: [CLS] (classification), a single sentence.

Output: for each token, a position tag and an entity class, as in NER.

https://hackmd.io/@shaoeChen/Bky0Cnx7L

https://zhuanlan.zhihu.com/p/102208639

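NER (Named Entity Recognition)

"For the sentence '小明在北京大學的燕園看了中國男籃的一場比賽' ('Xiao Ming watched a Chinese men's basketball game at Yan Yuan in Peking University'), an NER model picks out '小明' as PER, '北京大學' as ORG, '燕園' as LOC, and '中國男籃' as ORG."

"B (Begin) marks the start of an entity. I (Intermediate) marks the middle. E (End) marks the end. S (Single) marks a single-character entity. O (Other) marks characters that belong to no entity."

"Tagging the sentence '小明在北京大學的燕園看了中國男籃的一場比賽' character by character gives: [B-PER, E-PER, O, B-ORG, I-ORG, I-ORG, E-ORG, O, B-LOC, E-LOC, O, O, B-ORG, I-ORG, I-ORG, E-ORG, O, O, O, O, O]."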

https://zhuanlan.zhihu.com/p/88544122
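Turning a per-character tag sequence back into entities is a short loop. The sketch below uses the example sentence with one tag per character (21 characters, 21 tags); `decode_entities` is a hypothetical helper, not code from any of the linked posts.

```python
def decode_entities(chars, tags):
    """Collect (text, label) pairs from B/I/E/S/O tags, one tag per character."""
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    entities = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag == "O":
            current = ""  # outside any entity
            continue
        prefix, label = tag.split("-")
        if prefix == "S":    # single-character entity
            entities.append((ch, label))
            current = ""
        elif prefix == "B":  # start a new entity
            current = ch
        elif prefix == "I":  # continue the current entity
            current += ch
        elif prefix == "E":  # close the current entity
            entities.append((current + ch, label))
            current = ""
    return entities


sentence = "小明在北京大學的燕園看了中國男籃的一場比賽"
tags = ["B-PER", "E-PER", "O",
        "B-ORG", "I-ORG", "I-ORG", "E-ORG", "O",
        "B-LOC", "E-LOC", "O", "O",
        "B-ORG", "I-ORG", "I-ORG", "E-ORG",
        "O", "O", "O", "O", "O"]

for text, label in decode_entities(list(sentence), tags):
    print(text, label)
```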

-----

# BERT

-----

References

# BERT. Cited 12,556 times.

Devlin, Jacob, et al. "BERT: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).

https://arxiv.org/pdf/1810.04805.pdf

# BERT Pipeline

Tenney, Ian, Dipanjan Das, and Ellie Pavlick. "BERT rediscovers the classical NLP pipeline." arXiv preprint arXiv:1905.05950 (2019).

https://arxiv.org/pdf/1905.05950.pdf

# Transformer. Cited 13,554 times.

Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems. 2017.

https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

# GPT

Radford, Alec, et al. "Improving language understanding by generative pre-training." OpenAI (2018).

https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf

-----

The Star Also Rises

Monday, December 13, 2021
