The Star Also Rises: NIN (3): Illustrated

2021/03/07

-----

https://pixabay.com/zh/photos/drop-splash-drip-water-liquid-wet-3698073/

-----

Figure 1: Comparison of linear convolution layer and mlpconv layer. The linear convolution layer includes a linear filter while the mlpconv layer includes a micro network (we choose the multilayer perceptron in this paper). Both layers map the local receptive field to a confidence value of the latent concept.

# NIN

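Notes:

This is only a schematic. The left side is an ordinary convolution. The right side is an ordinary convolution followed by Conv1, that is, a 1 x 1 convolution. Conv1 can increase or reduce the number of feature maps (the channel dimension).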
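To make the channel-changing role of Conv1 concrete, here is a minimal sketch in plain Python. The function name `conv1x1` and the weight values are illustrative assumptions, not taken from the paper. The point is only that a 1 x 1 convolution applies the same linear map across channels at every pixel.

```python
def conv1x1(feature_maps, weights, biases):
    """Apply a 1 x 1 convolution.

    feature_maps: list of C_in maps, each an H x W list of lists.
    weights: C_out rows, each holding C_in coefficients.
    biases: C_out values.
    Returns C_out maps with the same H x W size.
    """
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    output = []
    for w_row, b in zip(weights, biases):
        # Each output pixel is a weighted sum of the same pixel in every input map.
        out_map = [
            [
                b + sum(w * fmap[y][x] for w, fmap in zip(w_row, feature_maps))
                for x in range(width)
            ]
            for y in range(height)
        ]
        output.append(out_map)
    return output


# Three 2 x 2 input maps (C_in = 3).
maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 1.0], [0.0, 1.0]],
    [[2.0, 2.0], [2.0, 2.0]],
]
# Reduce 3 channels to 2 (hand-picked weights for illustration, not learned).
reduce_w = [[1.0, 0.0, 0.5], [0.0, 1.0, -1.0]]
reduced = conv1x1(maps, reduce_w, [0.0, 0.0])
```

Passing more weight rows than input maps would increase the channel count instead. The spatial size never changes.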

-----

Figure 2: The overall structure of Network In Network. In this paper the NINs include the stacking of three mlpconv layers and one global average pooling layer.

# NIN

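Notes:

Conv1 was later commonly used to increase or reduce the channel dimension. In the original paper, however, it only serves to readjust the pixel values of the feature maps. Compare this with LRN in AlexNet. Adjusting pixel values through learned weights is, in theory, better than using a formula fixed by experts.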

-----

3.3 Local Response Normalization

# AlexNet

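Notes:

LRN is an outdated, hand-designed method. GoogLeNet considered it useful, while VGGNet considered it useless. In practice, Conv1 is more general and gives better results.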
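For comparison with Conv1, the sketch below implements the LRN formula from Section 3.3 of the AlexNet paper for a single pixel position, using the constants reported there (k = 2, n = 5, alpha = 1e-4, beta = 0.75). The function name and the sample activations are illustrative assumptions. Note that nothing in this formula is learned.

```python
def local_response_norm(activations, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize one pixel's activations across neighbouring channels.

    activations: one value per channel at a fixed (x, y) position.
    """
    num_channels = len(activations)
    half = n // 2
    normalized = []
    for i, a_i in enumerate(activations):
        # Sum of squares over up to n adjacent channels centred on channel i.
        lo = max(0, i - half)
        hi = min(num_channels - 1, i + half)
        sum_sq = sum(activations[j] ** 2 for j in range(lo, hi + 1))
        normalized.append(a_i / (k + alpha * sum_sq) ** beta)
    return normalized


pixel = [1.0, 50.0, 2.0, 0.5]  # made-up activations for four channels
out = local_response_norm(pixel)
```

A large activation in one channel enlarges the denominator for its neighbours, which is the fixed "competition" that a learned 1 x 1 convolution can replace with trained weights.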

-----

Table 1: Test set error rates for CIFAR-10 of various methods.

# NIN

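Notes:

NIN combined with two regularization methods, Dropout and data augmentation (both used to avoid overfitting), clearly outperforms the other algorithms of the time on CIFAR-10.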

-----

Figure 3: The regularization effect of dropout in between mlpconv layers. Training and testing error of NIN with and without dropout in the first 200 epochs of training is shown.

# NIN

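Notes:

Dropout is a regularization tool. Conv1 "should" also be a regularization tool. The paper states that global average pooling is a regularizer as well. In short, "averaging" keeps the model from being fitted too closely to individual pixels, which means it avoids overfitting.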
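The averaging idea can be sketched as follows. In NIN the last mlpconv layer outputs one feature map per class, global average pooling turns each map into a single confidence value, and the resulting vector goes to a softmax. The helper names and the toy maps below are assumptions made for illustration.

```python
import math


def global_average_pool(feature_maps):
    """Average each H x W feature map down to a single value."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# One 2 x 2 map per class (three classes, made-up values).
class_maps = [
    [[0.0, 0.0], [0.0, 4.0]],  # a single strong pixel
    [[2.0, 2.0], [2.0, 2.0]],  # moderate response everywhere
    [[0.0, 1.0], [1.0, 0.0]],
]
scores = global_average_pool(class_maps)
probs = softmax(scores)
```

In this toy case the map with one very strong pixel averages to 1.0 and loses to the map that responds moderately everywhere (2.0), which illustrates why averaging discourages reliance on individual pixels.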

-----

Table 2: Test set error rates for CIFAR-100 of various methods.

Notes:

NIN with Dropout is the basic combination.

-----

Table 3: Test set error rates for SVHN of various methods.

# NIN

Notes:

On SVHN, DropConnect outperforms NIN + Dropout.

-----

Table 4: Test set error rates for MNIST of various methods.

# NIN

Notes:

On the classic MNIST dataset, most methods have already been tuned to very low error rates.

-----

Table 5: Global average pooling compared to fully connected layer.

# NIN

Notes:

GAP appears to have a stronger regularization effect than Dropout.
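One plausible reason is that GAP itself has no parameters to overfit. The arithmetic below compares the size of a fully connected classifier with a GAP head. The layer sizes (192 channels, 8 x 8 maps, 10 classes) are hypothetical numbers chosen for illustration, not figures quoted from the paper.

```python
def fc_params(channels, height, width, num_classes):
    """Weights plus biases of a dense layer on the flattened feature maps."""
    return channels * height * width * num_classes + num_classes


def gap_head_params(channels, num_classes):
    """A 1 x 1 convolution down to num_classes maps.

    The global average pooling that follows has no parameters of its own.
    """
    return channels * num_classes + num_classes


# Hypothetical sizes: 192 channels of 8 x 8 maps, 10 classes.
fc = fc_params(192, 8, 8, 10)   # 192 * 64 * 10 + 10
gap = gap_head_params(192, 10)  # 192 * 10 + 10
```

With these assumed sizes the fully connected head needs 122,890 parameters and the GAP head 1,930, roughly a 64-fold reduction.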

-----

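Notes:

Although GAP helps avoid overfitting, it is less suited to transfer learning, because the image features have to be learned again. When a fully connected layer is present, most of the image features can be kept and it is mainly the fully connected layer that needs retraining.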

https://zhuanlan.zhihu.com/p/46235425

-----

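◎ Regularization: constrain the model, for example by reducing the number of parameters, to avoid overfitting.

Several forms of regularization:

L2: adds a squared-norm penalty on the weights (its constraint region is a circle), so that the fit to the training data does not become too good (overfitting).

Weight Decay: each parameter is shrunk by a fixed ratio at every training step, so some parameters approach 0. (With plain SGD, L2 is equivalent to WD.)

Dropout: (fully connected layers have too many parameters) at each training step, neurons are randomly dropped with probability p. At inference, the weights are multiplied by the keep probability 1 - p. (Dropout is a form of Ensemble Learning, that is, an average over many sub-models.)

Global average pooling: drops the fully connected layers altogether, which greatly reduces the number of parameters.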
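Two of the items above can be checked numerically. The sketch below is a minimal illustration with hypothetical function names: it shows that scaling by the keep probability at inference matches the average activation seen during dropout training, and that a plain SGD step on an L2-penalized loss equals a weight-decay step. Scaling the activations, as done here, is equivalent to scaling the outgoing weights.

```python
import random


def dropout_train(activations, drop_prob, rng):
    """Zero each activation with probability drop_prob (no rescaling)."""
    return [0.0 if rng.random() < drop_prob else a for a in activations]


def dropout_infer(activations, drop_prob):
    """Keep every unit but scale by the keep probability 1 - drop_prob."""
    keep = 1.0 - drop_prob
    return [a * keep for a in activations]


def sgd_step_l2(w, grad, lr, lam):
    """Plain SGD on loss + (lam / 2) * w**2."""
    return w - lr * (grad + lam * w)


def sgd_step_weight_decay(w, grad, lr, lam):
    """Shrink the weight by a fixed ratio, then take a plain SGD step."""
    return (1.0 - lr * lam) * w - lr * grad


rng = random.Random(0)
acts = [1.0] * 10000
mean_train = sum(dropout_train(acts, 0.3, rng)) / len(acts)  # close to 0.7
mean_infer = sum(dropout_infer(acts, 0.3)) / len(acts)       # 0.7 up to rounding

w_l2 = sgd_step_l2(2.0, 0.5, 0.1, 0.01)
w_wd = sgd_step_weight_decay(2.0, 0.5, 0.1, 0.01)
```

The equivalence between the two update rules holds for plain SGD only. With adaptive optimizers the two generally differ.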

-----

Figure 4: Visualization of the feature maps from the last mlpconv layer. Only top 10% activations in the feature maps are shown. The categories corresponding to the feature maps are: 1. airplane, 2. automobile, 3. bird, 4. cat, 5. deer, 6. dog, 7. frog, 8. horse, 9. ship, 10. truck. Feature maps corresponding to the ground truth of the input images are highlighted. The left panel and right panel are just different examplars.

Notes:

Pay particular attention to the automobile.

-----

Figure 1: A Squeeze-and-Excitation block.

# SENet

Notes:

Through training, the SENet module adjusts the scale of the pixel values of each feature map across channels.

-----

Figure 2: The schema of the original Inception module (left) and the SE-Inception module (right).

# SENet

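Notes:

First, global average pooling squeezes each feature map into a single value. The resulting C values, one per channel, are mapped by a fully connected layer to C / r values, followed by ReLU. A second fully connected layer maps them back to C values, followed by a sigmoid, which yields a scale coefficient for each feature map. Each coefficient is then multiplied with its original feature map. ReLU is chosen for the inner layer because ReLU already works well. Sigmoid is chosen for the outer layer because the scale values should lie between 0 and 1.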
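The squeeze, excitation, and scale steps described above can be sketched in plain Python. The function name `se_block`, the tiny sizes (C = 4, r = 2), and the weight values are assumptions for illustration. Bias terms are left out in this sketch.

```python
import math


def se_block(feature_maps, w1, w2):
    """Squeeze-and-Excitation on C feature maps (no bias terms in this sketch).

    w1: (C / r) x C weights of the reducing fully connected layer.
    w2: C x (C / r) weights of the expanding fully connected layer.
    """
    # Squeeze: global average pooling, one value per channel.
    s = [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]
    # Excitation: FC -> ReLU -> FC -> sigmoid.
    hidden = [max(0.0, sum(w * v for w, v in zip(row, s))) for row in w1]
    scales = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply every pixel of channel c by its coefficient.
    scaled = [
        [[scale * v for v in row] for row in fmap]
        for scale, fmap in zip(scales, feature_maps)
    ]
    return scaled, scales


# C = 4 channels of 2 x 2 maps; channel c is filled with the value c + 1.
maps = [[[float(c + 1)] * 2 for _ in range(2)] for c in range(4)]
w1 = [[0.5, -0.5, 0.0, 0.0], [0.0, 0.0, 0.25, 0.25]]   # 4 -> 2, made-up weights
w2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]  # 2 -> 4, made-up weights
scaled, scales = se_block(maps, w1, w2)
```

The output has the same shape as the input. Only the per-channel magnitude changes, and every coefficient in `scales` lies strictly between 0 and 1 because of the sigmoid.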

-----

Figure 3: The schema of the original Residual module (left) and the SE-ResNet module (right).

# SENet

Notes:

See Figure 2.

-----

Figure 1. Selective Kernel Convolution.

# SKNet

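Notes:

The block starts with two branches, Conv3 and Conv5. In the paper the 5 x 5 branch is implemented as a 3 x 3 convolution with dilation size = 2. Their outputs, the yellow and green U's, are added element-wise. Up to z, the structure is similar to SENet. From z, two vectors a and b are produced, and a softmax across the two branches is applied channel by channel, giving the coefficients a_c and b_c, where c denotes the c-th feature map. By the nature of softmax, a_c + b_c = 1. a_c and b_c determine the proportions of the Conv3 and Conv5 feature maps. The output is V.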

-----

# SKNet

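Notes:

d = max(C/r, L). L = 32 is the setting used in the paper. d is the length of the compressed one-dimensional vector z.

z = F_fc(s) = δ(B(Ws)). B is Batch Normalization. δ is ReLU.

In spirit this is still the same as SENet, except that one vector becomes two. The two vectors then go through a softmax.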
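As a small worked example of d = max(C/r, L), the snippet below evaluates the formula for a few channel counts. L = 32 comes from the note above. The reduction ratio r = 16, the integer division, and the sample channel counts are assumptions made for this illustration.

```python
def compact_length(num_channels, reduction=16, min_length=32):
    """d = max(C / r, L): length of the compact feature z.

    reduction (r = 16) is an assumed value; min_length (L = 32) follows the note above.
    """
    return max(num_channels // reduction, min_length)


lengths = {c: compact_length(c) for c in (64, 256, 512, 1024)}
```

Under these assumptions the lower bound L keeps z at 32 values for 64, 256, and 512 channels, and only at 1024 channels does C / r take over (d = 64).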

-----

# SKNet

Notes:

C: number of channels.

c: channel index.

-----

# SKNet

Notes:

Each feature map of the new V is a combination of the Conv3 and Conv5 feature maps, with the mixing ratio given by the preceding softmax.
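The mixing step can be sketched as below. For each channel c, a two-way softmax turns a pair of logits into a_c and b_c, and the output map is a_c times the Conv3 map plus b_c times the Conv5 map. The function name, the toy feature maps, and the logit values (which stand in for the two vectors computed from z) are assumptions for illustration.

```python
import math


def select_and_fuse(u3, u5, logits_a, logits_b):
    """Mix two branches channel by channel with a two-way softmax.

    u3, u5: C feature maps each (H x W lists) from the Conv3 and Conv5 branches.
    logits_a, logits_b: C values each, one pair of logits per channel.
    """
    fused, weights = [], []
    for map3, map5, la, lb in zip(u3, u5, logits_a, logits_b):
        # Softmax across the two branches for this channel, so a_c + b_c = 1.
        shift = max(la, lb)
        ea, eb = math.exp(la - shift), math.exp(lb - shift)
        a_c = ea / (ea + eb)
        b_c = eb / (ea + eb)
        weights.append((a_c, b_c))
        fused.append([
            [a_c * p3 + b_c * p5 for p3, p5 in zip(row3, row5)]
            for row3, row5 in zip(map3, map5)
        ])
    return fused, weights


# Two channels of 1 x 2 maps (made-up values).
u3 = [[[1.0, 1.0]], [[2.0, 2.0]]]
u5 = [[[3.0, 3.0]], [[4.0, 4.0]]]
v, weights = select_and_fuse(u3, u5, [0.0, 2.0], [0.0, -2.0])
```

In this toy input, channel 0 has equal logits and is an even blend of the two branches, while channel 1 strongly favours the Conv3 branch, so its output stays close to the Conv3 values.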

-----

References

# LRN vs Conv1

# AlexNet

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems 25 (2012): 1097-1105.

https://proceedings.neurips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf

# channel domain

# NIN

Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).

https://arxiv.org/pdf/1312.4400.pdf

# SENet

Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.

http://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Squeeze-and-Excitation_Networks_CVPR_2018_paper.pdf

# SKNet

Li, Xiang, et al. "Selective kernel networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2019.

https://openaccess.thecvf.com/content_CVPR_2019/papers/Li_Selective_Kernel_Networks_CVPR_2019_paper.pdf

-----

Sunday, April 11, 2021
