The Star Also Rises: GoogLeNet (4): Appendix

GoogLeNet (4): Appendix

2021/04/12

-----

Figure 6. Inception modules after the factorization of the n × n convolutions. In our proposed architecture, we chose n = 7 for the 17 × 17 grid. (The filter sizes are picked using principle 3.)
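To make the saving from this factorization concrete, here is a minimal sketch that counts weights for one n × n convolution versus a 1 × n convolution followed by an n × 1 convolution. The channel count of 192 is a hypothetical value chosen for illustration, and biases are ignored.

```python
def conv_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Number of weights in one convolution layer, ignoring biases."""
    return kernel_h * kernel_w * in_channels * out_channels


def factorized_savings(n, channels):
    """Compare one n x n convolution with a 1 x n followed by an n x 1."""
    full = conv_weights(n, n, channels, channels)
    factorized = (conv_weights(1, n, channels, channels)
                  + conv_weights(n, 1, channels, channels))
    return full, factorized, factorized / full


# n = 7 as in the caption; 192 channels is an assumed, illustrative width.
full, factorized, ratio = factorized_savings(n=7, channels=192)
print(full, factorized, ratio)
```

With equal input and output widths the ratio is 2/n, so for n = 7 the factorized pair needs roughly 29% of the weights of the full convolution.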

# Inception v3

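Notes:

Principle 3:

Spatial aggregation can be done over lower-dimensional embeddings without much or any loss in representational power. For example, before performing a more spread-out (e.g. 3×3) convolution, one can reduce the dimension of the input representation before the spatial aggregation without expecting serious adverse effects. We hypothesize that, if the outputs are used in a spatial aggregation context, the strong correlation between adjacent units results in much less loss of information during dimension reduction. Given that these signals should be easily compressible, the dimension reduction even promotes faster learning.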
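A rough way to see why this principle matters is to count multiply-adds for a 3×3 convolution applied directly versus one preceded by a 1×1 reduction. The sketch below assumes stride 1 and a preserved grid size; the grid size and channel widths are hypothetical values, not taken from the paper.

```python
def conv_mult_adds(grid, kernel, in_channels, out_channels):
    """Multiply-adds of a stride-1, size-preserving kernel x kernel convolution."""
    return grid * grid * kernel * kernel * in_channels * out_channels


def direct_cost(grid, in_channels, out_channels):
    """A 3x3 convolution applied directly to the full-width input."""
    return conv_mult_adds(grid, 3, in_channels, out_channels)


def reduced_cost(grid, in_channels, bottleneck, out_channels):
    """A 1x1 convolution shrinks the channels first, then the 3x3 aggregates spatially."""
    return (conv_mult_adds(grid, 1, in_channels, bottleneck)
            + conv_mult_adds(grid, 3, bottleneck, out_channels))


# Assumed example: 35x35 grid, 256 channels in and out, reduced to 64 in between.
direct = direct_cost(35, 256, 256)
reduced = reduced_cost(35, 256, 64, 256)
print(direct, reduced, reduced / direct)
```

Under these assumed widths the reduced path costs 5/18 of the direct one, which is the kind of saving the principle says comes with little loss in representational power.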

-----

Figure 7. Inception modules with expanded filter bank outputs. This architecture is used on the coarsest (8 × 8) grids to promote high-dimensional representations, as suggested by principle 2 of Section 2. We are using this solution only on the coarsest grid, since that is the place where producing a high-dimensional sparse representation is the most critical, as the ratio of local processing (by 1 × 1 convolutions) is increased compared to the spatial aggregation.

# Inception v3

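Notes:

Principle 2:

Higher-dimensional representations are easier to process locally within a network. Increasing the activations per tile in a convolutional network allows for more disentangled features.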

-----

Figure 10. Inception module that reduces the grid size while expanding the filter banks. It is both cheap and avoids the representational bottleneck, as suggested by principle 1. The diagram on the right represents the same solution, but from the perspective of grid sizes rather than operations.
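The grid-size view of this module can be sketched as simple shape arithmetic. This is a simplified model that assumes one unpadded stride-2 convolution branch and one unpadded stride-2 pooling branch whose outputs are concatenated along the channel axis; the function names and the example widths are illustrative assumptions.

```python
def reduced_size(size, kernel=3, stride=2):
    """Output side length of an unpadded convolution or pooling layer."""
    return (size - kernel) // stride + 1


def parallel_reduction(grid, channels, conv_filters):
    """Shape after concatenating a stride-2 conv branch with a stride-2 pooling branch."""
    out_grid = reduced_size(grid)
    # Pooling keeps the input channel count; the conv branch adds conv_filters more.
    out_channels = channels + conv_filters
    return out_grid, out_channels


# Assumed example: a 35x35 grid with 320 channels and 320 conv filters.
print(parallel_reduction(35, 320, 320))
```

Because both branches run on the full-resolution input in parallel, the grid shrinks and the filter bank widens in a single step, with no intermediate layer that is both small and narrow.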

# Inception v3

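Notes:

Principle 1:

Feed-forward networks can be represented by an acyclic graph from the input layer to the classifier or regressor. This defines a clear direction for the information flow. For any cut separating the inputs from the outputs, one can assess the amount of information passing through the cut. Bottlenecks with extreme compression should be avoided. In general, the representation size should decrease gently from the inputs to the outputs before reaching the final representation used for the task at hand. Theoretically, information content cannot be assessed merely by the dimensionality of the representation, as it discards important factors such as correlation structure; the dimensionality only provides a rough estimate of information content.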

-----

Table 1. The outline of the proposed network architecture. The output size of each module is the input size of the next one. We are using variations of the reduction technique depicted in Figure 10 to reduce the grid sizes between the Inception blocks whenever applicable. We have marked the convolution with 0-padding, which is used to maintain the grid size. 0-padding is also used inside those Inception modules that do not reduce the grid size. All other layers do not use padding. The various filter bank sizes are chosen to observe principle 4 from Section 2.
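The effect of 0-padding on grid size follows from the usual output-size formula, floor((n + 2p - k) / s) + 1. Below is a minimal sketch that chains it through a few layers. The layer list is an illustrative assumption (a 299-pixel input and 3×3 kernels), not a copy of the table.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output side length for a convolution or pooling layer with symmetric zero padding."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding); an assumed, illustrative stack of layers.
layers = [
    ("conv 3x3, stride 2, no padding", 3, 2, 0),
    ("conv 3x3, stride 1, no padding", 3, 1, 0),
    ("conv 3x3, stride 1, 0-padded", 3, 1, 1),
    ("pool 3x3, stride 2, no padding", 3, 2, 0),
]

size = 299  # assumed input side length
sizes = []
for name, kernel, stride, padding in layers:
    size = out_size(size, kernel, stride, padding)
    sizes.append(size)
    print(f"{name}: {size} x {size}")
```

Only the padded layer keeps its input size; every unpadded layer shrinks the grid, which is why the caption singles out the padded convolution.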

# Inception v3

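Notes:

Principle 4:

Balance the width and depth of the network. Optimal performance can be reached by balancing the number of filters per stage and the depth of the network. Increasing both the width and the depth of the network can contribute to higher-quality networks. However, the optimal improvement for a constant amount of computation is reached when both are increased in parallel. The computational budget should therefore be distributed in a balanced way between the depth and the width of the network.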

-----

Consider a distribution over labels u(k), independent of the training example x, and a smoothing parameter ε. For a training example with ground-truth label y, we replace the label distribution q(k|x) = δ_{k,y} with
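In the cited paper (Szegedy et al., 2016), the replacement distribution is written as follows, where δ_{k,y} is 1 when k = y and 0 otherwise:

```latex
q'(k \mid x) = (1 - \epsilon)\,\delta_{k,y} + \epsilon\,u(k)
```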

which is a mixture of the original ground-truth distribution q(k|x) and the fixed distribution u(k), with weights 1 − ε and ε, respectively. This can be seen as the distribution of the label k obtained as follows: first, set it to the ground-truth label k = y; then, with probability ε, replace k with a sample drawn from the distribution u(k). We propose to use the prior distribution over labels as u(k). In our experiments, we used the uniform distribution u(k) = 1/K, so that
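With the uniform choice u(k) = 1/K, the smoothed target becomes q'(k) = (1 − ε) δ_{k,y} + ε/K. A minimal sketch of that computation follows; the function name and the example values K = 4 and ε = 0.1 are illustrative assumptions.

```python
def smooth_labels(true_label, num_classes, epsilon):
    """Mix a one-hot target with the uniform distribution u(k) = 1/K."""
    uniform = 1.0 / num_classes
    return [
        (1.0 - epsilon) * (1.0 if k == true_label else 0.0) + epsilon * uniform
        for k in range(num_classes)
    ]


# Assumed example: 4 classes, ground-truth label 2, epsilon = 0.1.
targets = smooth_labels(true_label=2, num_classes=4, epsilon=0.1)
print(targets)
```

The true class keeps most of the probability mass, and every class, including the true one, receives an extra ε/K.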

# Inception v3

Notes:

-----

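likelihood and maximum likelihood estimators: the probability is unknown. Using samples obtained from an experiment, we estimate the parameter value (assuming a binomial distribution) under which the observed event is most probable. The likelihood function is a function of the unknown parameter π. The goal is to find the maximum of this function and the value of π that attains it.

log likelihood: the likelihood function and the log-likelihood function reach their maximum at the same place, but the log form is easier to differentiate.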
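As a small numerical check of the two notes above, the sketch below evaluates a binomial likelihood and its logarithm on a grid of π values and confirms that both peak at the same place. The data (7 successes in 20 trials) and the grid resolution are assumptions made for illustration.

```python
import math


def binomial_log_likelihood(pi, successes, trials):
    """Log-likelihood of observing `successes` out of `trials` under parameter pi."""
    return (math.log(math.comb(trials, successes))
            + successes * math.log(pi)
            + (trials - successes) * math.log(1.0 - pi))


successes, trials = 7, 20  # assumed observations
grid = [i / 1000 for i in range(1, 1000)]  # candidate pi values in (0, 1)

# Maximize the log-likelihood and the likelihood separately over the same grid.
best_log = max(grid, key=lambda pi: binomial_log_likelihood(pi, successes, trials))
best_lik = max(grid, key=lambda pi: math.exp(binomial_log_likelihood(pi, successes, trials)))
print(best_log, best_lik, successes / trials)
```

Both searches land on the sample proportion successes / trials, which is the closed-form maximum likelihood estimate for the binomial case.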

https://bookdown.org/ccwang/medical_statistics6/likelihood-definition.html

-----

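x: a training example.

K: the total number of labels.

k: the k-th label.

p(k|x): the probability that x has label k (as a softmax output).

q(k|x): the ground-truth distribution for p(k|x).

u(k): a fixed distribution over labels, independent of x (the uniform distribution 1/K in the paper's experiments).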

-----

Another interpretation of LSR can be obtained by considering the cross entropy:
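In the cited paper (Szegedy et al., 2016), the cross entropy of the smoothed distribution q' splits into two terms, where p(k) is the predicted probability of label k:

```latex
H(q', p) = -\sum_{k=1}^{K} \log p(k)\, q'(k) = (1 - \epsilon)\, H(q, p) + \epsilon\, H(u, p)
```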

Thus, LSR is equivalent to replacing a single cross-entropy loss H(q, p) with a pair of such losses, H(q, p) and H(u, p). The second loss penalizes the deviation of the predicted label distribution p from the prior u, with the relative weight ε / (1 − ε).
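The equivalence can be checked numerically: the cross entropy against the smoothed target should equal (1 − ε) H(q, p) + ε H(u, p). This is a minimal sketch with assumed values for the prediction p and for ε.

```python
import math


def cross_entropy(target, predicted):
    """H(target, predicted) = -sum_k target[k] * log(predicted[k])."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


epsilon = 0.1                    # assumed smoothing parameter
p = [0.7, 0.2, 0.1]              # assumed predicted distribution
q = [1.0, 0.0, 0.0]              # one-hot ground truth
u = [1.0 / 3.0] * 3              # uniform prior over 3 labels
q_smooth = [(1 - epsilon) * qk + epsilon * uk for qk, uk in zip(q, u)]

single_loss = cross_entropy(q_smooth, p)
paired_loss = (1 - epsilon) * cross_entropy(q, p) + epsilon * cross_entropy(u, p)
print(single_loss, paired_loss)
```

The two values agree up to floating-point rounding, because cross entropy is linear in its first argument.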

# Inception v3

Notes:

-----

Figure 1: Visualization of the penultimate layer's activations of: AlexNet/CIFAR-10 (first row), CIFAR-100/ResNet-56 (second row) and ImageNet/Inception-v4 with three semantically different classes (third row) and two semantically similar classes plus a third one (fourth row).

# Label Smoothing

Notes:

-----

# Confidence penalty

Notes:

Confidence penalty.

-----

Figure 1: Distribution of the magnitude of softmax probabilities on the MNIST validation set. A fully-connected, 2-layer, 1024-unit neural network was trained with dropout (left), label smoothing (center), and the confidence penalty (right). Dropout leads to a softmax distribution where probabilities are either 0 or 1. By contrast, both label smoothing and the confidence penalty lead to smoother output distributions, which results in better generalization.
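The confidence penalty in the cited paper (Pereyra et al., 2017) subtracts a multiple of the output entropy from the negative log-likelihood, so low-entropy (over-confident) predictions are penalized. Below is a minimal single-example sketch; the function names, the logits, and the weight β = 0.1 are illustrative assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy H(p) = -sum_k p[k] * log(p[k])."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def confidence_penalized_loss(logits, true_label, beta):
    """Negative log-likelihood minus beta times the entropy of the prediction."""
    probs = softmax(logits)
    nll = -math.log(probs[true_label])
    return nll - beta * entropy(probs)


confident = softmax([10.0, 0.0, 0.0])   # assumed peaked prediction
moderate = softmax([1.0, 0.0, 0.0])     # assumed softer prediction
print(entropy(confident), entropy(moderate))
print(confidence_penalized_loss([1.0, 0.0, 0.0], true_label=0, beta=0.1))
```

The peaked prediction has the lower entropy, so it receives less of the entropy bonus and is discouraged relative to the softer one.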

# Confidence Penalty

Notes:

Confidence penalty.

-----

Figure 3: The schema for interior grid modules of the pure Inception-v4 network. The 35×35, 17×17 and 8×8 grid modules are depicted from left to right. These are the Inception-A, Inception-B, and Inception-C blocks of Figure 2, respectively.

# Inception v4

Notes: Inception v4 contains three kinds of modules: A, B, and C. A is Inception v2, B is unique to Inception v4, and C is Inception v3. 35x35, 17x17, and 8x8 are the feature-map sizes.

-----

Figure 7. The schema for the 35 x 35 to 17 x 17 reduction module. Different variants of this block (with various numbers of filters) are used in Figures 9 and 15 in each of the new Inception (-v4, -ResNet-v1, -ResNet-v2) variants presented in this paper. The k, l, m, n numbers represent filter bank sizes, which can be looked up in Table 1.

https://arxiv.org/pdf/1602.07261.pdf

Notes: (this is an earlier version of the # Inception v4 paper)

Reduction A: reduces the grid from 35x35 to 17x17.

-----

Figure 8. The schema for the 17 x 17 to 8 x 8 grid-reduction module. This is the reduction module used by the pure Inception-v4 network in Figure 9.

https://arxiv.org/pdf/1602.07261.pdf

Notes: (this is an earlier version of the # Inception v4 paper)

Reduction B: reduces the grid from 17x17 to 8x8.

-----

Figure 2: On the left is the overall schema for the pure Inception-v4 network. On the right is the detailed composition of the stem. Note that this stem configuration was also used for the Inception-ResNet-v2 network outlined in Figures 5 and 6. V denotes the use of 'Valid' padding; otherwise 'Same' padding was used. Sizes to the side of each layer summarize the shape of the output for that layer.
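The difference between 'Valid' and 'Same' padding shows up directly in the output sizes. The sketch below assumes the figures follow the common TensorFlow-style convention, where 'valid' gives ceil((n - k + 1) / s) and 'same' gives ceil(n / s); the function name is hypothetical.

```python
import math


def output_size(size, kernel, stride, padding):
    """Output side length under the assumed 'valid' / 'same' padding convention."""
    if padding == "valid":
        return math.ceil((size - kernel + 1) / stride)
    if padding == "same":
        return math.ceil(size / stride)
    raise ValueError(f"unknown padding mode: {padding!r}")


# A 3x3 'valid' layer with stride 2 reproduces the reductions named in the post.
print(output_size(35, 3, 2, "valid"))   # 35 x 35 grid
print(output_size(17, 3, 2, "valid"))   # 17 x 17 grid
# A 'same' layer with stride 1 leaves the grid unchanged.
print(output_size(147, 3, 1, "same"))
```

Under this convention a 3×3 'valid' layer with stride 2 takes 35 to 17 and 17 to 8, matching the Reduction A and Reduction B grid sizes.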

# Inception v4

Notes:

The pure Inception-v4.

-----

Figure 4: The schema for interior grid modules of the Inception-ResNet-v1 network. The 35 × 35, 17 × 17 and 8 × 8 grid modules are depicted from left to right. These are the Inception-A, Inception-B, and Inception-C blocks of the schema on the left of Figure 6 for the Inception-ResNet-v1 network, respectively.

# Inception v4

Notes:

The modules of Inception-ResNet-v1.

-----

Figure 5: The schema for interior grid modules of the Inception-ResNet-v2 network. The 35 × 35, 17 × 17 and 8 × 8 grid modules are depicted from left to right. These are the Inception-A, Inception-B and Inception-C blocks of the schema on the left of Figure 6 for the Inception-ResNet-v2 network, respectively.

# Inception v4

Notes:

The modules of Inception-ResNet-v2.

-----

Figure 6: On the left is the overall schema for the Inception-ResNet-v1 and Inception-ResNet-v2 networks. While the schema is the same for both networks, the composition of the stem and interior modules differs. The stem of Inception-ResNet-v1 is shown to the right, while the stem of Inception-ResNet-v2 is the same as that of the pure Inception-v4 network, depicted on the right of Figure 2. The interior modules are shown in Figure 4 and Figure 5, respectively. V denotes the use of 'Valid' padding; otherwise 'Same' padding was used. Sizes to the side of each layer summarize the shape of the output for that layer.

# Inception v4

Notes:

The shared framework of Inception-ResNet v1 and v2.

-----

References

# Inception v3 # Label Smoothing

Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.

https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.pdf

# Label Smoothing

Müller, Rafael, Simon Kornblith, and Geoffrey Hinton. "When does label smoothing help?." arXiv preprint arXiv:1906.02629 (2019).

https://arxiv.org/pdf/1906.02629.pdf

# Inception v4 # Inception-ResNet

Szegedy, Christian, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." AAAI. Vol. 4. 2017.

https://arxiv.org/pdf/1602.07261.pdf

https://ojs.aaai.org/index.php/AAAI/article/download/11231/11090

# Confidence Penalty

Pereyra, Gabriel, et al. "Regularizing neural networks by penalizing confident output distributions." arXiv preprint arXiv:1701.06548 (2017).

https://arxiv.org/pdf/1701.06548.pdf

-----

The following entries list papers only.

-----

# SqueezeNet

Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size." arXiv preprint arXiv:1602.07360 (2016).

https://arxiv.org/pdf/1602.07360.pdf

# MobileNet v1

Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).

https://arxiv.org/pdf/1704.04861.pdf

# MobileNet v2

Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

http://openaccess.thecvf.com/content_cvpr_2018/papers/Sandler_MobileNetV2_Inverted_Residuals_CVPR_2018_paper.pdf

# MobileNet v3

Howard, Andrew, et al. "Searching for mobilenetv3." arXiv preprint arXiv:1905.02244 (2019).

https://arxiv.org/pdf/1905.02244.pdf

# ShuffleNet v1

Zhang, Xiangyu, et al. "Shufflenet: An extremely efficient convolutional neural network for mobile devices." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.

http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_ShuffleNet_An_Extremely_CVPR_2018_paper.pdf

# ShuffleNet v2

Ma, Ningning, et al. "Shufflenet v2: Practical guidelines for efficient cnn architecture design." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

http://openaccess.thecvf.com/content_ECCV_2018/papers/Ningning_Light-weight_CNN_Architecture_ECCV_2018_paper.pdf

# ESPNet v1

Mehta, Sachin, et al. "Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

http://openaccess.thecvf.com/content_ECCV_2018/papers/Sachin_Mehta_ESPNet_Efficient_Spatial_ECCV_2018_paper.pdf

# ESPNet v2

Mehta, Sachin, et al. "Espnetv2: A light-weight, power efficient, and general purpose convolutional neural network." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.

http://openaccess.thecvf.com/content_CVPR_2019/papers/Mehta_ESPNetv2_A_Light-Weight_Power_Efficient_and_General_Purpose_Convolutional_Neural_CVPR_2019_paper.pdf

-----

Sunday, April 25, 2021
