The Star Also Rises: DenseNet (3): Illustrated

DenseNet (3): Illustrated

2021/03/27

-----

https://pixabay.com/zh/photos/city-architecture-building-urban-5051196/

-----

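The first key point of DenseNet is Figure 4, which compares DenseNet, DenseNet-C, DenseNet-B, and DenseNet-BC, and in particular shows how DenseNet-B differs from the plain DenseNet.

The second key point of DenseNet is Figure 5. See https://www.tensorinfinity.com/paper_89.html.

The third key point of DenseNet is Figure 7 of "Visualizing the loss landscape of neural nets."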

Li, Hao, et al. "Visualizing the loss landscape of neural nets." Advances in Neural Information Processing Systems. 2018.

https://papers.nips.cc/paper/2018/file/a41b3bb3e6b050b6c9067c67f663b915-Paper.pdf

-----

# DenseNet

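Explanation:

growth rate

k is the growth rate, that is, the number of feature maps each layer outputs. Each layer's output also serves as input to all subsequent layers. In the usual picture, each feature map passes through one convolution kernel and becomes one new feature map. The key question here is how k0 maps become k maps. LeNet is a useful reference.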
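
The bookkeeping behind the growth rate can be written as a short worked equation. This is a sketch that assumes the block input has $k_0$ maps and uses $\ell$ (an index introduced here) for the position of a layer inside the block:

```latex
\begin{aligned}
\text{maps entering layer } \ell &= k_0 + k(\ell - 1), \\
\text{maps produced by layer } \ell &= k, \\
\text{maps leaving a block of } L \text{ layers} &= k_0 + kL .
\end{aligned}
```

With illustrative values $k_0 = 16$ and $k = 12$, layer 5 would see $16 + 12 \cdot 4 = 64$ input maps and still emit only 12.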

-----

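# LeNet

From six feature maps to sixteen feature maps.

Explanation:

Take map 0 of the 16 as an example: it is computed from the first three of the six input maps, which feed a single convolution kernel with three input channels.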

combined

How are the feature maps of all filters in a convolutional layer combined? What is the final output of the layer?

"The feature maps from one layer are used to create new feature maps in the next layer. Each feature map in this second layer is a combination of all the feature maps in the first layer. And the value of the feature map in the second layer, at any one pixel, is found by multiplying each feature in the first layer with a convolution kernel, with a different kernel for each feature map in the first layer. The responses are then summed, added to a bias term, and then modified by a simple non-linear operation."

https://www.quora.com/How-are-the-feature-maps-of-all-filters-in-a-convolutional-layer-combined-What-is-the-final-output-of-the-layer
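
The combination rule in the quote can be sketched in NumPy. This is a minimal illustration, not LeNet's actual layer: it connects every output map to every input map (LeNet's C3 layer is only partially connected), and ReLU is an assumed stand-in for the "simple non-linear operation". The function names are hypothetical.

```python
import numpy as np

def conv2d_single(x, w):
    """Valid 2-D cross-correlation of one map x (H, W) with one kernel w (kh, kw)."""
    kh, kw = w.shape
    out_h, out_w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def conv_layer(x, weights, bias):
    """x: (C_in, H, W); weights: (C_out, C_in, kh, kw); bias: (C_out,)."""
    c_out, c_in = weights.shape[:2]
    maps = []
    for o in range(c_out):
        # A different kernel for each input map; the responses are summed.
        response = sum(conv2d_single(x[c], weights[o, c]) for c in range(c_in))
        # Add the bias, then apply the non-linearity (ReLU is an assumption).
        maps.append(np.maximum(response + bias[o], 0.0))
    return np.stack(maps)

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 14, 14))          # six input maps
weights = rng.standard_normal((16, 6, 5, 5))  # sixteen output maps, 5x5 kernels
y = conv_layer(x, weights, np.zeros(16))
print(y.shape)  # (16, 10, 10)
```

Each of the sixteen output maps therefore owns six 5x5 kernels, one per input map, which is the sense in which six maps "become" sixteen.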

-----

# Convolution Guide

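Explanation:

A convolution can be viewed as a sparse fully connected layer whose non-zero weights are shared across positions.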
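
A small 1-D case makes this concrete. The sketch below builds the weight matrix of the equivalent fully connected layer and checks it against NumPy's own sliding-window result; the helper name `conv1d_as_matrix` is hypothetical.

```python
import numpy as np

def conv1d_as_matrix(kernel, n):
    """Build the (n - k + 1, n) matrix whose product with x equals a valid 1-D convolution."""
    k = len(kernel)
    m = np.zeros((n - k + 1, n))
    for row in range(n - k + 1):
        m[row, row:row + k] = kernel  # same weights on every row, shifted by one position
    return m

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([1.0, 0.0, -1.0])
m = conv1d_as_matrix(kernel, len(x))

dense_out = m @ x                                    # "fully connected" view
direct_out = np.correlate(x, kernel, mode="valid")   # sliding-window view
print(m)
print(dense_out, direct_out)
```

Most entries of `m` are zero and the non-zero ones repeat along the diagonals, which is exactly the sparsity and weight sharing of a convolution.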

-----

Figure 1: A 5-layer dense block with a growth rate of k = 4. Each layer takes all preceding feature-maps as input.

# DenseNet

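Explanation:

Suppose the input to the first layer has k0 feature maps and the first layer outputs k feature maps. Every later layer also outputs k feature maps. The output of each layer becomes an input to every layer after it.

So how are these k maps produced? The standard version does not use Conv1 (a 1x1 convolution): each layer is BN-ReLU-Conv(3x3). DenseNet-B inserts a Conv1 before the 3x3 convolution.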
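
The concatenation pattern can be sketched in NumPy. This is only an illustration of the connectivity: `layer_h` is a hypothetical stand-in that uses a random 1x1 convolution plus ReLU instead of the real BN-ReLU-Conv(3x3), and the sizes are made up.

```python
import numpy as np

def layer_h(x, k, rng):
    """Stand-in for one layer: a random 1x1 convolution plus ReLU that emits k maps."""
    w = rng.standard_normal((k, x.shape[0])) / np.sqrt(x.shape[0])
    return np.maximum(np.einsum("oc,chw->ohw", w, x), 0.0)

def dense_block(x, num_layers, k, rng):
    features = [x]                                  # block input: k0 maps
    for _ in range(num_layers):
        concat = np.concatenate(features, axis=0)   # all preceding maps, stacked on channels
        features.append(layer_h(concat, k, rng))    # each layer contributes k new maps
    return np.concatenate(features, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))                 # k0 = 16 (illustrative)
out = dense_block(x, num_layers=4, k=4, rng=rng)
print(out.shape)  # (32, 8, 8), i.e. k0 + 4 * k channels
```

The block input is carried through unchanged in the first 16 channels of `out`, which is what lets later layers reuse early features directly.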

-----

Figure 2: A deep DenseNet with three dense blocks. The layers between two adjacent blocks are referred to as transition layers and change feature-map sizes via convolution and pooling.

# DenseNet

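Explanation:

The transition layer reduces the number of feature maps with a 1x1 convolution and changes the feature-map size with pooling.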
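
A minimal NumPy sketch of these two steps follows. It assumes 2x2 average pooling with stride 2 and leaves out batch normalization; the function and parameter names are hypothetical.

```python
import numpy as np

def transition(x, weights):
    """x: (C_in, H, W); weights: (C_out, C_in) of a 1x1 convolution."""
    y = np.einsum("oc,chw->ohw", weights, x)   # 1x1 conv: mixes channels, keeps H and W
    c, h, w = y.shape
    # 2x2 average pooling with stride 2 (assumes H and W are even)
    return y.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 8, 8))
weights = rng.standard_normal((16, 32))
print(transition(x, weights).shape)  # (16, 4, 4)
```

The 1x1 convolution changes only the channel count (32 to 16 here), and the pooling changes only the spatial size (8x8 to 4x4).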

-----

Table 1: DenseNet architectures for ImageNet. The growth rate for the first 3 networks is k = 32, and k = 48 for DenseNet-161. Note that each “conv” layer shown in the table corresponds to the sequence BN-ReLU-Conv.

# DenseNet

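Explanation:

Each "conv" layer corresponds to the sequence BN-ReLU-Conv. The table shows the DenseNet-B layout, with a 1x1 bottleneck before each 3x3 convolution. The paper's ImageNet models are DenseNet-BC, so they also compress at the transition layers.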

-----

# DenseNet

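Explanation:

Adding k maps per layer makes the total number of maps keep growing. The bottleneck layer uses Conv1 (a 1x1 convolution) to force the input of the 3x3 convolution down to 4k maps.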
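
A quick weight count shows when the bottleneck pays off. The sketch counts convolution weights only (no batch-norm parameters or biases), and `m`, the number of maps entering the layer, is a symbol introduced here.

```python
def direct_params(m, k):
    """3x3 convolution applied straight to m input maps, producing k maps."""
    return 3 * 3 * m * k

def bottleneck_params(m, k):
    """1x1 convolution down to 4k maps, then a 3x3 convolution from 4k to k maps."""
    return 1 * 1 * m * (4 * k) + 3 * 3 * (4 * k) * k

k = 32
for m in (64, 256, 1024):
    print(m, direct_params(m, k), bottleneck_params(m, k))
```

Under this count the bottleneck is cheaper only once m exceeds 7.2k (from 9mk > 4mk + 36k²), so it helps in the later layers of a block, where the concatenated input is wide, rather than in the first few.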

-----

# DenseNet

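Explanation:

The transition layer can use Conv1 to compress the number of feature maps. The paper sets the compression so that half of the maps are kept.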
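
The effect of halving can be traced through a whole network. The sketch below assumes a CIFAR-style DenseNet-BC with three dense blocks, an initial convolution that outputs 2k maps, and (depth - 4) / 6 bottleneck layers per block; these settings and the name `trace_channels` are assumptions for illustration, with `theta` standing for the compression factor.

```python
import math

def trace_channels(depth=100, k=12, theta=0.5, num_blocks=3):
    """Return the channel count after each dense block and each transition layer."""
    layers_per_block = (depth - 4) // 6   # each bottleneck layer holds two convolutions
    channels = 2 * k                      # output of the initial convolution
    trace = []
    for block in range(num_blocks):
        channels += layers_per_block * k              # every layer adds k maps
        trace.append(("block", block + 1, channels))
        if block < num_blocks - 1:
            channels = math.floor(theta * channels)   # compression in the transition
            trace.append(("transition", block + 1, channels))
    return trace

for entry in trace_channels():
    print(entry)
# ('block', 1, 216)
# ('transition', 1, 108)
# ('block', 2, 300)
# ('transition', 2, 150)
# ('block', 3, 342)
```

Without compression (`theta=1.0`) the same network would reach 24 + 3 * 192 = 600 maps by the end of the third block.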

-----

Table 2: Error rates (%) on CIFAR and SVHN datasets. k denotes network’s growth rate. Results that surpass all competing methods are bold and the overall best results are blue. “+” indicates standard data augmentation (translation and/or mirroring). ∗ indicates results run by ourselves. All the results of DenseNets without data augmentation (C10, C100, SVHN) are obtained using Dropout. DenseNets achieve lower error rates while using fewer parameters than ResNet. Without data augmentation, DenseNet performs better by a large margin.

# DenseNet

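Explanation:

Bold means better than all competing methods.

Blue means the best result overall.

+ denotes standard data augmentation, which the caption specifies as translation and/or mirroring. (Crops at the four corners and the center, plus their horizontal flips, describe the 10-crop testing of Table 3 rather than this augmentation.)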

-----

Table 3: The top-1 and top-5 error rates on the ImageNet validation set, with single-crop (10-crop) testing.

# DenseNet

Explanation:

Depth matters, but width (k) seems to matter more.

-----

Figure 3: Comparison of the DenseNets and ResNets top-1 error rates (single-crop testing) on the ImageNet validation dataset as a function of learned parameters (left) and FLOPs during test-time (right).

# DenseNet

Explanation:

At a comparable top-1 error, DenseNet needs both fewer parameters and fewer FLOPs than ResNet.

-----

Figure 4: Left: Comparison of the parameter efficiency on C10+ between DenseNet variations. Middle: Comparison of the parameter efficiency between DenseNet-BC and (pre-activation) ResNets. DenseNet-BC requires about 1/3 of the parameters as ResNet to achieve comparable accuracy. Right: Training and testing curves of the 1001-layer pre-activation ResNet [12] with more than 10M parameters and a 100-layer DenseNet with only 0.8M parameters.

# DenseNet

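Explanation:

Left: DenseNet-BC is the most parameter-efficient variant.

Middle: at comparable accuracy, DenseNet-BC uses about one third of the parameters of ResNet.

Right: DenseNet-BC, with far fewer parameters, generalizes better than ResNet. Could the reason be that DenseNet is a denser ensemble?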

-----

Figure 5: The average absolute filter weights of convolutional layers in a trained DenseNet. The color of pixel (s, ℓ) encodes the average L1 norm (normalized by number of input feature-maps) of the weights connecting convolutional layer s to ℓ within a dense block. Three columns highlighted by black rectangles correspond to two transition layers and the classification layer. The first row encodes weights connected to the input layer of the dense block.

# DenseNet

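Explanation:

Red indicates strong use and blue indicates almost no use. The horizontal axis is the target layer (ℓ) and the vertical axis is the source layer (s) that precedes it. The rightmost column of each block is the transition layer (the classification layer for the last block), and the top row is the input to the block, which for the second and third blocks is the output of the preceding transition layer.

The figure supports the following conclusions:

a) Features extracted by earlier layers may still be used directly by deeper layers.

b) Even the transition layers may use features from all layers of the preceding dense block.

c) Layers in the second and third dense blocks make very little use of the output of the preceding transition layer, which suggests that the transition layer outputs many redundant features. This also supports DenseNet-BC, that is, the need for compression.

d) The final classification layer uses information from many layers of the preceding dense block but leans toward the last few feature maps. This suggests that some high-level features are produced in the last few layers of the network.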

https://www.tensorinfinity.com/paper_89.html
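
One cell of such a heat map could be computed roughly as follows. This is a sketch of one plausible reading of the Figure 5 caption, not the authors' plotting code: it assumes the input channels of a layer are ordered by source (block input first, then each earlier layer), and `connection_strength` is a hypothetical helper.

```python
import numpy as np

def connection_strength(weight, segment_sizes):
    """Mean absolute weight that one layer assigns to each source segment.

    weight: (C_out, C_in, kh, kw) kernel tensor of the target layer.
    segment_sizes: number of input maps contributed by each source, in channel order.
    """
    assert sum(segment_sizes) == weight.shape[1]
    strengths, start = [], 0
    for size in segment_sizes:
        # Averaging |w| over the slice normalizes by the number of input maps in it.
        strengths.append(np.abs(weight[:, start:start + size]).mean())
        start += size
    return np.array(strengths)

rng = np.random.default_rng(0)
# Third layer of a block with k0 = 16 and k = 4: inputs are 16 + 4 + 4 = 24 maps.
weight = rng.standard_normal((4, 24, 3, 3))
print(connection_strength(weight, [16, 4, 4]))
```

Stacking these vectors for every target layer, one column per layer, would give a triangular map like the ones in the figure.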

-----

Figure 4: The loss surfaces of ResNet-110-noshort and DenseNet for CIFAR-10.

# ResNet-V

Explanation:

ResNet-110-noshort versus DenseNet. Is DenseNet an ensemble too?!

-----

Are we really seeing convexity? We are viewing the loss surface under a dramatic dimensionality reduction, and we need to be careful interpreting these plots. For this reason, we quantify the level of convexity in loss functions by computing the principal curvatures, which are simply eigenvalues of the Hessian. A truly convex function has no negative curvatures (the Hessian is positive semi-definite), while a non-convex function has negative curvatures.

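Explanation:

If the Hessian is positive semi-definite everywhere, the function is convex, which corresponds to a smooth-looking loss surface.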
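
The eigenvalue test can be tried on two toy quadratics. This is a minimal sketch with a hypothetical helper name; the tolerance is an arbitrary choice to absorb rounding error.

```python
import numpy as np

def is_positive_semidefinite(hessian, tol=1e-10):
    """Return (is_psd, eigenvalues) for a symmetric matrix."""
    eigenvalues = np.linalg.eigvalsh(hessian)   # real eigenvalues, ascending order
    return bool(eigenvalues.min() >= -tol), eigenvalues

bowl = np.array([[2.0, 0.0], [0.0, 2.0]])      # Hessian of f = x^2 + y^2
saddle = np.array([[2.0, 0.0], [0.0, -2.0]])   # Hessian of f = x^2 - y^2

print(is_positive_semidefinite(bowl))     # convex: no negative curvature
print(is_positive_semidefinite(saddle))   # non-convex: one negative curvature
```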

-----

https://zh.wikipedia.org/wiki/%E9%BB%91%E5%A1%9E%E7%9F%A9%E9%99%A3

Explanation:

The Hessian matrix.

-----

https://ccjou.wordpress.com/2013/01/10/%E5%8D%8A%E6%AD%A3%E5%AE%9A%E7%9F%A9%E9%99%A3%E7%9A%84%E5%88%A4%E5%88%A5%E6%96%B9%E6%B3%95/

-----

# Hessian

Explanation:

Eigenvalues and eigenvectors.

-----

Figure 7: For each point in the filter-normalized surface plots, we calculate the maximum and minimum eigenvalue of the Hessian, and map the ratio of these two.

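# ResNet-V

Explanation:

Each point of the filter-normalized surface plot is colored by the ratio of the minimum to the maximum eigenvalue of the Hessian.

If the Hessian is positive semi-definite, the minimum eigenvalue is non-negative and, for these networks, typically close to 0, so the ratio is near 0 and the plot is dark blue, indicating convexity. Otherwise the plot shifts toward yellow.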
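
The ratio can be illustrated on small hand-made Hessians. This sketch assumes the plotted quantity is |λ_min / λ_max| and that the largest eigenvalue is non-zero; the matrices are toy examples, not network Hessians.

```python
import numpy as np

def curvature_ratio(hessian):
    """|lambda_min / lambda_max| of a symmetric matrix (assumes lambda_max != 0)."""
    eigenvalues = np.linalg.eigvalsh(hessian)   # ascending order
    lam_min, lam_max = eigenvalues[0], eigenvalues[-1]
    return abs(lam_min / lam_max)

flat_valley = np.array([[2.0, 0.0], [0.0, 0.0]])    # one flat direction
mild_saddle = np.array([[2.0, 0.0], [0.0, -0.1]])   # weak negative curvature
saddle = np.array([[2.0, 0.0], [0.0, -2.0]])        # strong negative curvature

for name, h in [("flat valley", flat_valley), ("mild saddle", mild_saddle), ("saddle", saddle)]:
    print(name, curvature_ratio(h))
```

A ratio near 0 corresponds to the dark blue regions, and a ratio that grows toward 1 corresponds to the yellow regions with significant negative curvature.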

-----

Figure 2: Architecture comparison of different networks. (a) The residual network. (b) The densely connected network, where each layer can access the outputs of all previous micro-blocks. Here, a 1 x 1 convolutional layer (underlined) is added for consistency with the micro-block design in (a). (c) By sharing the first 1 x 1 connection of the same output across micro-blocks in (b), the densely connected network degenerates to a residual network. The dotted rectangular in (c) highlights the residual unit. (d) The proposed dual path architecture, DPN. (e) An equivalent form of (d) from the perspective of implementation, where the symbol “o” denotes a split operation, and “+” denotes element-wise addition.

# DPN

(a) ResNet.

(b) DenseNet.

(c) DenseNet rewritten in residual form.

(d) DPN.

(e) An equivalent form of (d).
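
The split-then-add-and-concatenate step of panel (e) can be sketched as below. This is a simplified reading of the caption, not the DPN implementation: the micro-block itself is replaced by a random tensor, and all names and sizes are hypothetical.

```python
import numpy as np

def dual_path_step(residual, dense, block_out, res_channels):
    """One dual-path update.

    residual:  (R, H, W) state updated by addition (the ResNet path).
    dense:     (D, H, W) state updated by concatenation (the DenseNet path).
    block_out: (R + k, H, W) output of the shared micro-block.
    """
    res_part, dense_part = block_out[:res_channels], block_out[res_channels:]  # split
    residual = residual + res_part                          # element-wise addition
    dense = np.concatenate([dense, dense_part], axis=0)     # channel growth by k
    return residual, dense

rng = np.random.default_rng(0)
residual = rng.standard_normal((8, 4, 4))
dense = rng.standard_normal((6, 4, 4))
block_out = rng.standard_normal((8 + 2, 4, 4))   # stand-in for the micro-block output
new_residual, new_dense = dual_path_step(residual, dense, block_out, res_channels=8)
print(new_residual.shape, new_dense.shape)  # (8, 4, 4) (8, 4, 4)
```

The residual state keeps a fixed width while the dense state grows by k maps per step, which is how the two paths combine feature reuse with new-feature exploration.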

-----

# CSPNet

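Explanation:

Part of the feature maps go through the dense block and part bypass it. This reduces the amount of computation, while the results may remain close.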
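
The routing can be sketched in a few lines. This illustration only shows the split and merge: `dense_fn` is a hypothetical stand-in for the dense block, the 50/50 split ratio is an assumption, and the transition layers used in the actual CSPNet design are left out.

```python
import numpy as np

def cross_stage_partial(x, dense_fn, split_ratio=0.5):
    """Send part of the channels through dense_fn and let the rest bypass it."""
    split = int(x.shape[0] * split_ratio)
    bypass, routed = x[:split], x[split:]
    processed = dense_fn(routed)                     # only this part pays for the dense block
    return np.concatenate([bypass, processed], axis=0)

def toy_dense_fn(t):
    """Toy dense block: keeps its input and appends as many new maps again."""
    return np.concatenate([t, np.maximum(t, 0.0)], axis=0)

x = np.random.default_rng(0).standard_normal((8, 4, 4))
out = cross_stage_partial(x, toy_dense_fn)
print(out.shape)  # (12, 4, 4): 4 bypassed maps + 8 from the dense path
```

Because the dense path sees only half of the channels, every layer inside it works on a narrower input, which is where the savings in computation come from.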

-----

# DenseNet

Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. Vol. 1. No. 2. 2017.

http://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_Densely_Connected_Convolutional_CVPR_2017_paper.pdf

# LeNet

LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.

http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf

# Hessian

Brown, David E. "The Hessian matrix: Eigenvalues, concavity, and curvature." BYU Idaho Department of Mathematics (2014).

https://www.iith.ac.in/~ashok/Maths_Lectures/TutorialB/Hessian_Examples.pdf

# ResNet-V. Cited 464 times. Ensembling smooths the loss function, which in turn makes training easier.

Li, Hao, et al. "Visualizing the loss landscape of neural nets." Advances in Neural Information Processing Systems. 2018.

https://papers.nips.cc/paper/2018/file/a41b3bb3e6b050b6c9067c67f663b915-Paper.pdf

# DPN

Chen, Yunpeng, et al. "Dual path networks." Advances in Neural Information Processing Systems. 2017.

https://proceedings.neurips.cc/paper/2017/file/f7e0b956540676a129760a3eae309294-Paper.pdf

# CSPNet

Wang, Chien-Yao, et al. "CSPNet: A new backbone that can enhance learning capability of CNN." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. 2020.

https://openaccess.thecvf.com/content_CVPRW_2020/papers/w28/Wang_CSPNet_A_New_Backbone_That_Can_Enhance_Learning_Capability_of_CVPRW_2020_paper.pdf

# Convolution Guide

Dumoulin, Vincent, and Francesco Visin. "A guide to convolution arithmetic for deep learning." arXiv preprint arXiv:1603.07285 (2016).

https://arxiv.org/pdf/1603.07285.pdf

-----

[1] DenseNet: A CNN model better than ResNet - Zhihu

https://zhuanlan.zhihu.com/p/37189203

[2] DenseNet explained in detail

https://www.tensorinfinity.com/paper_89.html

[3] [Linear Systems] Diagonalization and Eigenvalues and Eigenvectors

https://ch-hsieh.blogspot.com/2010/08/eigenvalues-and-eigenvectors.html

[4] Tests for positive semi-definite matrices | 線代啟示錄