NIN (Part 4): Appendix
2021/04/10
-----
For the figures below, only the captions from the original papers are reproduced.
-----
Figure 3: Two examples of applying the parameterised sampling grid to an image U producing the output V. (a) The sampling grid is the regular grid G = TI(G), where I is the identity transformation parameters. (b) The sampling grid is the result of warping the regular grid with an affine transformation Tθ(G).
# STNet
Explanation:
A spatial transformer trains an affine matrix that is used to transform the image; a sketch of the sampling step follows.
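A minimal sketch of the two cases in Figure 3, using PyTorch's F.affine_grid / F.grid_sample, which provide exactly this kind of grid generation and bilinear sampling; the θ values here are illustrative:

```python
# Apply a parameterised sampling grid T_theta(G) to an input U, producing V.
import math
import torch
import torch.nn.functional as F

U = torch.randn(1, 3, 32, 32)                     # input image / feature map

# (a) identity parameters I: V is simply a resampled copy of U
theta_id = torch.tensor([[[1.0, 0.0, 0.0],
                          [0.0, 1.0, 0.0]]])
grid = F.affine_grid(theta_id, U.size(), align_corners=False)
V = F.grid_sample(U, grid, align_corners=False)

# (b) warp the regular grid with an affine transform (rotate 30 deg, scale 0.8)
a = math.pi / 6
theta = torch.tensor([[[0.8 * math.cos(a), -0.8 * math.sin(a), 0.0],
                       [0.8 * math.sin(a),  0.8 * math.cos(a), 0.0]]])
grid = F.affine_grid(theta, U.size(), align_corners=False)
V_warped = F.grid_sample(U, grid, align_corners=False)
```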
-----
Figure 2: The architecture of a spatial transformer module. The input feature map U is passed to a localisation network which regresses the transformation parameters θ. The regular spatial grid G over V is transformed to the sampling grid Tθ(G), which is applied to U as described in Sect. 3.3, producing the warped output feature map V. The combination of the localisation network and sampling mechanism defines a spatial transformer.
# STNet
https://hemingwang.blogspot.com/2020/04/stnet.html
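A minimal spatial transformer module along the lines of Figure 2. The localisation network's layer sizes are illustrative assumptions, not the paper's exact configuration; only the identity initialisation of θ follows the paper's recommendation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Localisation net -> theta -> sampling grid -> warped output V."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.loc = nn.Sequential(                  # localisation network
            nn.Conv2d(in_ch, 8, 7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, 5), nn.MaxPool2d(2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(32), nn.ReLU(),
            nn.Linear(32, 6),                      # 6 affine parameters
        )
        # start from the identity transform so training is stable
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, U):
        theta = self.loc(U).view(-1, 2, 3)         # regress theta from U
        grid = F.affine_grid(theta, U.size(), align_corners=False)
        return F.grid_sample(U, grid, align_corners=False)

V = SpatialTransformer()(torch.randn(4, 3, 32, 32))  # usage example
```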
-----
Figure 1: Left: an example shows the interaction between features and attention masks. Right: example images illustrating that different features have different corresponding attention masks in our network. The sky mask diminishes low-level background blue color features. The balloon instance mask highlights high-level balloon bottom part features.
# RANet
Explanation:
Features interact with attention masks: the sky mask suppresses the low-level sky/background features, while the balloon instance mask highlights the bottom of the balloon.
-----
Figure 2: Example architecture of the proposed network for ImageNet. We use three hyper-parameters for the design of Attention Module: p, t and r. The hyper-parameter p denotes the number of pre-processing Residual Units before splitting into trunk branch and mask branch. t denotes the number of Residual Units in trunk branch. r denotes the number of Residual Units between adjacent pooling layers in the mask branch. In our experiments, we use the following hyper-parameters setting: {p = 1, t = 2, r = 1}. The number of channels in the soft mask Residual Unit and corresponding trunk branches is the same.
# RANet
Explanation:
Residual unit counts: one before and one after the module, two inside the trunk branch, five inside the soft mask branch.
Soft mask: the mask branch ends in a sigmoid, so the mask M(x) lies in [0, 1], and the module output is (1 + M(x)) · T(x) (attention residual learning); a sketch follows.
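A minimal sketch of one such Attention Module, under the assumptions above. The "residual units" here are plain conv blocks rather than true pre-activation Residual Units, and the mask branch uses a single pooling scale, so this is a simplification of the paper's ImageNet design:

```python
import torch
import torch.nn as nn

class AttentionModule(nn.Module):
    def __init__(self, ch):
        super().__init__()
        unit = lambda: nn.Sequential(              # stand-in "Residual Unit"
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU())
        self.pre = unit()                          # p = 1 (before the split)
        self.trunk = nn.Sequential(unit(), unit()) # t = 2
        self.mask = nn.Sequential(                 # bottom-up / top-down branch
            nn.MaxPool2d(2), unit(),               # r = 1 unit per scale
            nn.Upsample(scale_factor=2, mode='bilinear'),
            nn.Conv2d(ch, ch, 1), nn.Sigmoid())    # soft mask M(x) in [0, 1]
        self.post = unit()                         # p = 1 (after the merge)

    def forward(self, x):
        x = self.pre(x)
        T = self.trunk(x)
        M = self.mask(x)
        return self.post((1 + M) * T)              # attention residual learning
```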
-----
Figure 3: The receptive field comparison between mask branch and trunk branch.
# RANet
-----
Figure 1: BAM integrated with a general CNN architecture. As illustrated, BAM is placed at every bottleneck of the network. Interestingly, we observe multiple BAMs construct a hierarchical attention which is similar to a human perception procedure. BAM denoises low-level features such as background texture features at the early stage. BAM then gradually focuses on the exact target which is a high-level semantic. More visualizations and analysis are included in the supplementary material due to space constraints.
# BAM
-----
Figure 2: Detailed module architecture. Given the intermediate feature map F, the module computes corresponding attention map M(F) through the two separate attention branches – channel Mc and spatial Ms. We have two hyper-parameters for the module: dilation value (d) and reduction ratio (r). The dilation value determines the size of receptive fields which is helpful for the contextual information aggregation at the spatial branch. The reduction ratio controls the capacity and overhead in both attention branches. Through the experimental validation (see Sec. 4.1), we set {d = 4, r = 16}.
# BAM
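A simplified BAM-style sketch with the hyper-parameters above; the paper's batch normalisation and exact layer counts are omitted, so treat it as an illustration rather than the reference implementation:

```python
import torch
import torch.nn as nn

class BAM(nn.Module):
    def __init__(self, ch, r=16, d=4):
        super().__init__()
        self.mc = nn.Sequential(                   # channel branch Mc
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(ch, ch // r), nn.ReLU(), nn.Linear(ch // r, ch))
        self.ms = nn.Sequential(                   # spatial branch Ms
            nn.Conv2d(ch, ch // r, 1),
            nn.Conv2d(ch // r, ch // r, 3, padding=d, dilation=d), nn.ReLU(),
            nn.Conv2d(ch // r, ch // r, 3, padding=d, dilation=d), nn.ReLU(),
            nn.Conv2d(ch // r, 1, 1))

    def forward(self, x):
        mc = self.mc(x)[:, :, None, None]          # (N, C, 1, 1)
        ms = self.ms(x)                            # (N, 1, H, W)
        M = torch.sigmoid(mc + ms)                 # M(F), broadcast to (N, C, H, W)
        return x + x * M                           # F + F * M(F)
```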
-----
Fig. 1: The overview of CBAM. The module has two sequential sub-modules: channel and spatial. The intermediate feature map is adaptively refined through our module (CBAM) at every convolutional block of deep networks.
# CBAM
-----
Fig. 2: Diagram of each attention sub-module. As illustrated, the channel sub-module utilizes both max-pooling outputs and average-pooling outputs with a shared network; the spatial sub-module utilizes two similar outputs that are pooled along the channel axis and forwards them to a convolution layer.
# CBAM
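A compact CBAM-style sketch of the two sequential gates; the reduction ratio and the 7×7 spatial convolution follow commonly used settings, and everything else is a simplification:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, ch, r=16, k=7):
        super().__init__()
        self.mlp = nn.Sequential(                  # shared MLP for channel gate
            nn.Linear(ch, ch // r), nn.ReLU(), nn.Linear(ch // r, ch))
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)  # spatial gate conv

    def forward(self, x):
        n, c, _, _ = x.shape
        # channel sub-module: shared MLP on max- and average-pooled vectors
        gate = torch.sigmoid(self.mlp(x.amax(dim=(2, 3))) +
                             self.mlp(x.mean(dim=(2, 3))))
        x = x * gate.view(n, c, 1, 1)
        # spatial sub-module: pool along the channel axis, then convolve
        s = torch.cat([x.amax(dim=1, keepdim=True),
                       x.mean(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(s))
```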
-----
Fig. 3: CBAM integrated with a ResBlock in ResNet[5]. This figure shows the exact position of our module when integrated within a ResBlock. We apply CBAM on the convolution outputs in each block.
# CBAM
-----
Figure 1. Pipeline of RASNet. The RASNet is constituted by a shared feature extractor, attention mechanisms (general attention, residual attention, channel attention), and the weighted cross correlation layer (WXCorr). When a pair of an exemplar and a search image flows into the net, feature maps are produced through the feature extractor. Based on the exemplar features, three types of attentions are extracted. Exemplar and search features, along with the attentions as weights, are input to WXCorr and finally transformed to a response map.
# RASNet
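A hypothetical sketch of the weighted cross-correlation idea (the function name and shapes are assumptions, not the paper's code): the exemplar features are reweighted by an attention map before serving as the correlation kernel over the search features:

```python
import torch
import torch.nn.functional as F

def wxcorr(exemplar, search, attention):
    """exemplar: (C, h, w), search: (1, C, H, W), attention: (h, w)."""
    kernel = (exemplar * attention).unsqueeze(0)   # attention-weighted kernel
    return F.conv2d(search, kernel)                # response map (1, 1, H', W')
```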
-----
Figure 2. An example of feature production in a Siamese network. The green and blue boxes in the cube contain the feature maps for the corresponding green and blue windows.
# RASNet
-----
Figure 3. Dual attention. It combines the general attention with a residual attention produced by an encoding-decoding net, which improves the attention near object boundaries.
# RASNet
-----
Figure 4. Illustration of training pair selection for the Siamese network. Eight frames are exhibited to represent the frames of a sequence. For a typical Siamese network, a training pair consists of two randomly selected frames. Thus, the pair (#1, #4) can perfectly well be chosen, which can result in over-fitting.
# RASNet
-----
Figure 6. Visualizations on general attention learning and dual attention results.
# RASNet
-----
Figure 1: An overview of Neural Architecture Search.
# NAS_RL
-----
Figure 2: How our controller recurrent neural network samples a simple convolutional network. It predicts filter height, filter width, stride height, stride width, and number of filters for one layer and repeats. Every prediction is carried out by a softmax classifier and then fed into the next time step as input.
# NAS_RL
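A toy sketch of one such controller step; the choice lists, the one-hot embedding of the previous prediction, and all sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

choices = {'filter_h': [1, 3, 5, 7], 'filter_w': [1, 3, 5, 7],
           'stride_h': [1, 2, 3],    'stride_w': [1, 2, 3],
           'n_filters': [24, 36, 48, 64]}
hidden = 64
cell = nn.LSTMCell(hidden, hidden)
embed = {k: nn.Linear(len(v), hidden) for k, v in choices.items()}
heads = {k: nn.Linear(hidden, len(v)) for k, v in choices.items()}

def sample_layer():
    """Sample one layer's hyperparameters; return them with the log-prob."""
    h = c = torch.zeros(1, hidden)
    x = torch.zeros(1, hidden)                     # input at the first step
    layer, log_probs = {}, []
    for name, vals in choices.items():
        h, c = cell(x, (h, c))
        dist = torch.distributions.Categorical(logits=heads[name](h))
        idx = dist.sample()                        # softmax classifier sample
        log_probs.append(dist.log_prob(idx))
        layer[name] = vals[idx.item()]
        # the prediction is fed into the next time step as input
        x = embed[name](nn.functional.one_hot(idx, len(vals)).float())
    return layer, torch.stack(log_probs).sum()
```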
-----
The list of tokens that the controller predicts can be viewed as a list of actions a1:T to design an architecture for a child network. At convergence, this child network will achieve an accuracy R on a held-out dataset. We can use this accuracy R as the reward signal and use reinforcement learning to train the controller. More concretely, to find the optimal architecture, we ask our controller to maximize its expected reward, represented by J(θc):
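J(θc) = E_{P(a1:T; θc)}[R]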
Since the reward signal R is non-differentiable, we need to use a policy gradient method to iteratively update θc. In this work, we use the REINFORCE rule from Williams (1992):
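∇_{θc} J(θc) = Σ_{t=1..T} E_{P(a1:T; θc)}[ ∇_{θc} log P(at | a(t−1):1; θc) · R ]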
An empirical approximation of the above quantity is:
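(1/m) Σ_{k=1..m} Σ_{t=1..T} ∇_{θc} log P(at | a(t−1):1; θc) · Rk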
where m is the number of different architectures that the controller samples in one batch, and T is the number of hyperparameters our controller has to predict to design a neural network architecture.
# NAS_RL
"Line 1 constructs a policy model and randomly initializes the model's parameters. The model's function is to compute, via a forward pass, the probability distribution over all actions from the state information, for example (up 90%, down 10%), and to pick the highest-probability action to send to the game as a command." (The link this quote came from is no longer valid.)
-----
The validation accuracy that the k-th neural network architecture achieves after being trained on a training dataset is Rk.
The above update is an unbiased estimate for our gradient, but has a very high variance. In order to reduce the variance of this estimate we employ a baseline function:
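(1/m) Σ_{k=1..m} Σ_{t=1..T} ∇_{θc} log P(at | a(t−1):1; θc) · (Rk − b)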
As long as the baseline function b does not depend on the current action, then this is still an unbiased gradient estimate. In this work, our baseline b is an exponential moving average of the previous architecture accuracies.
# NAS_RL
-----
MC (Monte Carlo) plus PG (policy gradient) gives REINFORCE.
TD (temporal difference) plus PG gives Actor-Critic.
REINFORCE
This algorithm is really just gradient ascent, a one-line update rule, conceptually similar to the gradient descent used to train LeNet: a loss function seeks the minimum of the error, while an objective function seeks the maximum of the reward.
The gradient ascent formula contains a scaling constant: Gt acts as an amplification factor. The gradient mainly indicates a direction; if raising the probability of an action increases the reward, then the update raises that action's probability.
REINFORCE with baseline
The reason for subtracting a baseline is to remove a side effect of random sampling. In the original algorithm, because Gt is always positive, the probabilities of actions with small rewards are also raised. If the mean is subtracted instead, actions with large R have their probabilities raised, while actions with small R are suppressed. A toy update step is sketched after the link below.
https://medium.com/@fork.tree.ai/understanding-baseline-techniques-for-reinforce-53a1e2279b57
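A toy REINFORCE-with-baseline update in PyTorch, assuming a hypothetical env_reward_fn that returns the episode return; it illustrates the update rule discussed above, not the NAS controller itself:

```python
import torch

policy = torch.nn.Linear(4, 2)        # toy policy: 4-dim state, 2 actions
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
baseline = 0.0                        # exponential moving average of returns

def reinforce_step(state, env_reward_fn):
    global baseline
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()
    R = env_reward_fn(action)                       # return G_t for this action
    baseline = 0.9 * baseline + 0.1 * float(R)      # update the baseline b
    loss = -dist.log_prob(action) * (R - baseline)  # ascend grad log-prob * (R - b)
    opt.zero_grad()
    loss.backward()
    opt.step()
```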
-----
Figure 3: Distributed training for Neural Architecture Search. We use a set of S parameter servers to store and send parameters to K controller replicas. Each controller replica then samples m architectures and runs the multiple child models in parallel. The accuracy of each child model is recorded to compute the gradients with respect to θc, which are then sent back to the parameter servers.
# NAS_RL
-----
Figure 4: The controller uses anchor points, and set-selection attention to form skip connections.
# NAS_RL
-----
Figure 5: An example of a recurrent cell constructed from a tree that has two leaf nodes (base 2) and one internal node. Left: the tree that defines the computation steps to be predicted by controller. Center: an example set of predictions made by the controller for each computation step in the tree. Right: the computation graph of the recurrent cell constructed from example predictions of the controller.
# NAS_RL
-----
References
-----
# spatial domain
// STNet
Jaderberg, Max, et al. "Spatial transformer networks." arXiv preprint arXiv:1506.02025 (2015).
https://arxiv.org/pdf/1506.02025.pdf
// RANet
Wang, Fei, et al. "Residual attention network for image classification." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
// BAM
Park, Jongchan, et al. "BAM: Bottleneck attention module." arXiv preprint arXiv:1807.06514 (2018).
https://arxiv.org/pdf/1807.06514.pdf
// CBAM
Woo, Sanghyun, et al. "CBAM: Convolutional block attention module." Proceedings of the European conference on computer vision (ECCV). 2018.
// RASNet
Wang, Qiang, et al. "Learning attentions: residual attentional siamese network for high performance online visual tracking." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
# NAS_RL
Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
https://arxiv.org/pdf/1611.01578.pdf
-----
Only the papers are listed below.
-----
# pNASNet
Liu, Chenxi, et al. "Progressive neural architecture search." Proceedings of the European conference on computer vision (ECCV). 2018.
# AmoebaNet
Real, Esteban, et al. "Regularized evolution for image classifier architecture search." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. No. 01. 2019.
https://ojs.aaai.org/index.php/AAAI/article/download/4405/4283
# mNASNet
Tan, Mingxing, et al. "MnasNet: Platform-aware neural architecture search for mobile." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
# EfficientNet
Tan, Mingxing, and Quoc Le. "EfficientNet: Rethinking model scaling for convolutional neural networks." International Conference on Machine Learning. PMLR, 2019.
http://proceedings.mlr.press/v97/tan19a/tan19a.pdf
# Searching for MobileNetV3
Howard, Andrew, et al. "Searching for MobileNetV3." Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019.
-----