Thursday, July 02, 2020

Hard Examples - 40P

-----

Object Detection Part V: Hard Examples - 40P

2020/07/05

Extended Reading III (after the third hour)

尹相志:

"GIoU and CIoU do solve the case of non-overlapping boxes, but during optimization, when a predicted box is that far away, its objectness should really be suppressed while the correct location is reinforced. Because GIoU and CIoU can take negative values, they can suppress distant boxes; guiding the model toward the correct box, however, is not achieved simply by varying the center offset and the width/height offsets, because once the center coordinate shifts far enough the prediction must jump to a different feature-map cell, which introduces a discontinuity. Special cases such as images containing no object to detect also tend to break the gradient. So the IoU family of losses is actually hard to train, and the praise usually given to GIoU and CIoU is based on comparisons against plain IoU. The current YOLOv4 replaces the original positive/negative-sample-based bounding-box regression with a CIoU loss (deciding the relative weights of positive and negative samples is a headache, and the paper's defaults are not necessarily best). In my own experience, stacking an SSD-style bounding-box regression loss on top of CIoU (using all positives plus the negatives that contribute the largest gradients, via hard negative mining) alleviates the training difficulty of IoU-family losses."

-----

1. Hard Paper

The papers are ordered chronologically; most of them are covered in the PISA paper. GHM is an extension of Focal Loss.

Before DCR, it helps to first cover ION, R-FCN, DilatedNet, STNet, DCN v1, and DCN v2.

-----

2. DPM

Hard examples were already discussed in DPM.

https://lilianweng.github.io/lil-log/2017/12/15/object-recognition-for-dummies-part-2.html

-----

3. Fast R-CNN

OHEM is built on top of Fast R-CNN.

https://zhuanlan.zhihu.com/p/31426458

-----

4. OHEM

In the architecture figure, the forward pass goes first through the green branch and then the red one; the backward pass runs only through the red branch.

"The abstract makes four main points: (1) training requires a search over the parameter space; (2) the class imbalance between easy samples and hard-to-distinguish samples is a problem that urgently needs solving; (3) automatically selecting hard examples for training is both more efficient and more accurate; (4) the proposed OHEM algorithm is efficient, performs well, and excels on a variety of datasets."

https://zhuanlan.zhihu.com/p/58162337
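To make the selection step concrete, here is a minimal sketch of OHEM's core idea in PyTorch: score every proposal with a forward-only pass, then backpropagate only through the B proposals with the highest loss. All names and sizes below are illustrative, not taken from the paper's code.

```python
import torch

def ohem_select(per_roi_loss: torch.Tensor, batch_size: int) -> torch.Tensor:
    """Return indices of the `batch_size` hardest RoIs (highest loss)."""
    k = min(batch_size, per_roi_loss.numel())
    _, hard_idx = torch.topk(per_roi_loss, k)
    return hard_idx

# Usage: run all proposals through a read-only forward pass, keep the
# hardest ones, then do the normal forward/backward pass on that subset.
per_roi_loss = torch.rand(2000)        # e.g. losses for 2000 proposals
hard = ohem_select(per_roi_loss, 128)  # train only on the 128 hardest
```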

-----

5. Hard Negative

There is little to say about easy positives and easy negatives. A hard positive contains part of the target object and is not easy to recognize. A hard negative contains a small piece of the target object and may be mistaken for the object. The ground truth is the target to be detected.

https://zhuanlan.zhihu.com/p/65584372

-----

6. Positive and Negative

Many negative samples; few positive samples.

https://towardsdatascience.com/review-retinanet-focal-loss-object-detection-38fba6afabe4

-----

7. Cross Entropy

The definition of pt.

pt is p, a probability between 0 and 1, restricted to the case of a correct prediction (y = 1): the probability assigned to the ground-truth class. It is the percentage shown on a YOLO bounding box. The higher this percentage the better, so the larger the value, the smaller the loss.

Why the minus sign? Entropy is defined as the logarithm of the reciprocal of a probability, and pulling the reciprocal's denominator out of the log yields a minus sign. Even without appealing to entropy: the log of a probability, a value below 1, is always negative, so a minus sign is added when defining it as a loss. Hence the formula CE(pt) = -log(pt).

Since pt is less than 1, when pt gets a little larger, 1 - pt gets a little smaller (and stays below 1), and raising it to a power makes it smaller still. This yields the focal loss formula: once easily classified samples are down-weighted, the influence of hard samples rises.
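A quick numeric check of the two formulas above, CE(pt) = -log(pt) and the focal-loss form -(1 - pt)^gamma · log(pt); gamma = 2 here is only an example value.

```python
import math

def ce(pt: float) -> float:
    return -math.log(pt)                       # CE(pt) = -log(pt)

def fl(pt: float, gamma: float = 2.0) -> float:
    # the modulating factor (1 - pt)^gamma shrinks the loss of easy samples
    return ((1.0 - pt) ** gamma) * ce(pt)

for pt in (0.9, 0.5, 0.1):
    print(f"pt={pt}: CE={ce(pt):.4f}  FL={fl(pt):.4f}")
# pt=0.9 (easy): CE=0.1054, FL=0.0011 -- down-weighted ~100x
# pt=0.1 (hard): CE=2.3026, FL=1.8651 -- barely reduced
```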

-----

8. Focal Loss

Samples with pt above 0.5 are easy to classify, so their weight is lowered; the weight of hard samples rises accordingly.

"Focal loss is an improved cross-entropy (CE) loss: it multiplies the original CE loss by an exponential factor that weakens the contribution of easily detected objects to training. Focal loss thus solves the problem in object detection where positive and negative regions are extremely imbalanced and the detection loss is dominated by huge numbers of negative samples."

https://zhuanlan.zhihu.com/p/41849687

-----

9. Problems Solved by FL

Alpha addresses the imbalance between positive and negative samples.
Gamma addresses the imbalance between easy and hard samples.

https://zhuanlan.zhihu.com/p/80594704
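As a concrete reference, here is a hedged PyTorch sketch of the alpha-balanced focal loss, FL(pt) = -alpha_t (1 - pt)^gamma log(pt); alpha = 0.25 and gamma = 2 are the RetinaNet paper's defaults, while the function name and tensor shapes are merely illustrative.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss; `targets` holds 0./1. labels, one per anchor."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    pt = torch.where(targets == 1, p, 1 - p)   # prob of the true class
    alpha_t = torch.where(targets == 1,
                          torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    # alpha_t handles pos/neg imbalance; (1 - pt)^gamma handles easy/hard
    return (alpha_t * (1 - pt) ** gamma * ce).mean()
```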

-----

10. GHM

"The idea behind GHM: we indeed should not pay too much attention to easy samples, but neither should we focus on the extremely hard ones (outliers)! The gradient norm d of these outliers is much larger than that of ordinary samples, and forcing the model to attend to them may actually reduce its accuracy. What's more, there are many of them!"

https://zhuanlan.zhihu.com/p/80594704

-----

11. Outlier

"The figure uses a log scale to show the distribution more clearly. One can also see that near g = 1 the proportion of samples is relatively large; the researchers regard these as outliers, possibly caused by inaccurate annotation or by samples that are special and extremely hard to learn. For an already converged model, forcing it to learn these outliers well may push the parameters far off and in turn hurt accuracy on the majority of samples it already recognizes well."

https://zhuanlan.zhihu.com/p/55017036

-----

12. Norm of Gradient

The paper gives a clear definition.

『Gradient norm scaling involves changing the derivatives of the loss function to have a given vector norm when the L2 vector norm (sum of the squared values) of the gradient vector exceeds a threshold value.

For example, we could specify a norm of 1.0, meaning that if the vector norm for a gradient exceeds 1.0, then the values in the vector will be rescaled so that the norm of the vector equals 1.0.』

https://machinelearningmastery.com/how-to-avoid-exploding-gradients-in-neural-networks-with-gradient-clipping/
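In PyTorch, this rescaling is a single call, torch.nn.utils.clip_grad_norm_; the tiny model below is just a placeholder to make the snippet runnable.

```python
import torch

model = torch.nn.Linear(10, 1)               # any model
loss = model(torch.randn(4, 10)).sum()
loss.backward()
# Rescale all gradients in place so their global L2 norm is at most 1.0.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```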

-----

13. Gradient Density and GHM-C Loss

"So how do we attenuate both the easy samples and the extremely hard ones at the same time? Simple: attenuate whichever there are more of! And how do we attenuate the numerous ones? Easy: define a quantity that measures how many samples fall within a given gradient range. That is exactly the physical notion of density, so the authors define the gradient density."

"The physical meaning of the gradient density is the number of samples per unit of gradient norm g. The rest is simple: for each sample, multiply its cross-entropy CE by the reciprocal of that sample's gradient density."

https://zhuanlan.zhihu.com/p/80594704
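The following is a simplified sketch of GHM-C in PyTorch (no moving-average statistics; names are illustrative): bucket the gradient norms g = |p - y| into bins, estimate the density of each bin, and weight each sample's CE by the reciprocal of its density.

```python
import torch
import torch.nn.functional as F

def ghm_c_loss(logits: torch.Tensor, targets: torch.Tensor,
               bins: int = 10) -> torch.Tensor:
    """Down-weight samples that fall in dense regions of gradient norm."""
    p = torch.sigmoid(logits)
    g = (p - targets).abs().detach()      # gradient norm g = |p - y|
    n = g.numel()
    weights = torch.zeros_like(g)
    for i in range(bins):
        lo, hi = i / bins, (i + 1) / bins
        in_bin = (g >= lo) & (g < hi) if i < bins - 1 else (g >= lo)
        num = in_bin.sum().item()
        if num > 0:
            # gradient density ~ samples per unit g; weight = N / density
            weights[in_bin] = n / (num * bins)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (weights * ce).sum() / n
```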

-----

14. Libra R-CNN

Imbalance at three levels.

"The paper points out three imbalances in the current detection pipeline, at the sample level, the feature level, and the objective level, and raises three corresponding questions: 1) How should representative region proposals be selected? (Many papers have explored this; it can be seen as the sample (proposal) imbalance problem, which GHM (AAAI 2019) addresses for one-stage detectors.) 2) How should features from different levels be fused so that they are truly exploited? (the feature fusion problem) 3) Can the loss function be designed so as to guide the detector to converge better?"

https://zhuanlan.zhihu.com/p/62534549

-----

15. Libra R-CNN Architecture

Solutions to the three imbalance problems.

"The overall structure of Libra R-CNN; the goal is to eliminate the imbalances present in detector training. IoU-balanced Sampling: the paper first asks whether the IoU between a training sample and its ground truth correlates with the sample's difficulty (in the sense of easy vs. hard samples)."

https://www.cnblogs.com/fourmi/p/10756556.html

-----

16. IoU Distribution

IoU-balanced sampling is, like OHEM, a form of hard sampling.

"Considering mainly the hard samples, the authors find that over 60% of hard negatives have an IoU above 0.05, yet random sampling supplies only about 30% of such training samples. This extreme imbalance buries many hard samples among tens of thousands of easy ones. Hence IoU-balanced sampling: a simple and effective mining method that adds no extra computation."

https://www.cnblogs.com/fourmi/p/10756556.html
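A rough sketch of IoU-balanced sampling for negatives, under the interpretation above: split the negative IoU range into K equal intervals and draw the same number of samples from each, so that higher-IoU (harder) negatives are no longer drowned out. All names and defaults are illustrative.

```python
import random

def iou_balanced_sample(neg_ious, num_needed, k_bins=3, max_iou=0.5):
    """neg_ious: list of (index, iou) pairs for candidate negatives."""
    per_bin = num_needed // k_bins
    chosen = []
    for i in range(k_bins):
        lo = i * max_iou / k_bins
        hi = (i + 1) * max_iou / k_bins
        bucket = [idx for idx, iou in neg_ious if lo <= iou < hi]
        chosen += random.sample(bucket, min(per_bin, len(bucket)))
    # top up at random if some bins had too few candidates
    rest = [idx for idx, _ in neg_ious if idx not in set(chosen)]
    random.shuffle(rest)
    return chosen + rest[:num_needed - len(chosen)]
```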

-----

17. Balanced Feature Pyramid

Four steps: rescaling, integrating, refining, and strengthening (a sketch follows below).

https://zhuanlan.zhihu.com/p/64541760
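A hedged PyTorch sketch of those four steps on a list of FPN levels; the paper refines the integrated map with non-local attention, which is left here as a pluggable `refine` module (an assumption made for brevity).

```python
import torch
import torch.nn.functional as F

def balanced_feature_pyramid(feats, refine=None):
    """feats: list of [N, C, Hi, Wi] tensors, one per pyramid level."""
    size = feats[len(feats) // 2].shape[-2:]   # intermediate resolution
    # 1) rescaling: bring every level to the same resolution
    rescaled = [F.interpolate(f, size=size, mode="nearest") for f in feats]
    # 2) integrating: average the levels into one balanced feature map
    balanced = torch.stack(rescaled).mean(dim=0)
    # 3) refining: e.g. non-local attention in the paper
    if refine is not None:
        balanced = refine(balanced)
    # 4) strengthening: resize back and add residually to each level
    return [f + F.interpolate(balanced, size=f.shape[-2:], mode="nearest")
            for f in feats]
```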

-----

18. Balanced L1 Loss

"Fast R-CNN handles classification and localization through a multi-task loss."

"Balanced L1 loss was proposed because this loss function is the sum of two losses: if classification alone is done very well, the total score is still high, so the importance of regression gets ignored. A natural idea is therefore to adjust the value of lambda."

https://zhuanlan.zhihu.com/p/64541760
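A sketch of the balanced L1 loss itself, following the formulation in the Libra R-CNN paper (alpha = 0.5 and gamma = 1.5 are the paper's defaults; the constant c is derived here from continuity at |x| = 1 rather than copied):

```python
import math
import torch

def balanced_l1_loss(diff: torch.Tensor, alpha: float = 0.5,
                     gamma: float = 1.5) -> torch.Tensor:
    """diff: regression residuals x (e.g. predicted minus target offsets)."""
    x = diff.abs()
    b = math.exp(gamma / alpha) - 1   # from the constraint alpha*ln(b+1) = gamma
    # inlier branch (|x| < 1): boosts the gradient of accurate samples
    inner = (alpha / b) * (b * x + 1) * torch.log(b * x + 1) - alpha * x
    # constant making the two branches meet at |x| = 1
    c = (alpha / b) * (b + 1) * math.log(b + 1) - alpha - gamma
    outer = gamma * x + c
    return torch.where(x < 1, inner, outer).mean()
```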

-----

19. ION

The key point is the IRNN.

-----

20. IRNN

The key point is the four directions.

-----

21. Four Directions

These lead to R-FCN's nine positions.

-----

22. R-FCN

Nine positions.

-----

23. DCN v1

A different kind of convolution.

-----

24. DilatedNet

Dilated convolution.

-----

25. STNet

Affine transformation.

-----

26. DCN v1

-----

27. DCN v1

-----

28. DCN v2

Similar to DCR v1.

-----

29. DCR v1

Focuses on false positives.

-----

30. DCR v2

Reduces v1's computational cost, an improvement analogous to going from R-CNN to Fast R-CNN.

-----

31. PISA

Red boxes are prime samples; blue boxes are hard samples. The paper argues that the prime samples are what matter, and the right-hand figure shows this is correct.

-----

32. Precision-Recall

The solid line is the baseline, the dotted line is top 25, and the dashed line is top 5.

-----

33. Prime Sample Attention

First rank the proposals belonging to the same ground truth, e.g. green, yellow, blue; proposals sharing the same rank are then ordered by score. One such ranking is built by IoU and another by predicted score (a sketch follows below).
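A rough sketch of that ranking (one reading of PISA's IoU hierarchical local rank): sort proposals within each ground-truth group by IoU, then take every group's best proposal first, every group's second-best next, and so on. Names are illustrative.

```python
from collections import defaultdict

def iou_hlr(gt_ids, ious):
    """gt_ids[i]: ground truth assigned to proposal i; ious[i]: its IoU."""
    groups = defaultdict(list)
    for i, (g, iou) in enumerate(zip(gt_ids, ious)):
        groups[g].append((iou, i))
    for g in groups:                      # local rank: sort inside each group
        groups[g].sort(reverse=True)
    order, rank = [], 0
    while any(rank < len(v) for v in groups.values()):
        # gather every group's rank-th proposal, highest IoU first
        tier = sorted((v[rank] for v in groups.values() if rank < len(v)),
                      reverse=True)
        order += [i for _, i in tier]
        rank += 1
    return order

# Example: proposals 0-4 assigned to two ground truths
print(iou_hlr([0, 0, 1, 1, 0], [0.9, 0.6, 0.8, 0.7, 0.5]))
# -> [0, 2, 3, 1, 4]: each GT's best box first, then the second tier, ...
```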

-----

34. Pos-Prime

Higher IoU means lower error.

-----

35. Imbalance

Hard Sampling: Faster R-CNN, OHEM, Libra R-CNN.
Soft Sampling: Focal Loss, GHM, PISA.
Ranking-Based Loss: AP Loss, DR Loss.
Unbiased Learning: ResObj, Sampling-Free.

https://arxiv.org/abs/2006.09238

-----

36. AP Loss

https://zhuanlan.zhihu.com/p/101303119

https://blog.csdn.net/jiaoyangwm/article/details/91479594

-----

37. DR Loss

https://zhuanlan.zhihu.com/p/75896297

-----

38. ResObj

https://zhuanlan.zhihu.com/p/82761345

-----

39. Sampling-Free

https://zhuanlan.zhihu.com/p/100052168

-----

40. Imbalance Problems

Four major categories.

https://arxiv.org/abs/1909.00169

-----

In principle, images taken from the web come with their sources; for images taken from papers, please refer to the papers themselves.

-----

Wednesday, July 01, 2020

Learning English

-----

How to Learn English Well

This article was originally planned as an appendix to AI 從頭學. A simple version was published earlier on AI Seminar Taiwan, but I never found the time to rewrite it. Recently, when I posted the not-for-publication note about AI 從頭學 in the Python group, a reader asked to see this appendix, so I am rewriting it here, briefly; it will not be long.

-----

Native speakers master their mother tongue in the order listening, speaking, reading, writing; for Chinese speakers the order should be listening, reading, speaking, writing. We can assume that the typical university student in Taiwan reads passably and, for the most part, cannot follow spoken English.

Two books have ideas worth recommending: 成寒's 躺著學英文 and 鄭贊容's 千萬別學英語. Whether you actually read them does not matter. The former stresses the importance of listening; the latter reminds you to use an English-English dictionary. They really make the same point: think in English. Either can come first.

Simplified editions of world literary classics that come with a CD are well suited to listening practice: pick books whose vocabulary you already know entirely and try to understand them by ear. When I was at NCTU, I went through every listening resource in the library, including Harry Potter.

For reading, likewise start with simple English, preferably without looking things up. The point is to be able to keep reading, so material that is too hard is unsuitable; interesting material works better.

There is a kind of English-English dictionary called a learner's dictionary, made specifically for non-native speakers and written with a vocabulary of roughly two to three thousand words. If you must use a dictionary, use one of these; even so, try to guess first, and look a word up only when you are not quite sure of its meaning. Do not reach for the dictionary at every turn.

-----

In short, the key is to keep listening and keep reading. More important still: the material should be slightly hard, but not too hard, so that you can keep at it.

A brief write-up; I hope it is of some help to anyone who reads this short piece.

-----

Monday, June 29, 2020

Deep Learning Highlight

2019/04/25

Notes:

These are introductory recommendations based on the pace of my own self-study of deep learning.

The tracks are: a 3-paper quick version, for a "fast" glimpse of the whole of deep learning; a 20-paper slow version, centered on computer vision; a 30-paper foundation version, adding fundamental topics and natural language processing; a 10-paper essentials version, the distilled core of the 30-paper version; and a 50-paper complete version covering every paper mentioned above.

-----


Fig. 1. 深度學習: Caffe 之經典模型詳解與實戰 [1].

-----

Quick version (3 papers)

1. Deep Learning
2. LeNet
3. LSTM

-----

Slow version (20 papers)

1. LeNet
2. LSTM

3. AlexNet
4. ZFNet
5. NIN
6. GoogLeNet
7. VGGNet
8. SqueezeNet

9. PreVGGNet
10. SVM
11. SMO
12. DPM
13. SS
14. FCN

15. R-CNN
16. SPPNet
17. Fast R-CNN
18. Faster R-CNN
19. YOLO
20. SSD

-----

Foundation version (30 papers)

1. LeNet(AlexNet、ZFNet)
2. NIN + SENet(GoogLeNet、VGGNet、PreVGGNet、Highway)
3. ResNet
4. FCN(Mask R-CNN、YOLACT)
5. YOLOv1(Faster R-CNN、YOLOv3)

6. LSTM(Weight Decay、Dropout)
7. Seq2seq(Batch Normalization、Layer Normalization)
8. Attention(RAdam、Lookahead)
9. ConvS2S(ULMFiT、ELMo)
10. Transformer(GPT-1、BERT)

-----

Essentials version (10 papers)

1. LeNet
2. NIN
3. ResNet
4. FCN
5. YOLOv1

6. LSTM
7. Seq2seq
8. Attention
9. ConvS2S
10. Transformer

-----

Complete version (50 papers)

CNN 9 (LeNet、AlexNet、ZFNet、NIN、GoogLeNet、VGGNet、PreVGGNet、Highway、ResNet)

Semantic Segmentation 4(FCN、U-Net、Deeplabv3+)、(Mask R-CNN)

Object Detection 14(DPM、SS、R-CNN、SPPNet、Fast R-CNN、Faster R-CNN)、(YOLOv1)、(SSD、R-FCN、YOLOv2、FPN、RetinaNet、YOLOv3、M2Det)

Optimization 7(SGD、Momentum、NAG、AdaGrad、AdaDelta、RMSProp、Adam)

Regularization 2(Weight Decay、Dropout)

Normalization 5(Batch、Weight、Layer、Instance、Group)

NLP 10(LSTM、NNLM、Word2vec、Seq2seq、Attention、ConvS2S、Transformer、ELMo、GPT、BERT)

-----

Legend:

# basic
// advanced

-----

Paper

# Deep Learning
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." nature 521.7553 (2015): 436.
https://creativecoding.soe.ucsc.edu/courses/cs523/slides/week3/DeepLearning_LeCun.pdf

# History of Deep Learning
Alom, Md Zahangir, et al. "The history began from alexnet: A comprehensive survey on deep learning approaches." arXiv preprint arXiv:1803.01164 (2018).
https://arxiv.org/ftp/arxiv/papers/1803/1803.01164.pdf

# Recent Advances in CNN
Gu, Jiuxiang, et al. "Recent advances in convolutional neural networks." Pattern Recognition 77 (2018): 354-377.
https://arxiv.org/pdf/1512.07108.pdf

// GPU
Raina, Rajat, Anand Madhavan, and Andrew Y. Ng. "Large-scale deep unsupervised learning using graphics processors." Proceedings of the 26th annual international conference on machine learning. ACM, 2009.
http://robotics.stanford.edu/~ang/papers/icml09-LargeScaleUnsupervisedDeepLearningGPU.pdf

// Difficult 1994
Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. "Learning long-term dependencies with gradient descent is difficult." IEEE transactions on neural networks 5.2 (1994): 157-166.
https://pdfs.semanticscholar.org/d0be/39ee052d246ae99c082a565aba25b811be2d.pdf

// Difficult 2010
Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." Proceedings of the thirteenth international conference on artificial intelligence and statistics. 2010.
http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf 

// Difficult 2013
Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. "On the difficulty of training recurrent neural networks." International conference on machine learning. 2013.
http://proceedings.mlr.press/v28/pascanu13.pdf
 
-----

Part I: Computer Vision

-----

◎ Image Classification

-----

# LeNet
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf

# AlexNet
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

# ZFNet
Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European conference on computer vision. Springer, Cham, 2014.
https://arxiv.org/pdf/1311.2901.pdf

-----

# NIN
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
https://arxiv.org/pdf/1312.4400.pdf

# SENet
Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Squeeze-and-Excitation_Networks_CVPR_2018_paper.pdf

# GoogLeNet
Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
http://openaccess.thecvf.com/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf

# VGGNet
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
https://arxiv.org/pdf/1409.1556/

# PreVGGNet
Ciresan, Dan C., et al. "Flexible, high performance convolutional neural networks for image classification." IJCAI Proceedings-International Joint Conference on Artificial Intelligence. Vol. 22. No. 1. 2011.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.481.4406&rep=rep1&type=pdf
  
# Highway v1
Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. "Highway networks." arXiv preprint arXiv:1505.00387 (2015).
https://arxiv.org/pdf/1505.00387.pdf

# Highway v2
Srivastava, Rupesh K., Klaus Greff, and Jürgen Schmidhuber. "Training very deep networks." Advances in neural information processing systems. 2015.
https://papers.nips.cc/paper/5850-training-very-deep-networks.pdf

# Inception v3
Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.pdf

# Inception v4
Szegedy, Christian, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." AAAI. Vol. 4. 2017.
http://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14806/14311

# FractalNet(DropPath)
Larsson, Gustav, Michael Maire, and Gregory Shakhnarovich. "Fractalnet: Ultra-deep neural networks without residuals." arXiv preprint arXiv:1605.07648 (2016).
https://arxiv.org/pdf/1605.07648.pdf

# CapsNet
Sabour, Sara, Nicholas Frosst, and Geoffrey E. Hinton. "Dynamic routing between capsules." Advances in neural information processing systems. 2017.
http://papers.nips.cc/paper/6975-dynamic-routing-between-capsules.pdf

# PolyNet
Zhang, Xingcheng, et al. "Polynet: A pursuit of structural diversity in very deep networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhang_PolyNet_A_Pursuit_CVPR_2017_paper.pdf

-----
 
# ResNet
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf

# ResNet v2
He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.05027.pdf  

# ResNet-D
Huang, Gao, et al. "Deep networks with stochastic depth." European conference on computer vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.09382.pdf

# ResNet-E
Veit, Andreas, Michael J. Wilber, and Serge Belongie. "Residual networks behave like ensembles of relatively shallow networks." Advances in neural information processing systems. 2016.
https://papers.nips.cc/paper/6556-residual-networks-behave-like-ensembles-of-relatively-shallow-networks.pdf 

# ResNet-F
Zhang, Hongyi, Yann N. Dauphin, and Tengyu Ma. "Fixup initialization: Residual learning without normalization." arXiv preprint arXiv:1901.09321 (2019).
https://arxiv.org/pdf/1901.09321.pdf

# ResNet-I
Shwartz-Ziv, Ravid, and Naftali Tishby. "Opening the black box of deep neural networks via information." arXiv preprint arXiv:1703.00810 (2017).
https://arxiv.org/pdf/1703.00810.pdf

# ResNet-Q
Balduzzi, David, et al. "The shattered gradients problem: If resnets are the answer, then what is the question?." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
https://arxiv.org/pdf/1702.08591.pdf

# ResNet-S
Orhan, A. Emin, and Xaq Pitkow. "Skip connections eliminate singularities." arXiv preprint arXiv:1701.09175 (2017).
https://arxiv.org/pdf/1701.09175.pdf 

# ResNet-U
Liu, Tianyi, et al. "Towards Understanding the Importance of Shortcut Connections in Residual Networks." Advances in Neural Information Processing Systems. 2019.
http://papers.nips.cc/paper/9003-towards-understanding-the-importance-of-shortcut-connections-in-residual-networks.pdf 

# ResNet-V
Li, Hao, et al. "Visualizing the loss landscape of neural nets." Advances in Neural Information Processing Systems. 2018.
https://papers.nips.cc/paper/7875-visualizing-the-loss-landscape-of-neural-nets.pdf

# ResNet-W
He, Fengxiang, Tongliang Liu, and Dacheng Tao. "Why resnet works? residuals generalize." arXiv preprint arXiv:1904.01367 (2019).
https://arxiv.org/pdf/1904.01367.pdf 

# WRN
Zagoruyko, Sergey, and Nikos Komodakis. "Wide residual networks." arXiv preprint arXiv:1605.07146 (2016).
https://arxiv.org/pdf/1605.07146.pdf

# ResNeXt
Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Xie_Aggregated_Residual_Transformations_CVPR_2017_paper.pdf 

# DenseNet
Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. Vol. 1. No. 2. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_Densely_Connected_Convolutional_CVPR_2017_paper.pdf

# DPN
Chen, Yunpeng, et al. "Dual path networks." Advances in Neural Information Processing Systems. 2017.
https://papers.nips.cc/paper/7033-dual-path-networks.pdf

# DLA
Yu, Fisher, et al. "Deep layer aggregation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Yu_Deep_Layer_Aggregation_CVPR_2018_paper.pdf

# Res2Net
Gao, Shang-Hua, et al. "Res2Net: A New Multi-scale Backbone Architecture." arXiv preprint arXiv:1904.01169 (2019).
https://arxiv.org/pdf/1904.01169.pdf 

-----

Mobile

-----

# SqueezeNet
Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size." arXiv preprint arXiv:1602.07360 (2016).
https://arxiv.org/pdf/1602.07360.pdf

# MobileNet v1
Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
https://arxiv.org/pdf/1704.04861.pdf

# MobileNet v2
Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Sandler_MobileNetV2_Inverted_Residuals_CVPR_2018_paper.pdf

# MobileNet v3
Howard, Andrew, et al. "Searching for mobilenetv3." arXiv preprint arXiv:1905.02244 (2019).
https://arxiv.org/pdf/1905.02244.pdf

# ShuffleNet v1
Zhang, Xiangyu, et al. "Shufflenet: An extremely efficient convolutional neural network for mobile devices." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_ShuffleNet_An_Extremely_CVPR_2018_paper.pdf

# ShuffleNet v2
Ma, Ningning, et al. "Shufflenet v2: Practical guidelines for efficient cnn architecture design." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Ningning_Light-weight_CNN_Architecture_ECCV_2018_paper.pdf

# Xception
Chollet, François. "Xception: Deep learning with depthwise separable convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Chollet_Xception_Deep_Learning_CVPR_2017_paper.pdf

# ESPNet v1
Mehta, Sachin, et al. "Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Sachin_Mehta_ESPNet_Efficient_Spatial_ECCV_2018_paper.pdf

# ESPNet v2
Mehta, Sachin, et al. "Espnetv2: A light-weight, power efficient, and general purpose convolutional neural network." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Mehta_ESPNetv2_A_Light-Weight_Power_Efficient_and_General_Purpose_Convolutional_Neural_CVPR_2019_paper.pdf

-----

# NAS-RL
Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
https://arxiv.org/pdf/1611.01578.pdf

# NASNet(Scheduled DropPath)
Zoph, Barret, et al. "Learning transferable architectures for scalable image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zoph_Learning_Transferable_Architectures_CVPR_2018_paper.pdf

// pNASNet
Liu, Chenxi, et al. "Progressive neural architecture search." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Chenxi_Liu_Progressive_Neural_Architecture_ECCV_2018_paper.pdf 

// AmoebaNet
Real, Esteban, et al. "Regularized evolution for image classifier architecture search." Proceedings of the aaai conference on artificial intelligence. Vol. 33. 2019.
https://arxiv.org/pdf/1802.01548.pdf

// mNASNet
Tan, Mingxing, et al. "Mnasnet: Platform-aware neural architecture search for mobile." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Tan_MnasNet_Platform-Aware_Neural_Architecture_Search_for_Mobile_CVPR_2019_paper.pdf

# Auto-DeepLab
Liu, Chenxi, et al. "Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Auto-DeepLab_Hierarchical_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2019_paper.pdf

# NAS-FPN
Ghiasi, Golnaz, Tsung-Yi Lin, and Quoc V. Le. "Nas-fpn: Learning scalable feature pyramid architecture for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Ghiasi_NAS-FPN_Learning_Scalable_Feature_Pyramid_Architecture_for_Object_Detection_CVPR_2019_paper.pdf

# AutoAugment
Cubuk, Ekin D., et al. "Autoaugment: Learning augmentation policies from data." arXiv preprint arXiv:1805.09501 (2018).
https://arxiv.org/pdf/1805.09501.pdf

# EfficientNet
Tan, Mingxing, and Quoc V. Le. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." arXiv preprint arXiv:1905.11946 (2019).
https://arxiv.org/pdf/1905.11946.pdf

# EfficientDet
Tan, Mingxing, Ruoming Pang, and Quoc V. Le. "Efficientdet: Scalable and efficient object detection." arXiv preprint arXiv:1911.09070 (2019).
https://arxiv.org/pdf/1911.09070.pdf

-----

Semantic Segmentation

-----

// SDS
Hariharan, Bharath, et al. "Simultaneous detection and segmentation." European Conference on Computer Vision. Springer, Cham, 2014.
https://arxiv.org/pdf/1407.1808.pdf

# FCN
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf

# DeconvNet
Noh, Hyeonwoo, Seunghoon Hong, and Bohyung Han. "Learning deconvolution network for semantic segmentation." Proceedings of the IEEE international conference on computer vision. 2015.
https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Noh_Learning_Deconvolution_Network_ICCV_2015_paper.pdf

# SegNet
Badrinarayanan, Vijay, Alex Kendall, and Roberto Cipolla. "Segnet: A deep convolutional encoder-decoder architecture for image segmentation." IEEE transactions on pattern analysis and machine intelligence 39.12 (2017): 2481-2495.
https://arxiv.org/pdf/1511.00561.pdf

# U-Net
Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.
https://arxiv.org/pdf/1505.04597.pdf

# U-Net++
Zhou, Zongwei, et al. "Unet++: A nested u-net architecture for medical image segmentation." Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, Cham, 2018. 3-11.
https://arxiv.org/pdf/1807.10165.pdf

-----

# DilatedNet
Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions." arXiv preprint arXiv:1511.07122 (2015).
https://arxiv.org/pdf/1511.07122.pdf 

# ENet
Paszke, Adam, et al. "Enet: A deep neural network architecture for real-time semantic segmentation." arXiv preprint arXiv:1606.02147 (2016).
https://arxiv.org/pdf/1606.02147.pdf
 
# DRN
Yu, Fisher, Vladlen Koltun, and Thomas Funkhouser. "Dilated residual networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Yu_Dilated_Residual_Networks_CVPR_2017_paper.pdf

# FastFCN
Wu, Huikai, et al. "FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation." arXiv preprint arXiv:1903.11816 (2019).
https://arxiv.org/pdf/1903.11816.pdf 

-----

# FC-CRF
Krähenbühl, Philipp, and Vladlen Koltun. "Efficient inference in fully connected crfs with gaussian edge potentials." Advances in neural information processing systems. 2011.
http://papers.nips.cc/paper/4296-efficient-inference-in-fully-connected-crfs-with-gaussian-edge-potentials.pdf

# DeepLab v1
Chen, Liang-Chieh, et al. "Semantic image segmentation with deep convolutional nets and fully connected crfs." arXiv preprint arXiv:1412.7062 (2014).
https://arxiv.org/pdf/1412.7062.pdf

# DeepLab v2
Chen, Liang-Chieh, et al. "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs." arXiv preprint arXiv:1606.00915 (2016).
https://arxiv.org/pdf/1606.00915.pdf 

# DeepLab v3
Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017).
https://arxiv.org/pdf/1706.05587.pdf  

# DeepLab v3+
Chen, Liang-Chieh, et al. "Encoder-decoder with atrous separable convolution for semantic image segmentation." Proceedings of the European conference on computer vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Liang-Chieh_Chen_Encoder-Decoder_with_Atrous_ECCV_2018_paper.pdf 

# Gated-SCNN
Takikawa, Towaki, et al. "Gated-scnn: Gated shape cnns for semantic segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2019.
http://openaccess.thecvf.com/content_ICCV_2019/papers/Takikawa_Gated-SCNN_Gated_Shape_CNNs_for_Semantic_Segmentation_ICCV_2019_paper.pdf

-----

# ResNet-38
Wu, Zifeng, Chunhua Shen, and Anton Van Den Hengel. "Wider or deeper: Revisiting the resnet model for visual recognition." Pattern Recognition 90 (2019): 119-133.
https://arxiv.org/pdf/1611.10080.pdf 

# Tiramisu
Jégou, Simon, et al. "The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017.
http://openaccess.thecvf.com/content_cvpr_2017_workshops/w13/papers/Jegou_The_One_Hundred_CVPR_2017_paper.pdf

# RefineNet
Lin, Guosheng, et al. "Refinenet: Multi-path refinement networks for high-resolution semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Lin_RefineNet_Multi-Path_Refinement_CVPR_2017_paper.pdf 

# RefineNet-LW
Nekrasov, Vladimir, Chunhua Shen, and Ian Reid. "Light-weight refinenet for real-time semantic segmentation." arXiv preprint arXiv:1810.03272 (2018).
https://arxiv.org/pdf/1810.03272.pdf

# RefineNet-AA
Nekrasov, Vladimir, et al. "Real-time joint semantic segmentation and depth estimation using asymmetric annotations." 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019.
https://arxiv.org/pdf/1809.04766.pdf

# VPLR
Zhu, Yi, et al. "Improving Semantic Segmentation via Video Propagation and Label Relaxation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhu_Improving_Semantic_Segmentation_via_Video_Propagation_and_Label_Relaxation_CVPR_2019_paper.pdf

# PSPNet
Zhao, Hengshuang, et al. "Pyramid scene parsing network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf

# ICNet
Zhao, Hengshuang, et al. "Icnet for real-time semantic segmentation on high-resolution images." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Hengshuang_Zhao_ICNet_for_Real-Time_ECCV_2018_paper.pdf

# BiSeNet
Yu, Changqian, et al. "Bisenet: Bilateral segmentation network for real-time semantic segmentation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Changqian_Yu_BiSeNet_Bilateral_Segmentation_ECCV_2018_paper.pdf

# Fast-SCNN
Poudel, Rudra PK, Stephan Liwicki, and Roberto Cipolla. "Fast-SCNN: fast semantic segmentation network." arXiv preprint arXiv:1902.04502 (2019).
https://arxiv.org/pdf/1902.04502.pdf

# BlitzNet
Dvornik, Nikita, et al. "Blitznet: A real-time deep network for scene understanding." Proceedings of the IEEE international conference on computer vision. 2017.
http://openaccess.thecvf.com/content_ICCV_2017/papers/Dvornik_BlitzNet_A_Real-Time_ICCV_2017_paper.pdf

// SA-GAN

// DANet

// OCNet
  
-----


Instance Segmentation 

-----

// MNC
Dai, Jifeng, Kaiming He, and Jian Sun. "Instance-aware semantic segmentation via multi-task network cascades." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Dai_Instance-Aware_Semantic_Segmentation_CVPR_2016_paper.pdf

// DeepMask
Pinheiro, Pedro O., Ronan Collobert, and Piotr Dollár. "Learning to segment object candidates." Advances in Neural Information Processing Systems. 2015.
https://papers.nips.cc/paper/5852-learning-to-segment-object-candidates.pdf

// SharpMask
Pinheiro, Pedro O., et al. "Learning to refine object segments." European Conference on Computer Vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.08695.pdf

// MultiPathNet
Zagoruyko, Sergey, et al. "A multipath network for object detection." arXiv preprint arXiv:1604.02135 (2016).
https://arxiv.org/pdf/1604.02135.pdf

// InstanceFCN
Dai, Jifeng, et al. "Instance-sensitive fully convolutional networks." European Conference on Computer Vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.08678.pdf

// FCIS
Li, Yi, et al. "Fully convolutional instance-aware semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Fully_Convolutional_Instance-Aware_CVPR_2017_paper.pdf

# Mask R-CNN
He, Kaiming, et al. "Mask r-cnn." Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2017.
http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf

# YOLACT++
Bolya, Daniel, et al. "YOLACT++: Better Real-time Instance Segmentation." arXiv preprint arXiv:1912.06218 (2019).
https://arxiv.org/pdf/1912.06218.pdf

-----

◎ Object Detection

-----

// SVM

// SMO
Platt, John. "Sequential minimal optimization: A fast algorithm for training support vector machines." (1998).
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-98-14.pdf

-----

// SIFT

// HOG
 
// DPM

-----

# DPM
Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." IEEE transactions on pattern analysis and machine intelligence 32.9 (2010): 1627-1645.
https://ttic.uchicago.edu/~dmcallester/lsvm-pami.pdf

# SS
Uijlings, Jasper RR, et al. "Selective search for object recognition." International journal of computer vision 104.2 (2013): 154-171.
https://ivi.fnwi.uva.nl/isis/publications/2013/UijlingsIJCV2013/UijlingsIJCV2013.pdf

# R-CNN
Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.
https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf?spm=5176.100239.blogcont55892.8.pm8zm1&file=Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf

# SPPNet
He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." european conference on computer vision. Springer, Cham, 2014.
https://arxiv.org/pdf/1406.4729.pdf
 
# Fast R-CNN
Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE international conference on computer vision. 2015.
http://openaccess.thecvf.com/content_iccv_2015/papers/Girshick_Fast_R-CNN_ICCV_2015_paper.pdf

# Faster R-CNN
Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.
http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf

-----

# OverFeat
Sermanet, Pierre, et al. "Overfeat: Integrated recognition, localization and detection using convolutional networks." arXiv preprint arXiv:1312.6229 (2013).
https://arxiv.org/pdf/1312.6229.pdf
 
# YOLO v1
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf

# SSD
Liu, Wei, et al. "SSD: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1512.02325.pdf

# DSSD
Fu, Cheng-Yang, et al. "Dssd: Deconvolutional single shot detector." arXiv preprint arXiv:1701.06659 (2017).
https://arxiv.org/pdf/1701.06659.pdf

# YOLO v2
Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." arXiv preprint (2017).

-----

# ION
Bell, Sean, et al. "Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
http://openaccess.thecvf.com/content_cvpr_2016/papers/Bell_Inside-Outside_Net_Detecting_CVPR_2016_paper.pdf

# R-FCN
Dai, Jifeng, et al. "R-fcn: Object detection via region-based fully convolutional networks." Advances in neural information processing systems. 2016.
https://papers.nips.cc/paper/6465-r-fcn-object-detection-via-region-based-fully-convolutional-networks.pdf

# SATO
Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object detectors." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_SpeedAccuracy_Trade-Offs_for_CVPR_2017_paper.pdf

# DCN v1
Dai, Jifeng, et al. "Deformable convolutional networks." Proceedings of the IEEE international conference on computer vision. 2017.
http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Deformable_Convolutional_Networks_ICCV_2017_paper.pdf

# DCN v2
Zhu, Xizhou, et al. "Deformable convnets v2: More deformable, better results." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhu_Deformable_ConvNets_V2_More_Deformable_Better_Results_CVPR_2019_paper.pdf

# Cascade R-CNN
Cai, Zhaowei, and Nuno Vasconcelos. "Cascade r-cnn: Delving into high quality object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Cai_Cascade_R-CNN_Delving_CVPR_2018_paper.pdf   

# FPN
Lin, Tsung-Yi, et al. "Feature pyramid networks for object detection." CVPR. Vol. 1. No. 2. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Lin_Feature_Pyramid_Networks_CVPR_2017_paper.pdf 

# STDN
Zhou, Peng, et al. "Scale-transferrable object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_Scale-Transferrable_Object_Detection_CVPR_2018_paper.pdf

# YOLO v3
Redmon, Joseph, and Ali Farhadi. "YOLOv3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
https://pjreddie.com/media/files/papers/YOLOv3.pdf 

# RON
Kong, Tao, et al. "Ron: Reverse connection with objectness prior networks for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Kong_RON_Reverse_Connection_CVPR_2017_paper.pdf 

# RefineDet
Zhang, Shifeng, et al. "Single-shot refinement neural network for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Single-Shot_Refinement_Neural_CVPR_2018_paper.pdf
 
# M2Det
Zhao, Qijie, et al. "M2det: A single-shot object detector based on multi-level feature pyramid network." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.
https://arxiv.org/pdf/1811.04533.pdf

# SNIP
Singh, Bharat, and Larry S. Davis. "An analysis of scale invariance in object detection snip." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Singh_An_Analysis_of_CVPR_2018_paper.pdf

# SNIPER
Singh, Bharat, Mahyar Najibi, and Larry S. Davis. "SNIPER: Efficient multi-scale training." Advances in Neural Information Processing Systems. 2018.
http://papers.nips.cc/paper/8143-sniper-efficient-multi-scale-training.pdf

# AutoFocus
Najibi, Mahyar, Bharat Singh, and Larry S. Davis. "Autofocus: Efficient multi-scale inference." Proceedings of the IEEE International Conference on Computer Vision. 2019.
http://openaccess.thecvf.com/content_ICCV_2019/papers/Najibi_AutoFocus_Efficient_Multi-Scale_Inference_ICCV_2019_paper.pdf

# DetNet
Li, Zeming, et al. "Detnet: Design backbone for object detection." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Zeming_Li_DetNet_Design_Backbone_ECCV_2018_paper.pdf

# TridentNet
Li, Yanghao, et al. "Scale-aware trident networks for object detection." arXiv preprint arXiv:1901.01892 (2019).
https://arxiv.org/pdf/1901.01892.pdf

-----

# OHEM
Shrivastava, Abhinav, Abhinav Gupta, and Ross Girshick. "Training region-based object detectors with online hard example mining." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Shrivastava_Training_Region-Based_Object_CVPR_2016_paper.pdf

# RetinaNet(Focal Loss)
Lin, Tsung-Yi, et al. "Focal loss for dense object detection." IEEE transactions on pattern analysis and machine intelligence (2018).
https://vision.cornell.edu/se3/wp-content/uploads/2017/09/focal_loss.pdf
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8417976

# GHM
Li, Buyu, Yu Liu, and Xiaogang Wang. "Gradient harmonized single-stage detector." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.
https://arxiv.org/pdf/1811.05181.pdf

# Libra R-CNN
Pang, Jiangmiao, et al. "Libra r-cnn: Towards balanced learning for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Pang_Libra_R-CNN_Towards_Balanced_Learning_for_Object_Detection_CVPR_2019_paper.pdf

# DCR v1
Cheng, Bowen, et al. "Revisiting rcnn: On awakening the classification power of faster rcnn." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Bowen_Cheng_Revisiting_RCNN_On_ECCV_2018_paper.pdf
 
# DCR v2
Cheng, Bowen, et al. "Decoupled classification refinement: Hard false positive suppression for object detection." arXiv preprint arXiv:1810.04002 (2018).
https://arxiv.org/pdf/1810.04002.pdf

# PISA
Cao, Yuhang, et al. "Prime Sample Attention in Object Detection." arXiv preprint arXiv:1904.04821 (2019).
https://arxiv.org/pdf/1904.04821.pdf

-----

// CornerNet
Law, Hei, and Jia Deng. "Cornernet: Detecting objects as paired keypoints." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Hei_Law_CornerNet_Detecting_Objects_ECCV_2018_paper.pdf
 
// CenterNet
Duan, Kaiwen, et al. "CenterNet: Object Detection with Keypoint Triplets." arXiv preprint arXiv:1904.08189 (2019).
https://arxiv.org/pdf/1904.08189.pdf
 
// SelectNet
Liu, Yunru, Tingran Gao, and Haizhao Yang. "SelectNet: Learning to Sample from the Wild for Imbalanced Data Training." arXiv preprint arXiv:1905.09872 (2019).
https://arxiv.org/pdf/1905.09872.pdf

// Bottom-up
Zhou, Xingyi, Jiacheng Zhuo, and Philipp Krahenbuhl. "Bottom-up object detection by grouping extreme and center points." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhou_Bottom-Up_Object_Detection_by_Grouping_Extreme_and_Center_Points_CVPR_2019_paper.pdf 

-----

◎ Face Detection

-----


-----

◎ Face Recognition

-----

// DeepFace

// DeepID

// VGGFace

-----

// FaceNet
Schroff, Florian, Dmitry Kalenichenko, and James Philbin. "Facenet: A unified embedding for face recognition and clustering." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Schroff_FaceNet_A_Unified_2015_CVPR_paper.pdf

// LMNN
Weinberger, Kilian Q., John Blitzer, and Lawrence K. Saul. "Distance metric learning for large margin nearest neighbor classification." Advances in neural information processing systems. 2006.
http://papers.nips.cc/paper/2795-distance-metric-learning-for-large-margin-nearest-neighbor-classification.pdf

-----

// Center Loss

// Sphere Face(A-softmax)

// CosFace(AM-softmax)

// ArcFace

// MobileID

// MobileFace

// OpenFace

// SeetaFace

-----

Visual Tracking  

-----

Part II: Natural Language Processing

-----

LSTM

-----

// RNN(Recurrent Neural Network)
Elman, Jeffrey L. "Finding structure in time." Cognitive science 14.2 (1990): 179-211.
http://www2.fiit.stuba.sk/~kvasnicka/NeuralNetworks/6.prednaska/Elman_SRNN_paper.pdf

# LSTM(Long Short-Term Memory)
Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.676.4320&rep=rep1&type=pdf

// BRNN(Bidirectional RNN)
Schuster, Mike, and Kuldip K. Paliwal. "Bidirectional recurrent neural networks." IEEE Transactions on Signal Processing 45.11 (1997): 2673-2681.
http://www.cs.cmu.edu/afs/cs/user/bhiksha/WWW/courses/deeplearning/Fall.2016/pdfs/Schuster97_BRNN.pdf

// BLSTM(Bidirectional LSTM)
Graves, Alex, Navdeep Jaitly, and Abdel-rahman Mohamed. "Hybrid speech recognition with deep bidirectional LSTM." Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013.
http://www.cs.toronto.edu/~graves/asru_2013.pdf

# GRU(Gated Recurrent Unit)
Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
https://arxiv.org/pdf/1406.1078.pdf
 
// MGU(Minimal Gated Unit)
Zhou, Guo-Bing, et al. "Minimal gated unit for recurrent neural networks." International Journal of Automation and Computing 13.3 (2016): 226-234.
https://arxiv.org/pdf/1603.09420.pdf

// SRU(Simple Recurrent Unit)
Lei, Tao, et al. "Simple recurrent units for highly parallelizable recurrence." Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018.
https://arxiv.org/pdf/1709.02755.pdf

// Comparison of LSTM, GRU, MGU, and SRU
Hou, Bo-Jian, and Zhi-Hua Zhou. "Learning with Interpretable Structure from RNN." arXiv preprint arXiv:1810.10708 (2018).
https://arxiv.org/pdf/1810.10708.pdf
 
-----

Seq2seq

-----

# Seq2seq - using LSTM
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf 

-----

# GloVe
Pennington, Jeffrey, Richard Socher, and Christopher Manning. "Glove: Global vectors for word representation." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.
https://www.aclweb.org/anthology/D14-1162

# fastText
Joulin, Armand, et al. "Bag of tricks for efficient text classification." arXiv preprint arXiv:1607.01759 (2016).
https://arxiv.org/pdf/1607.01759.pdf

# Skip-Thought
Kiros, Ryan, et al. "Skip-thought vectors." Advances in neural information processing systems. 2015.
https://papers.nips.cc/paper/5950-skip-thought-vectors.pdf

# Quick-Thought
Logeswaran, Lajanugen, and Honglak Lee. "An efficient framework for learning sentence representations." arXiv preprint arXiv:1803.02893 (2018).
https://arxiv.org/pdf/1803.02893.pdf

# InferSent
Conneau, Alexis, et al. "Supervised learning of universal sentence representations from natural language inference data." arXiv preprint arXiv:1705.02364 (2017).
https://arxiv.org/pdf/1705.02364.pdf


-----

Attention

-----

# Attention - using GRU
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
https://arxiv.org/pdf/1409.0473.pdf

# Attention - using LSTM
Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
https://arxiv.org/pdf/1508.04025.pdf 

# Visual Attention
Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." International conference on machine learning. 2015.
http://proceedings.mlr.press/v37/xuc15.pdf 

// FSA
Daniluk, Michał, et al. "Frustratingly short attention spans in neural language modeling." arXiv preprint arXiv:1702.04521 (2017).
https://arxiv.org/pdf/1702.04521.pdf

// MHA
Iida, Shohei, et al. "A Multi-Hop Attention for RNN based Neural Machine Translation." Proceedings of The 8th Workshop on Patent and Scientific Literature Translation. 2019.
https://www.aclweb.org/anthology/W19-7203

// AOH
Iida, Shohei, et al. "Attention over Heads: A Multi-Hop Attention for Neural Machine Translation." Proceedings of the 57th Conference of the Association for Computational Linguistics: Student Research Workshop. 2019.
https://www.aclweb.org/anthology/P19-2030

-----

NTM & Memory

-----

# NTM
Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
https://arxiv.org/pdf/1410.5401.pdf

// MN
J Weston, S Chopra, and A Bordes. Memory networks. ICLR, 2014.
https://arxiv.org/abs/1410.3916

// EEMN
Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. "End-to-end memory networks." Advances in neural information processing systems. 2015.
https://papers.nips.cc/paper/5846-end-to-end-memory-networks.pdf

# KVMN
Miller, Alexander, et al. "Key-value memory networks for directly reading documents." arXiv preprint arXiv:1606.03126 (2016).
https://arxiv.org/pdf/1606.03126.pdf

// PN
Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." Advances in Neural Information Processing Systems. 2015.
http://papers.nips.cc/paper/5866-pointer-networks.pdf

// Set2set
Vinyals, Oriol, Samy Bengio, and Manjunath Kudlur. "Order matters: Sequence to sequence for sets." arXiv preprint arXiv:1511.06391 (2015).
https://arxiv.org/pdf/1511.06391.pdf

// RL NTM
Zaremba, Wojciech, and Ilya Sutskever. "Reinforcement learning neural turing machines-revised." arXiv preprint arXiv:1505.00521 (2015).
https://arxiv.org/pdf/1505.00521.pdf

// Hybrid Computing
Graves, Alex, et al. "Hybrid computing using a neural network with dynamic external memory." Nature 538.7626 (2016): 471.
https://campus.swarma.org/public/ueditor/php/upload/file/20170609/1497019302822809.pdf

// Implementing NTM
Collier, Mark, and Joeran Beel. "Implementing Neural Turing Machines." International Conference on Artificial Neural Networks. Springer, Cham, 2018.
https://arxiv.org/pdf/1807.08518.pdf

-----

ConvS2S

-----

// GLU
Dauphin, Yann N., et al. "Language modeling with gated convolutional networks." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
https://arxiv.org/pdf/1612.08083.pdf

# ConvS2S
Gehring, Jonas, et al. "Convolutional sequence to sequence learning." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
https://arxiv.org/pdf/1705.03122.pdf

-----

# NNLM
Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of machine learning research 3.Feb (2003): 1137-1155.
http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf 

# Word2vec
Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
https://arxiv.org/pdf/1301.3781.pdf

# Skip-gram
Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

// Doc2vec
Le, Quoc, and Tomas Mikolov. "Distributed representations of sentences and documents." International conference on machine learning. 2014.
http://proceedings.mlr.press/v32/le14.pdf

// Word2vec Explained
Goldberg, Yoav, and Omer Levy. "word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method." arXiv preprint arXiv:1402.3722 (2014).
https://arxiv.org/pdf/1402.3722.pdf

// Word2vec Learning
Rong, Xin. "word2vec parameter learning explained." arXiv preprint arXiv:1411.2738 (2014).
https://arxiv.org/pdf/1411.2738.pdf

-----

# ULMFiT
Howard, Jeremy, and Sebastian Ruder. "Universal language model fine-tuning for text classification." arXiv preprint arXiv:1801.06146 (2018).
https://arxiv.org/pdf/1801.06146.pdf

# ELMo
Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018).
https://arxiv.org/pdf/1802.05365.pdf 

-----

Transformer

-----

# Transformer
Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf

# GPT-1
Radford, Alec, et al. "Improving language understanding by generative pre-training." URL https://s3-us-west-2. amazonaws. com/openai-assets/research-covers/languageunsupervised/language understanding paper. pdf (2018).
https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf 

# BERT
Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
https://arxiv.org/pdf/1810.04805.pdf

// MTL 0
Baxter, Jonathan. "A model of inductive bias learning." Journal of artificial intelligence research 12 (2000): 149-198.
https://arxiv.org/pdf/1106.0245.pdf
  
// MTL 1
Collobert, Ronan, and Jason Weston. "A unified architecture for natural language processing: Deep neural networks with multitask learning." Proceedings of the 25th international conference on Machine learning. ACM, 2008.
http://www.thespermwhale.com/jaseweston/papers/unified_nlp.pdf

// MTL 2
Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of machine learning research 12.Aug (2011): 2493-2537.
http://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf

// MTL3
Ruder, Sebastian. "An overview of multi-task learning in deep neural networks." arXiv preprint arXiv:1706.05098 (2017).
https://arxiv.org/pdf/1706.05098.pdf

-----
 
// Universal Transformers
Dehghani, Mostafa, et al. "Universal transformers." arXiv preprint arXiv:1807.03819 (2018).
https://arxiv.org/pdf/1807.03819.pdf

// Transformer XL
Dai, Zihang, et al. "Transformer-xl: Attentive language models beyond a fixed-length context." arXiv preprint arXiv:1901.02860 (2019).
https://arxiv.org/pdf/1901.02860.pdf
 
// MT-DNN
Liu, Xiaodong, et al. "Multi-Task Deep Neural Networks for Natural Language Understanding." arXiv preprint arXiv:1901.11504 (2019).
https://arxiv.org/pdf/1901.11504.pdf

// GPT-2
Vig, Jesse. "Visualizing Attention in Transformer-Based Language models." arXiv preprint arXiv:1904.02679 (2019).
https://arxiv.org/pdf/1904.02679.pdf

// ERNIE Baidu
Sun, Yu, et al. "ERNIE: Enhanced Representation through Knowledge Integration." arXiv preprint arXiv:1904.09223 (2019).
https://arxiv.org/pdf/1904.09223.pdf

// ERNIE THU
Zhang, Zhengyan, et al. "ERNIE: Enhanced Language Representation with Informative Entities." arXiv preprint arXiv:1905.07129 (2019).
https://arxiv.org/pdf/1905.07129.pdf

// XLMs Facebook
Lample, Guillaume, and Alexis Conneau. "Cross-lingual Language Model Pretraining." arXiv preprint arXiv:1901.07291 (2019).
https://arxiv.org/pdf/1901.07291.pdf

// LASER Facebook
Artetxe, Mikel, and Holger Schwenk. "Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond." arXiv preprint arXiv:1812.10464 (2018).
https://arxiv.org/pdf/1812.10464.pdf

// MASS Microsoft
Song, Kaitao, et al. "Mass: Masked sequence to sequence pre-training for language generation." arXiv preprint arXiv:1905.02450 (2019).
https://arxiv.org/pdf/1905.02450.pdf

// UNILM Microsoft
Dong, Li, et al. "Unified Language Model Pre-training for Natural Language Understanding and Generation." arXiv preprint arXiv:1905.03197 (2019).
https://arxiv.org/pdf/1905.03197.pdf

// ON-LSTM
Shen, Yikang, et al. "Ordered neurons: Integrating tree structures into recurrent neural networks." arXiv preprint arXiv:1810.09536 (2018).
https://arxiv.org/pdf/1810.09536.pdf

// XLNet
Yang, Zhilin, et al. "XLNet: Generalized Autoregressive Pretraining for Language Understanding." arXiv preprint arXiv:1906.08237 (2019).
https://arxiv.org/pdf/1906.08237.pdf

-----

Part III: Fundamental Topics

-----

Regularization

-----

# Weight Decay
Zhang, Guodong, et al. "Three mechanisms of weight decay regularization." arXiv preprint arXiv:1810.12281 (2018).
https://arxiv.org/pdf/1810.12281.pdf

// WD 1989
Hanson, Stephen José, and Lorien Y. Pratt. "Comparing biases for minimal network construction with back-propagation." Advances in neural information processing systems. 1989.
http://papers.nips.cc/paper/156-comparing-biases-for-minimal-network-construction-with-back-propagation.pdf

// WD 1992
Krogh, Anders, and John A. Hertz. "A simple weight decay can improve generalization." Advances in neural information processing systems. 1992.
http://papers.nips.cc/paper/563-a-simple-weight-decay-can-improve-generalization.pdf

# L2
# Ridge Regression
Hoerl, Arthur E., and Robert W. Kennard. "Ridge regression: Biased estimation for nonorthogonal problems." Technometrics 12.1 (1970): 55-67.
https://amstat.tandfonline.com/doi/pdf/10.1080/00401706.1970.10488634

# L1
# Lasso Regression
Tibshirani, Robert. "Regression shrinkage and selection via the lasso." Journal of the Royal Statistical Society: Series B (Methodological) 58.1 (1996): 267-288.
http://www.stat.ucla.edu/~sczhu/courses/ucla/stat_232b/chapters/LASSO.pdf

# L0
Louizos, Christos, Max Welling, and Diederik P. Kingma. "Learning Sparse Neural Networks through $ L_0 $ Regularization." arXiv preprint arXiv:1712.01312 (2017).
https://arxiv.org/pdf/1712.01312.pdf

-----

# Dropout
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf

# DropConnect
Wan, Li, et al. "Regularization of neural networks using dropconnect." International conference on machine learning. 2013.
http://proceedings.mlr.press/v28/wan13.pdf 

# FractalNet(DropPath)
Larsson, Gustav, Michael Maire, and Gregory Shakhnarovich. "Fractalnet: Ultra-deep neural networks without residuals." arXiv preprint arXiv:1605.07648 (2016).
https://arxiv.org/pdf/1605.07648.pdf

# NASNet(Scheduled DropPath)
Zoph, Barret, et al. "Learning transferable architectures for scalable image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zoph_Learning_Transferable_Architectures_CVPR_2018_paper.pdf

# Shake-Shake
Gastaldi, Xavier. "Shake-shake regularization." arXiv preprint arXiv:1705.07485 (2017).
https://arxiv.org/pdf/1705.07485.pdf

# ShakeDrop
Yamada, Yoshihiro, et al. "Shakedrop regularization for deep residual learning." arXiv preprint arXiv:1802.02375 (2018).
https://arxiv.org/pdf/1802.02375.pdf

# Spatial Dropout
Tompson, Jonathan, et al. "Efficient object localization using convolutional networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Tompson_Efficient_Object_Localization_2015_CVPR_paper.pdf

# Cutout
DeVries, Terrance, and Graham W. Taylor. "Improved regularization of convolutional neural networks with cutout." arXiv preprint arXiv:1708.04552 (2017).
https://arxiv.org/pdf/1708.04552.pdf

# DropBlock
Ghiasi, Golnaz, Tsung-Yi Lin, and Quoc V. Le. "Dropblock: A regularization method for convolutional networks." Advances in Neural Information Processing Systems. 2018.
http://papers.nips.cc/paper/8271-dropblock-a-regularization-method-for-convolutional-networks.pdf
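
Reading aid: a minimal NumPy sketch of "inverted" dropout as it is commonly implemented (survivors rescaled at train time so inference needs no change); this illustrates the shared idea, not the exact code of any paper above. The variants differ mainly in what the mask covers: DropConnect masks weights, Spatial Dropout whole channels, DropBlock contiguous regions.

import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Zero each unit with probability p; scale survivors by 1/(1-p)
    so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)

x = np.ones((2, 4))
print(dropout(x, p=0.5))  # surviving entries are 2.0, dropped ones 0.0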

-----

# Fast Dropout
Bayer, Justin, et al. "On fast dropout and its applicability to recurrent networks." arXiv preprint arXiv:1311.0701 (2013).
https://arxiv.org/pdf/1311.0701.pdf

# RNN Regularization
Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. "Recurrent neural network regularization." arXiv preprint arXiv:1409.2329 (2014).
https://arxiv.org/pdf/1409.2329.pdf

# Variational Dropout
Kingma, Durk P., Tim Salimans, and Max Welling. "Variational dropout and the local reparameterization trick." Advances in Neural Information Processing Systems. 2015.
https://papers.nips.cc/paper/5666-variational-dropout-and-the-local-reparameterization-trick.pdf

# Information Dropout
Achille, Alessandro, and Stefano Soatto. "Information dropout: Learning optimal representations through noisy computation." IEEE transactions on pattern analysis and machine intelligence 40.12 (2018): 2897-2905.
http://www.vision.jhu.edu/teaching/learning/deeplearning18/assets/Achille_Soatto-18.pdf

# rnnDrop
Moon, Taesup, et al. "Rnndrop: A novel dropout for rnns in asr." 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, 2015.
http://mind.skku.edu/files/Conference/asru2015.pdf

# DropEmbedding
Gal, Yarin, and Zoubin Ghahramani. "A theoretically grounded application of dropout in recurrent neural networks." Advances in neural information processing systems. 2016.
https://papers.nips.cc/paper/6241-a-theoretically-grounded-application-of-dropout-in-recurrent-neural-networks.pdf
 
# Recurrent Dropout
Semeniuta, Stanislau, Aliaksei Severyn, and Erhardt Barth. "Recurrent dropout without memory loss." arXiv preprint arXiv:1603.05118 (2016).
https://arxiv.org/pdf/1603.05118.pdf

# Zoneout
Krueger, David, et al. "Zoneout: Regularizing rnns by randomly preserving hidden activations." arXiv preprint arXiv:1606.01305 (2016).
https://arxiv.org/pdf/1606.01305.pdf 

# AWD-LSTM
Merity, Stephen, Nitish Shirish Keskar, and Richard Socher. "Regularizing and optimizing LSTM language models." arXiv preprint arXiv:1708.02182 (2017).
https://arxiv.org/pdf/1708.02182.pdf

-----

# DropAttention
Zehui, Lin, et al. "DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks." arXiv preprint arXiv:1907.11065 (2019).
https://arxiv.org/pdf/1907.11065.pdf

-----

# Pairing Samples
Inoue, Hiroshi. "Data augmentation by pairing samples for images classification." arXiv preprint arXiv:1801.02929 (2018).
https://arxiv.org/pdf/1801.02929.pdf

# Mixup
Zhang, Hongyi, et al. "mixup: Beyond empirical risk minimization." arXiv preprint arXiv:1710.09412 (2017).
https://arxiv.org/pdf/1710.09412.pdf
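
Reading aid: mixup trains on convex combinations of random example pairs, x~ = lam * x_i + (1 - lam) * x_j with lam ~ Beta(alpha, alpha), and the same combination of the labels. A minimal NumPy sketch assuming one-hot labels (mixup_batch is an illustrative name):

import numpy as np

def mixup_batch(x, y_onehot, alpha=0.2, rng=None):
    """Return mixed inputs and labels for one batch (x: (N, ...), y_onehot: (N, C))."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)        # mixing coefficient in (0, 1)
    perm = rng.permutation(len(x))      # random partner for each example
    x_mix = lam * x + (1 - lam) * x[perm]
    y_mix = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mix, y_mix

Training then uses the usual cross-entropy against y_mix, which equals lam * CE(pred, y_i) + (1 - lam) * CE(pred, y_j).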

-----

Normalization 

-----

# BN
Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International conference on machine learning. 2015.
http://proceedings.mlr.press/v37/ioffe15.pdf

# WN
Salimans, Tim, and Durk P. Kingma. "Weight normalization: A simple reparameterization to accelerate training of deep neural networks." Advances in Neural Information Processing Systems. 2016.
https://papers.nips.cc/paper/6114-weight-normalization-a-simple-reparameterization-to-accelerate-training-of-deep-neural-networks.pdf
 
# LN
Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).
https://arxiv.org/pdf/1607.06450.pdf

# IN
Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky. "Instance normalization: The missing ingredient for fast stylization." arXiv preprint arXiv:1607.08022 (2016).
https://arxiv.org/pdf/1607.08022.pdf 

# AIN
Huang, Xun, and Serge Belongie. "Arbitrary style transfer in real-time with adaptive instance normalization." Proceedings of the IEEE International Conference on Computer Vision. 2017.
http://openaccess.thecvf.com/content_ICCV_2017/papers/Huang_Arbitrary_Style_Transfer_ICCV_2017_paper.pdf

# GN
Wu, Yuxin, and Kaiming He. "Group normalization." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Yuxin_Wu_Group_Normalization_ECCV_2018_paper.pdf
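
Reading aid: BN, LN, IN, and GN all standardize activations and differ mainly in which axes of an (N, C, H, W) tensor the mean and variance are taken over. A minimal NumPy sketch with the learned scale/shift and BN's running statistics omitted:

import numpy as np

def normalize(x, axes, eps=1e-5):
    """Standardize x over the given axes."""
    mu = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.default_rng(0).normal(size=(8, 16, 4, 4))   # (N, C, H, W)
bn = normalize(x, axes=(0, 2, 3))   # BN: per channel, across the batch
ln = normalize(x, axes=(1, 2, 3))   # LN: per example, across C, H, W
in_ = normalize(x, axes=(2, 3))     # IN: per example and per channel
# GN: split C into G groups (here G=4) and normalize each group like IN
gn = normalize(x.reshape(8, 4, 4, 4, 4), axes=(2, 3, 4)).reshape(x.shape)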

# PN
Li, Boyi, et al. "Positional Normalization." Advances in Neural Information Processing Systems. 2019.
http://papers.nips.cc/paper/8440-positional-normalization.pdf 

# UBN
Bjorck, Nils, et al. "Understanding batch normalization." Advances in Neural Information Processing Systems. 2018.
http://papers.nips.cc/paper/7996-understanding-batch-normalization.pdf

# TUBN
Kohler, Jonas, et al. "Towards a theoretical understanding of batch normalization." arXiv preprint arXiv:1805.10694 (2018).
https://arxiv.org/pdf/1805.10694.pdf

# BNHO
Santurkar, Shibani, et al. "How does batch normalization help optimization?." Advances in Neural Information Processing Systems. 2018.
https://papers.nips.cc/paper/7515-how-does-batch-normalization-help-optimization.pdf

# URBN
Luo, Ping, et al. "Understanding regularization in batch normalization." arXiv preprint arXiv:1809.00846 (2018).
https://arxiv.org/pdf/1809.00846.pdf

# NormProp
Arpit, Devansh, et al. "Normalization propagation: A parametric technique for removing internal covariate shift in deep networks." arXiv preprint arXiv:1603.01431 (2016).
https://arxiv.org/pdf/1603.01431.pdf

# Efficient Backprop
LeCun, Yann A., et al. "Efficient backprop." Neural networks: Tricks of the trade. Springer, Berlin, Heidelberg, 2012. 9-48.
http://cseweb.ucsd.edu/classes/wi08/cse253/Handouts/lecun-98b.pdf

# Whitening
Kessy, Agnan, Alex Lewin, and Korbinian Strimmer. "Optimal whitening and decorrelation." The American Statistician 72.4 (2018): 309-314.
https://arxiv.org/pdf/1512.00809.pdf
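
Reading aid: among the whitening transforms compared by Kessy et al., ZCA is the one whose output stays closest to the original data. A minimal NumPy sketch via an eigendecomposition of the covariance (zca_whiten is an illustrative name; rows of X are samples):

import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA whitening: unit covariance while staying close to the input."""
    Xc = X - X.mean(axis=0)                         # center the features
    cov = Xc.T @ Xc / len(Xc)
    d, E = np.linalg.eigh(cov)                      # cov = E diag(d) E^T
    W = E @ np.diag(1.0 / np.sqrt(d + eps)) @ E.T   # the ZCA transform
    return Xc @ W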

# CAT
Zuber, Verena, and Korbinian Strimmer. "Gene ranking and biomarker discovery under correlation." Bioinformatics 25.20 (2009): 2700-2707.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.247.8982&rep=rep1&type=pdf

# CAR
Zuber, Verena, and Korbinian Strimmer. "High-dimensional regression and variable selection using CAR scores." Statistical Applications in Genetics and Molecular Biology 10.1 (2011).
https://arxiv.org/pdf/1007.5516.pdf

# GWNN
Luo, Ping. "Learning deep architectures via generalized whitened neural networks." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
http://proceedings.mlr.press/v70/luo17a/luo17a.pdf

# DBN
Huang, Lei, et al. "Decorrelated batch normalization." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Huang_Decorrelated_Batch_Normalization_CVPR_2018_paper.pdf

# KN
Wang, Guangrun, et al. "Kalman normalization: Normalizing internal representations across network layers." Advances in Neural Information Processing Systems. 2018.
https://papers.nips.cc/paper/7288-kalman-normalization-normalizing-internal-representations-across-network-layers.pdf

# IterNorm
Huang, Lei, et al. "Iterative Normalization: Beyond Standardization towards Efficient Whitening." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Huang_Iterative_Normalization_Beyond_Standardization_Towards_Efficient_Whitening_CVPR_2019_paper.pdf


-----

Optimization

-----

# SGD
Bottou, Léon. "Stochastic gradient descent tricks." Neural networks: Tricks of the trade. Springer, Berlin, Heidelberg, 2012. 421-436.
https://www.microsoft.com/en-us/research/wp-content/uploads/2012/01/tricks-2012.pdf 

# Momentum
Sutskever, Ilya, et al. "On the importance of initialization and momentum in deep learning." International conference on machine learning. 2013.
http://proceedings.mlr.press/v28/sutskever13.pdf

# NAG
Nesterov, Y. "A method of solving a convex programming problem with convergence rate O(1/k^2)." Soviet Mathematics Doklady. Vol. 27. 1983.
http://mpawankumar.info/teaching/cdt-big-data/nesterov83.pdf
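
Reading aid: classical momentum and Nesterov's variant in the form popularized by Sutskever et al.; the only difference is where the gradient is evaluated. A minimal sketch (momentum_step and grad_fn are illustrative names):

def momentum_step(w, v, grad_fn, lr=0.01, mu=0.9, nesterov=False):
    """Classical: v <- mu*v - lr*grad(w);        w <- w + v
    Nesterov:     v <- mu*v - lr*grad(w + mu*v); w <- w + v (lookahead gradient)"""
    g = grad_fn(w + mu * v) if nesterov else grad_fn(w)
    v = mu * v - lr * g
    return w + v, v

# usage on f(w) = w^2 (gradient 2w); w approaches the minimum at 0
w, v = 5.0, 0.0
for _ in range(200):
    w, v = momentum_step(w, v, lambda u: 2.0 * u, nesterov=True)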
 
# AdaGrad
Duchi, John, Elad Hazan, and Yoram Singer. "Adaptive subgradient methods for online learning and stochastic optimization." Journal of Machine Learning Research 12.Jul (2011): 2121-2159.
http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
  
# AdaDelta
Zeiler, Matthew D. "ADADELTA: an adaptive learning rate method." arXiv preprint arXiv:1212.5701 (2012).
https://arxiv.org/pdf/1212.5701.pdf

# RMSProp
Tieleman, Tijmen, and Geoffrey Hinton. "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude." COURSERA: Neural networks for machine learning 4.2 (2012): 26-31.
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

# Adam
Kingma, Diederik P., and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
https://arxiv.org/pdf/1412.6980.pdf
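
Reading aid: the Adam update from the paper, with bias-corrected first- and second-moment estimates. A minimal NumPy sketch of one step (adam_step is an illustrative name; t counts from 1):

import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g         # running mean of gradients
    v = b2 * v + (1 - b2) * g * g     # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)         # bias correction for zero initialization
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

AdamW (below) modifies this by decaying w directly, instead of folding an L2 term into g.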

# From Adam to SGD
Keskar, Nitish Shirish, and Richard Socher. "Improving generalization performance by switching from adam to sgd." arXiv preprint arXiv:1712.07628 (2017).
https://arxiv.org/pdf/1712.07628.pdf

# Nadam
Dozat, Timothy. "Incorporating nesterov momentum into adam." (2016).
https://openreview.net/pdf?id=OM0jvwB8jIp57ZJjtNEZ

# AMSGrad
Reddi, Sashank J., Satyen Kale, and Sanjiv Kumar. "On the convergence of adam and beyond." International Conference on Learning Representations. 2018.
http://www.satyenkale.com/papers/amsgrad.pdf 

# RAdam
Liu, Liyuan, et al. "On the variance of the adaptive learning rate and beyond." arXiv preprint arXiv:1908.03265 (2019).
https://arxiv.org/pdf/1908.03265.pdf

# SMA
Nau, Robert. "Forecasting with moving averages." Fuqua School of Business, Duke University (2014): 1-3.
https://people.duke.edu/~rnau/Notes_on_forecasting_with_moving_averages--Robert_Nau.pdf

# Lookahead
Zhang, Michael, et al. "Lookahead Optimizer: k steps forward, 1 step back." Advances in Neural Information Processing Systems. 2019.
http://papers.nips.cc/paper/9155-lookahead-optimizer-k-steps-forward-1-step-back.pdf

# EMA
Hunter, J. Stuart. "The exponentially weighted moving average." Journal of quality technology 18.4 (1986): 203-210.
https://www.researchgate.net/profile/Arumugam_Raman/post/What_kind_of_data_is_usually_considered_in_Construction_of_Shewhart_control_charts/attachment/59d6255579197b8077983a73/AS%3A273836358995969%401442299083585/download/L11-OnEWMA.pdf

# LAMB
You, Yang, et al. "Reducing BERT Pre-Training Time from 3 Days to 76 Minutes." arXiv preprint arXiv:1904.00962 (2019).
https://arxiv.org/pdf/1904.00962.pdf

# CLR
Smith, Leslie N. "Cyclical learning rates for training neural networks." 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2017.
https://arxiv.org/pdf/1506.01186.pdf

# SGDR
Loshchilov, Ilya, and Frank Hutter. "Sgdr: Stochastic gradient descent with warm restarts." arXiv preprint arXiv:1608.03983 (2016).
https://arxiv.org/pdf/1608.03983.pdf
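
Reading aid: SGDR anneals the learning rate from lr_max down to lr_min along a half cosine over T_i steps, then restarts with T_{i+1} = T_mult * T_i. A minimal sketch (sgdr_lr is an illustrative name):

import math

def sgdr_lr(step, lr_max=0.1, lr_min=1e-4, t0=10, t_mult=2):
    """Learning rate at a global step under cosine annealing with warm restarts."""
    t_i, s = t0, step
    while s >= t_i:        # locate the current restart cycle
        s -= t_i
        t_i *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * s / t_i))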

# AdamW
Loshchilov, Ilya, and Frank Hutter. "Decoupled weight decay regularization (2019)." arXiv preprint arXiv:1711.05101.
https://arxiv.org/pdf/1711.05101.pdf

# Super-Convergence
Smith, Leslie N., and Nicholay Topin. "Super-convergence: Very fast training of residual networks using large learning rates." (2018).
https://openreview.net/pdf?id=H1A5ztj3b

# ADMM
Boyd, Stephen, et al. "Distributed optimization and statistical learning via the alternating direction method of multipliers." Foundations and Trends® in Machine learning 3.1 (2011): 1-122.
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.360.1664&rep=rep1&type=pdf

# ADMM-S
Taylor, Gavin, et al. "Training neural networks without gradients: A scalable admm approach." International conference on machine learning. 2016.
http://proceedings.mlr.press/v48/taylor16.pdf

# dlADMM
Wang, Junxiang, et al. "ADMM for Efficient Deep Learning with Global Convergence." arXiv preprint arXiv:1905.13611 (2019).
https://arxiv.org/pdf/1905.13611.pdf
 
-----

Activation Function

-----

# Activation Function
Nwankpa, Chigozie, et al. "Activation functions: Comparison of trends in practice and research for deep learning." arXiv preprint arXiv:1811.03378 (2018).
https://arxiv.org/pdf/1811.03378.pdf

# ReLU 2000
Hahnloser, Richard H. R., et al. "Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit." Nature 405 (2000): 947-951.

# ReLU 2009
Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?." 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009.

# Softplus
Dugas, Charles, et al. "Incorporating second-order functional knowledge for better option pricing." Advances in neural information processing systems. 2001.

# LReLU
Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng. "Rectifier nonlinearities improve neural network acoustic models." Proc. ICML. Vol. 30. No. 1. 2013.

# PReLU
He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE International Conference on Computer Vision. 2015.
https://arxiv.org/pdf/1502.01852.pdf

# ELU
Clevert, Djork-Arné, Thomas Unterthiner, and Sepp Hochreiter. "Fast and accurate deep network learning by exponential linear units (ELUs)." arXiv preprint arXiv:1511.07289 (2015).
https://arxiv.org/pdf/1511.07289.pdf

# SELU
Klambauer, Günter, et al. "Self-normalizing neural networks." Advances in Neural Information Processing Systems. 2017.
https://arxiv.org/pdf/1706.02515.pdf

# GELU
Hendrycks, Dan, and Kevin Gimpel. "Gaussian error linear units (GELUs)." arXiv preprint arXiv:1606.08415 (2016).
https://arxiv.org/pdf/1606.08415.pdf

# Swish
Ramachandran, Prajit, Barret Zoph, and Quoc V. Le. "Searching for activation functions." arXiv preprint arXiv:1710.05941 (2017).
https://arxiv.org/pdf/1710.05941.pdf

# Maxout
Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
http://proceedings.mlr.press/v28/goodfellow13.pdf
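
Reading aid: the activations listed above as one-line NumPy definitions; a sketch, with GELU given by its tanh approximation and SELU's constants rounded:

import numpy as np

def relu(x):          return np.maximum(0.0, x)
def softplus(x):      return np.log1p(np.exp(x))          # smooth ReLU
def lrelu(x, a=0.01): return np.where(x > 0, x, a * x)    # fixed small slope
def prelu(x, a):      return np.where(x > 0, x, a * x)    # slope a is learned
def elu(x, a=1.0):    return np.where(x > 0, x, a * (np.exp(x) - 1))
def selu(x):          # fixed lambda/alpha chosen to be self-normalizing
    lam, a = 1.0507, 1.6733
    return lam * elu(x, a)
def swish(x, b=1.0):  return x / (1.0 + np.exp(-b * x))   # x * sigmoid(b*x)
def gelu(x):          # tanh approximation of x * Phi(x)
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))
def maxout(*zs):      return np.maximum.reduce(zs)        # max over k linear pieces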

-----

Loss Function

-----

# Loss Function
Barron, Jonathan T. "A general and adaptive robust loss function." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Barron_A_General_and_Adaptive_Robust_Loss_Function_CVPR_2019_paper.pdf

-----

Pooling

-----

-----

Convolution

-----

-----

Automatic Differentiation

-----

-----

Back Propagation

-----

# Back Propagation
Alber, Maximilian, et al. "Backprop evolution." arXiv preprint arXiv:1808.02822 (2018).
https://arxiv.org/pdf/1808.02822.pdf

-----

Computational Graph

-----


Part IV

-----

Medicine

-----

// Heart failure
Golas, Sara Bersche, et al. "A machine learning model to predict the risk of 30-day readmissions in patients with heart failure: a retrospective analysis of electronic medical records data." BMC medical informatics and decision making 18.1 (2018): 44.
https://bmcmedinformdecismak.biomedcentral.com/track/pdf/10.1186/s12911-018-0620-z

// Urgent care
Zebin, Tahmina, and Thierry J. Chaussalet. "Design and implementation of a deep recurrent model for prediction of readmission in urgent care using electronic health records." 2019 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB). IEEE, 2019.
https://ueaeprints.uea.ac.uk/71957/1/PID5991805_readmission.pdf

Ashfaq, Awais, et al. "Readmission prediction using deep learning on electronic health records." Journal of biomedical informatics 97 (2019): 103256.
https://www.sciencedirect.com/science/article/pii/S1532046419301753

Rajkomar, Alvin, et al. "Scalable and accurate deep learning with electronic health records." NPJ Digital Medicine 1.1 (2018): 18.
https://www.ehidc.org/sites/default/files/resources/files/electronic%20health%20records.pdf
 
-----

References

# Reference Books
[1] Book: 深度學習: Caffe 之經典模型詳解與實戰, ISBN: 7121301180, Author: 樂毅, Publisher: 電子工業, Publication date: 2016-09-30.

# Supplementary material for the LSTM paper
[2] Understanding LSTM Networks -- colah's blog
http://colah.github.io/posts/2015-08-Understanding-LSTMs/ 

# Quick version, three papers (Spring 2017)
[3] Deep Learning Paper
http://hemingwang.blogspot.com/2019/01/deep-learning-paper.html

# Slow version, twenty papers (Spring 2018)
[4] PyTorch(六):Seminar
http://hemingwang.blogspot.com/2018/01/pytorchseminar.html

# Basic version, thirty papers (Spring 2019)
[5] 30 Topics for Deep Learning
http://hemingwang.blogspot.com/2019/04/30-topics-for-deep-learning.html  

# Essentials version, ten papers (Summer 2019)
[6] AI 三部曲(深度學習:從入門到精通)
https://hemingwang.blogspot.com/2019/05/trilogy.html

# Complete version, fifty papers (Fall 2019)
[7] AI從頭學(三九):Complete Works
http://hemingwang.blogspot.tw/2017/08/aicomplete-works.html