Sunday, August 30, 2020

Deep Learning Highlight

2019/04/25

Notes:

These are introductory recommendations based on the pace of my own self-study of deep learning.

There are five tracks: a three-paper quick track, for "quickly" glimpsing the whole of deep learning; a twenty-paper slow track, focused on computer vision; a thirty-paper foundation track, which adds fundamental topics and natural language processing; a ten-paper essentials track, distilled from the thirty-paper foundation track; and a forty-eight-paper complete track, covering every paper mentioned above.

-----


Fig. 1. Deep Learning: Classic Caffe Models Explained in Detail with Hands-On Practice [1].

-----

The three-paper quick track, for general readers who want a basic understanding of deep learning.

1. Deep Learning
2. LeNet
3. LSTM

-----

The twenty-paper slow track: the syllabus for PyTorch Taipei and PyTorch Hsinchu.

1. LeNet
2. LSTM

3. AlexNet
4. ZFNet
5. NIN
6. GoogLeNet
7. VGGNet
8. SqueezeNet

9. PreVGGNet
10. SVM
11. SMO
12. DPM
13. SS
14. FCN

15. R-CNN
16. SPPNet
17. Fast R-CNN
18. Faster R-CNN
19. YOLO
20. SSD

-----

The thirty-paper foundation track: the syllabus for PyTorch New Taipei.

1. LeNet (AlexNet, ZFNet)
2. NIN + SENet (GoogLeNet, VGGNet, PreVGGNet, Highway)
3. ResNet
4. FCN (Mask R-CNN, YOLACT)
5. YOLOv1 (Faster R-CNN, YOLOv3)

6. LSTM (Weight Decay, Dropout)
7. Seq2seq (Batch Normalization, Layer Normalization)
8. Attention (RAdam, Lookahead)
9. ConvS2S (ULMFiT, ELMo)
10. Transformer (GPT-1, BERT)

-----

The ten-paper essentials track: the syllabus for the paid paper-reading seminar.

1. LeNet
2. NIN
3. ResNet
4. FCN
5. YOLOv1

6. LSTM
7. Seq2seq
8. Attention
9. ConvS2S
10. Transformer

-----

The forty-eight-paper complete track: the syllabus for the comprehensive AI course.

1. LeNet. AlexNet, ZFNet.
2. NIN, SENet. GoogLeNet, VGGNet, PreVGGNet, Highway v1 v2, Inception v3 v4.
3. ResNet v1, ResNet-D, ResNet v2, ResNet-V. ResNeXt, DenseNet.
4. FCN. Faster R-CNN, Mask R-CNN.
5. YOLOv1. SSD, FPN, YOLOv3, YOLOv4.

6. LSTM. NNLM, C&W, Word2vec. Weight Decay, Dropout.
7. Seq2seq. Paragraph2vec. Batch Normalization, Layer Normalization.
8. Attention. FSA. Adam, Ranger.
9. ConvS2S. Context2vec, ELMo.
10. Transformer. GPT-1, GPT-2, GPT-3, BERT.

-----

Legend:

# basic
// advanced

-----

Paper

# Deep Learning
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444.
https://creativecoding.soe.ucsc.edu/courses/cs523/slides/week3/DeepLearning_LeCun.pdf

// History of Deep Learning
Alom, Md Zahangir, et al. "The history began from alexnet: A comprehensive survey on deep learning approaches." arXiv preprint arXiv:1803.01164 (2018).
https://arxiv.org/ftp/arxiv/papers/1803/1803.01164.pdf

// Recent Advances in CNN
Gu, Jiuxiang, et al. "Recent advances in convolutional neural networks." Pattern Recognition 77 (2018): 354-377.
https://arxiv.org/pdf/1512.07108.pdf

// GPU
Raina, Rajat, Anand Madhavan, and Andrew Y. Ng. "Large-scale deep unsupervised learning using graphics processors." Proceedings of the 26th annual international conference on machine learning. ACM, 2009.
http://robotics.stanford.edu/~ang/papers/icml09-LargeScaleUnsupervisedDeepLearningGPU.pdf

// Difficult 1994
Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. "Learning long-term dependencies with gradient descent is difficult." IEEE transactions on neural networks 5.2 (1994): 157-166.
https://pdfs.semanticscholar.org/d0be/39ee052d246ae99c082a565aba25b811be2d.pdf

// Difficult 2010
Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." Proceedings of the thirteenth international conference on artificial intelligence and statistics. 2010.
http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf 

// Difficult 2013
Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. "On the difficulty of training recurrent neural networks." International conference on machine learning. 2013.
http://proceedings.mlr.press/v28/pascanu13.pdf
 
-----

Part I: Computer Vision

-----

◎ Image Classification

-----

# LeNet
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
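
As a concrete reference point, here is a minimal LeNet-5-style network, written as a sketch in PyTorch (an assumption of this post's context; the original used tanh units and average pooling, replaced here by the ReLU/max-pool combination common today):

import torch
import torch.nn as nn

# Sketch of a LeNet-5-style CNN for 1x32x32 inputs, as in the 1998 paper:
# two conv/pool stages followed by three fully connected layers.
class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        x = self.features(x)                       # (N, 16, 5, 5)
        return self.classifier(torch.flatten(x, 1))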

# AlexNet
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

# ZFNet
Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European conference on computer vision. Springer, Cham, 2014.
https://arxiv.org/pdf/1311.2901.pdf

// PreAlexNet

// PreZFNet

// Deconv

-----

# NIN
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
https://arxiv.org/pdf/1312.4400.pdf

# SENet
Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Squeeze-and-Excitation_Networks_CVPR_2018_paper.pdf
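
A rough PyTorch sketch of the squeeze-and-excitation block (reduction ratio r=16 as in the paper; the class name SEBlock is mine): global average pooling "squeezes" each channel to a scalar, a two-layer bottleneck "excites" it into a gate in (0, 1), and the gate rescales the feature maps.

import torch.nn as nn

# Squeeze-and-excitation block (sketch): channel-wise attention.
class SEBlock(nn.Module):
    def __init__(self, channels, r=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # squeeze
        self.fc = nn.Sequential(                         # excitation
            nn.Linear(channels, channels // r), nn.ReLU(),
            nn.Linear(channels // r, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        s = self.fc(self.pool(x).view(n, c))  # per-channel gate in (0, 1)
        return x * s.view(n, c, 1, 1)         # rescale feature maps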

// SKNet

// STNet

// RANet

// BAM

// CBAM

// RASNet

-----

# GoogLeNet
Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
http://openaccess.thecvf.com/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf

# VGGNet
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
https://arxiv.org/pdf/1409.1556/

# PreVGGNet
Ciresan, Dan C., et al. "Flexible, high performance convolutional neural networks for image classification." IJCAI Proceedings-International Joint Conference on Artificial Intelligence. Vol. 22. No. 1. 2011.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.481.4406&rep=rep1&type=pdf
  
# Highway v1
Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. "Highway networks." arXiv preprint arXiv:1505.00387 (2015).
https://arxiv.org/pdf/1505.00387.pdf

# Highway v2
Srivastava, Rupesh K., Klaus Greff, and Jürgen Schmidhuber. "Training very deep networks." Advances in neural information processing systems. 2015.
https://papers.nips.cc/paper/5850-training-very-deep-networks.pdf

# Inception v3
Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.pdf

# Inception v4
Szegedy, Christian, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." AAAI. Vol. 4. 2017.
http://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14806/14311

// CapsNet v0

// CapsNet v1
Sabour, Sara, Nicholas Frosst, and Geoffrey E. Hinton. "Dynamic routing between capsules." Advances in neural information processing systems. 2017.
http://papers.nips.cc/paper/6975-dynamic-routing-between-capsules.pdf

# CapsNet v2

# CapsNet v3

-----
 
# ResNet v1
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf
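
The core idea fits in a few lines; here is a hedged PyTorch sketch of the basic residual block (identity shortcut only, so input and output channels must match; the paper's projection shortcut is omitted):

import torch.nn as nn
import torch.nn.functional as F

# Basic residual block: y = F(x) + x, where F is two 3x3 conv/BN layers.
class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)  # shortcut: the stack only learns the residual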

# ResNet-D
Huang, Gao, et al. "Deep networks with stochastic depth." European conference on computer vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.09382.pdf

# ResNet v2
He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.05027.pdf  

# ResNet-E
Veit, Andreas, Michael J. Wilber, and Serge Belongie. "Residual networks behave like ensembles of relatively shallow networks." Advances in neural information processing systems. 2016.
https://papers.nips.cc/paper/6556-residual-networks-behave-like-ensembles-of-relatively-shallow-networks.pdf 

# ResNet-V
Li, Hao, et al. "Visualizing the loss landscape of neural nets." Advances in Neural Information Processing Systems. 2018.
https://papers.nips.cc/paper/7875-visualizing-the-loss-landscape-of-neural-nets.pdf

-----

// ResNet-F
Zhang, Hongyi, Yann N. Dauphin, and Tengyu Ma. "Fixup initialization: Residual learning without normalization." arXiv preprint arXiv:1901.09321 (2019).
https://arxiv.org/pdf/1901.09321.pdf

// ResNet-I
Shwartz-Ziv, Ravid, and Naftali Tishby. "Opening the black box of deep neural networks via information." arXiv preprint arXiv:1703.00810 (2017).
https://arxiv.org/pdf/1703.00810.pdf

// ResNet-Q
Balduzzi, David, et al. "The shattered gradients problem: If resnets are the answer, then what is the question?." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
https://arxiv.org/pdf/1702.08591.pdf

// ResNet-S
Orhan, A. Emin, and Xaq Pitkow. "Skip connections eliminate singularities." arXiv preprint arXiv:1701.09175 (2017).
https://arxiv.org/pdf/1701.09175.pdf 

// ResNet-U
Liu, Tianyi, et al. "Towards Understanding the Importance of Shortcut Connections in Residual Networks." Advances in Neural Information Processing Systems. 2019.
http://papers.nips.cc/paper/9003-towards-understanding-the-importance-of-shortcut-connections-in-residual-networks.pdf 

// ResNet-W
He, Fengxiang, Tongliang Liu, and Dacheng Tao. "Why resnet works? residuals generalize." arXiv preprint arXiv:1904.01367 (2019).
https://arxiv.org/pdf/1904.01367.pdf 

// WRN
Zagoruyko, Sergey, and Nikos Komodakis. "Wide residual networks." arXiv preprint arXiv:1605.07146 (2016).
https://arxiv.org/pdf/1605.07146.pdf

# ResNeXt
Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Xie_Aggregated_Residual_Transformations_CVPR_2017_paper.pdf 

# DenseNet
Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. Vol. 1. No. 2. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_Densely_Connected_Convolutional_CVPR_2017_paper.pdf

// DPN
Chen, Yunpeng, et al. "Dual path networks." Advances in Neural Information Processing Systems. 2017.
https://papers.nips.cc/paper/7033-dual-path-networks.pdf

// DLA
Yu, Fisher, et al. "Deep layer aggregation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Yu_Deep_Layer_Aggregation_CVPR_2018_paper.pdf

// Res2Net
Gao, Shang-Hua, et al. "Res2Net: A New Multi-scale Backbone Architecture." arXiv preprint arXiv:1904.01169 (2019).
https://arxiv.org/pdf/1904.01169.pdf 

// PolyNet
Zhang, Xingcheng, et al. "Polynet: A pursuit of structural diversity in very deep networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhang_PolyNet_A_Pursuit_CVPR_2017_paper.pdf

// FractalNet(DropPath)
Larsson, Gustav, Michael Maire, and Gregory Shakhnarovich. "Fractalnet: Ultra-deep neural networks without residuals." arXiv preprint arXiv:1605.07648 (2016).
https://arxiv.org/pdf/1605.07648.pdf

-----

Mobile

-----

# SqueezeNet
Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size." arXiv preprint arXiv:1602.07360 (2016).
https://arxiv.org/pdf/1602.07360.pdf

# MobileNet v1
Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
https://arxiv.org/pdf/1704.04861.pdf
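
MobileNet v1's building block is the depthwise separable convolution; a PyTorch sketch (the helper name depthwise_separable is mine) that factors a standard 3x3 convolution into a per-channel 3x3 depthwise convolution plus a 1x1 pointwise convolution:

import torch.nn as nn

# Depthwise separable convolution (sketch). Relative to a standard 3x3
# conv, the multiply-add cost drops roughly by a factor of 1/out_ch + 1/9.
def depthwise_separable(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
        nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )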

# MobileNet v2
Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Sandler_MobileNetV2_Inverted_Residuals_CVPR_2018_paper.pdf

# MobileNet v3
Howard, Andrew, et al. "Searching for mobilenetv3." arXiv preprint arXiv:1905.02244 (2019).
https://arxiv.org/pdf/1905.02244.pdf

# ShuffleNet v1
Zhang, Xiangyu, et al. "Shufflenet: An extremely efficient convolutional neural network for mobile devices." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_ShuffleNet_An_Extremely_CVPR_2018_paper.pdf

# ShuffleNet v2
Ma, Ningning, et al. "Shufflenet v2: Practical guidelines for efficient cnn architecture design." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Ningning_Light-weight_CNN_Architecture_ECCV_2018_paper.pdf

# Xception
Chollet, François. "Xception: Deep learning with depthwise separable convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Chollet_Xception_Deep_Learning_CVPR_2017_paper.pdf

# ESPNet v1
Mehta, Sachin, et al. "Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Sachin_Mehta_ESPNet_Efficient_Spatial_ECCV_2018_paper.pdf

# ESPNet v2
Mehta, Sachin, et al. "Espnetv2: A light-weight, power efficient, and general purpose convolutional neural network." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Mehta_ESPNetv2_A_Light-Weight_Power_Efficient_and_General_Purpose_Convolutional_Neural_CVPR_2019_paper.pdf

-----

# NAS-RL
Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
https://arxiv.org/pdf/1611.01578.pdf

# NASNet(Scheduled DropPath)
Zoph, Barret, et al. "Learning transferable architectures for scalable image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zoph_Learning_Transferable_Architectures_CVPR_2018_paper.pdf

// pNASNet
Liu, Chenxi, et al. "Progressive neural architecture search." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Chenxi_Liu_Progressive_Neural_Architecture_ECCV_2018_paper.pdf 

// AmoebaNet
Real, Esteban, et al. "Regularized evolution for image classifier architecture search." Proceedings of the aaai conference on artificial intelligence. Vol. 33. 2019.
https://arxiv.org/pdf/1802.01548.pdf

// mNASNet
Tan, Mingxing, et al. "Mnasnet: Platform-aware neural architecture search for mobile." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Tan_MnasNet_Platform-Aware_Neural_Architecture_Search_for_Mobile_CVPR_2019_paper.pdf

# Auto-DeepLab
Liu, Chenxi, et al. "Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Auto-DeepLab_Hierarchical_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2019_paper.pdf

# NAS-FPN
Ghiasi, Golnaz, Tsung-Yi Lin, and Quoc V. Le. "Nas-fpn: Learning scalable feature pyramid architecture for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Ghiasi_NAS-FPN_Learning_Scalable_Feature_Pyramid_Architecture_for_Object_Detection_CVPR_2019_paper.pdf

# AutoAugment
Cubuk, Ekin D., et al. "Autoaugment: Learning augmentation policies from data." arXiv preprint arXiv:1805.09501 (2018).
https://arxiv.org/pdf/1805.09501.pdf

# EfficientNet
Tan, Mingxing, and Quoc V. Le. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." arXiv preprint arXiv:1905.11946 (2019).
https://arxiv.org/pdf/1905.11946.pdf

# EfficientDet
Tan, Mingxing, Ruoming Pang, and Quoc V. Le. "Efficientdet: Scalable and efficient object detection." arXiv preprint arXiv:1911.09070 (2019).
https://arxiv.org/pdf/1911.09070.pdf

-----

Semantic Segmentation

-----

// SDS
Hariharan, Bharath, et al. "Simultaneous detection and segmentation." European Conference on Computer Vision. Springer, Cham, 2014.
https://arxiv.org/pdf/1407.1808.pdf

# FCN
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf

# DeconvNet
Noh, Hyeonwoo, Seunghoon Hong, and Bohyung Han. "Learning deconvolution network for semantic segmentation." Proceedings of the IEEE international conference on computer vision. 2015.
https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Noh_Learning_Deconvolution_Network_ICCV_2015_paper.pdf

# SegNet
Badrinarayanan, Vijay, Alex Kendall, and Roberto Cipolla. "Segnet: A deep convolutional encoder-decoder architecture for image segmentation." IEEE transactions on pattern analysis and machine intelligence 39.12 (2017): 2481-2495.
https://arxiv.org/pdf/1511.00561.pdf

# U-Net
Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.
https://arxiv.org/pdf/1505.04597.pdf

# U-Net++
Zhou, Zongwei, et al. "Unet++: A nested u-net architecture for medical image segmentation." Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, Cham, 2018. 3-11.
https://arxiv.org/pdf/1807.10165.pdf

-----

# DilatedNet
Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions." arXiv preprint arXiv:1511.07122 (2015).
https://arxiv.org/pdf/1511.07122.pdf 

# ENet
Paszke, Adam, et al. "Enet: A deep neural network architecture for real-time semantic segmentation." arXiv preprint arXiv:1606.02147 (2016).
https://arxiv.org/pdf/1606.02147.pdf
 
# DRN
Yu, Fisher, Vladlen Koltun, and Thomas Funkhouser. "Dilated residual networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Yu_Dilated_Residual_Networks_CVPR_2017_paper.pdf

# FastFCN
Wu, Huikai, et al. "FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation." arXiv preprint arXiv:1903.11816 (2019).
https://arxiv.org/pdf/1903.11816.pdf 

-----

# FC-CRF
Krähenbühl, Philipp, and Vladlen Koltun. "Efficient inference in fully connected crfs with gaussian edge potentials." Advances in neural information processing systems. 2011.
http://papers.nips.cc/paper/4296-efficient-inference-in-fully-connected-crfs-with-gaussian-edge-potentials.pdf

# DeepLab v1
Chen, Liang-Chieh, et al. "Semantic image segmentation with deep convolutional nets and fully connected crfs." arXiv preprint arXiv:1412.7062 (2014).
https://arxiv.org/pdf/1412.7062.pdf

# DeepLab v2
Chen, Liang-Chieh, et al. "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs." arXiv preprint arXiv:1606.00915 (2016).
https://arxiv.org/pdf/1606.00915.pdf 

# DeepLab v3
Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017).
https://arxiv.org/pdf/1706.05587.pdf  

# DeepLab v3+
Chen, Liang-Chieh, et al. "Encoder-decoder with atrous separable convolution for semantic image segmentation." Proceedings of the European conference on computer vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Liang-Chieh_Chen_Encoder-Decoder_with_Atrous_ECCV_2018_paper.pdf 

# Gated-SCNN
Takikawa, Towaki, et al. "Gated-scnn: Gated shape cnns for semantic segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2019.
http://openaccess.thecvf.com/content_ICCV_2019/papers/Takikawa_Gated-SCNN_Gated_Shape_CNNs_for_Semantic_Segmentation_ICCV_2019_paper.pdf

-----

# ResNet-38
Wu, Zifeng, Chunhua Shen, and Anton Van Den Hengel. "Wider or deeper: Revisiting the resnet model for visual recognition." Pattern Recognition 90 (2019): 119-133.
https://arxiv.org/pdf/1611.10080.pdf 

# Tiramisu
Jégou, Simon, et al. "The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017.
http://openaccess.thecvf.com/content_cvpr_2017_workshops/w13/papers/Jegou_The_One_Hundred_CVPR_2017_paper.pdf

# RefineNet
Lin, Guosheng, et al. "Refinenet: Multi-path refinement networks for high-resolution semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Lin_RefineNet_Multi-Path_Refinement_CVPR_2017_paper.pdf 

# RefineNet-LW
Nekrasov, Vladimir, Chunhua Shen, and Ian Reid. "Light-weight refinenet for real-time semantic segmentation." arXiv preprint arXiv:1810.03272 (2018).
https://arxiv.org/pdf/1810.03272.pdf

# RefineNet-AA
Nekrasov, Vladimir, et al. "Real-time joint semantic segmentation and depth estimation using asymmetric annotations." 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019.
https://arxiv.org/pdf/1809.04766.pdf

# VPLR
Zhu, Yi, et al. "Improving Semantic Segmentation via Video Propagation and Label Relaxation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhu_Improving_Semantic_Segmentation_via_Video_Propagation_and_Label_Relaxation_CVPR_2019_paper.pdf

# PSPNet
Zhao, Hengshuang, et al. "Pyramid scene parsing network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf

# ICNet
Zhao, Hengshuang, et al. "Icnet for real-time semantic segmentation on high-resolution images." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Hengshuang_Zhao_ICNet_for_Real-Time_ECCV_2018_paper.pdf

# BiSeNet
Yu, Changqian, et al. "Bisenet: Bilateral segmentation network for real-time semantic segmentation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Changqian_Yu_BiSeNet_Bilateral_Segmentation_ECCV_2018_paper.pdf

# Fast-SCNN
Poudel, Rudra PK, Stephan Liwicki, and Roberto Cipolla. "Fast-SCNN: fast semantic segmentation network." arXiv preprint arXiv:1902.04502 (2019).
https://arxiv.org/pdf/1902.04502.pdf

# BlitzNet
Dvornik, Nikita, et al. "Blitznet: A real-time deep network for scene understanding." Proceedings of the IEEE international conference on computer vision. 2017.
http://openaccess.thecvf.com/content_ICCV_2017/papers/Dvornik_BlitzNet_A_Real-Time_ICCV_2017_paper.pdf

// SA-GAN

// DANet

// OCNet
  
-----


Instance Segmentation 

-----

// MNC
Dai, Jifeng, Kaiming He, and Jian Sun. "Instance-aware semantic segmentation via multi-task network cascades." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Dai_Instance-Aware_Semantic_Segmentation_CVPR_2016_paper.pdf

// DeepMask
Pinheiro, Pedro O., Ronan Collobert, and Piotr Dollár. "Learning to segment object candidates." Advances in Neural Information Processing Systems. 2015.
https://papers.nips.cc/paper/5852-learning-to-segment-object-candidates.pdf

// SharpMask
Pinheiro, Pedro O., et al. "Learning to refine object segments." European Conference on Computer Vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.08695.pdf

// MultiPathNet
Zagoruyko, Sergey, et al. "A multipath network for object detection." arXiv preprint arXiv:1604.02135 (2016).
https://arxiv.org/pdf/1604.02135.pdf

// InstanceFCN
Dai, Jifeng, et al. "Instance-sensitive fully convolutional networks." European Conference on Computer Vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.08678.pdf

// FCIS
Li, Yi, et al. "Fully convolutional instance-aware semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Fully_Convolutional_Instance-Aware_CVPR_2017_paper.pdf

# Mask R-CNN
He, Kaiming, et al. "Mask r-cnn." Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2017.
http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf

# YOLACT++
Bolya, Daniel, et al. "YOLACT++: Better Real-time Instance Segmentation." arXiv preprint arXiv:1912.06218 (2019).
https://arxiv.org/pdf/1912.06218.pdf

-----

◎ Object Detection

-----

// SVM

// SMO
Platt, John. "Sequential minimal optimization: A fast algorithm for training support vector machines." (1998).
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-98-14.pdf

-----

// SIFT

// HOG
 
// DPM

-----

# DPM
Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." IEEE transactions on pattern analysis and machine intelligence 32.9 (2010): 1627-1645.
https://ttic.uchicago.edu/~dmcallester/lsvm-pami.pdf

# SS
Uijlings, Jasper RR, et al. "Selective search for object recognition." International journal of computer vision 104.2 (2013): 154-171.
https://ivi.fnwi.uva.nl/isis/publications/2013/UijlingsIJCV2013/UijlingsIJCV2013.pdf

# R-CNN
Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.
https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf?spm=5176.100239.blogcont55892.8.pm8zm1&file=Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf

# SPPNet
He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." european conference on computer vision. Springer, Cham, 2014.
https://arxiv.org/pdf/1406.4729.pdf
 
# Fast R-CNN
Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE international conference on computer vision. 2015.
http://openaccess.thecvf.com/content_iccv_2015/papers/Girshick_Fast_R-CNN_ICCV_2015_paper.pdf

# Faster R-CNN
Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.
http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf

-----

# OverFeat
Sermanet, Pierre, et al. "Overfeat: Integrated recognition, localization and detection using convolutional networks." arXiv preprint arXiv:1312.6229 (2013).
https://arxiv.org/pdf/1312.6229.pdf
 
# YOLO v1
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf

# SSD
Liu, Wei, et al. "SSD: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1512.02325.pdf

# DSSD
Fu, Cheng-Yang, et al. "Dssd: Deconvolutional single shot detector." arXiv preprint arXiv:1701.06659 (2017).
https://arxiv.org/pdf/1701.06659.pdf

# YOLO v2
Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
https://arxiv.org/pdf/1612.08242.pdf

-----

# ION
Bell, Sean, et al. "Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
http://openaccess.thecvf.com/content_cvpr_2016/papers/Bell_Inside-Outside_Net_Detecting_CVPR_2016_paper.pdf

# R-FCN
Dai, Jifeng, et al. "R-fcn: Object detection via region-based fully convolutional networks." Advances in neural information processing systems. 2016.
https://papers.nips.cc/paper/6465-r-fcn-object-detection-via-region-based-fully-convolutional-networks.pdf

# SATO
Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object detectors." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_SpeedAccuracy_Trade-Offs_for_CVPR_2017_paper.pdf

# DCN v1
Dai, Jifeng, et al. "Deformable convolutional networks." Proceedings of the IEEE international conference on computer vision. 2017.
http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Deformable_Convolutional_Networks_ICCV_2017_paper.pdf

# DCN v2
Zhu, Xizhou, et al. "Deformable convnets v2: More deformable, better results." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhu_Deformable_ConvNets_V2_More_Deformable_Better_Results_CVPR_2019_paper.pdf

# Cascade R-CNN
Cai, Zhaowei, and Nuno Vasconcelos. "Cascade r-cnn: Delving into high quality object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Cai_Cascade_R-CNN_Delving_CVPR_2018_paper.pdf   

# FPN
Lin, Tsung-Yi, et al. "Feature pyramid networks for object detection." CVPR. Vol. 1. No. 2. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Lin_Feature_Pyramid_Networks_CVPR_2017_paper.pdf 

# STDN
Zhou, Peng, et al. "Scale-transferrable object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_Scale-Transferrable_Object_Detection_CVPR_2018_paper.pdf

# YOLO v3
Redmon, Joseph, and Ali Farhadi. "Yolov3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
https://pjreddie.com/media/files/papers/YOLOv3.pdf 

# RON
Kong, Tao, et al. "Ron: Reverse connection with objectness prior networks for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Kong_RON_Reverse_Connection_CVPR_2017_paper.pdf 

# RefineDet
Zhang, Shifeng, et al. "Single-shot refinement neural network for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Single-Shot_Refinement_Neural_CVPR_2018_paper.pdf
 
# M2Det
Zhao, Qijie, et al. "M2det: A single-shot object detector based on multi-level feature pyramid network." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.
https://arxiv.org/pdf/1811.04533.pdf

# SNIP
Singh, Bharat, and Larry S. Davis. "An analysis of scale invariance in object detection snip." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Singh_An_Analysis_of_CVPR_2018_paper.pdf

# SNIPER
Singh, Bharat, Mahyar Najibi, and Larry S. Davis. "SNIPER: Efficient multi-scale training." Advances in Neural Information Processing Systems. 2018.
http://papers.nips.cc/paper/8143-sniper-efficient-multi-scale-training.pdf

# AutoFocus
Najibi, Mahyar, Bharat Singh, and Larry S. Davis. "Autofocus: Efficient multi-scale inference." Proceedings of the IEEE International Conference on Computer Vision. 2019.
http://openaccess.thecvf.com/content_ICCV_2019/papers/Najibi_AutoFocus_Efficient_Multi-Scale_Inference_ICCV_2019_paper.pdf

# DetNet
Li, Zeming, et al. "Detnet: Design backbone for object detection." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Zeming_Li_DetNet_Design_Backbone_ECCV_2018_paper.pdf

# TridentNet
Li, Yanghao, et al. "Scale-aware trident networks for object detection." arXiv preprint arXiv:1901.01892 (2019).
https://arxiv.org/pdf/1901.01892.pdf

-----

# OHEM
Shrivastava, Abhinav, Abhinav Gupta, and Ross Girshick. "Training region-based object detectors with online hard example mining." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Shrivastava_Training_Region-Based_Object_CVPR_2016_paper.pdf

# RetinaNet(Focal Loss)
Lin, Tsung-Yi, et al. "Focal loss for dense object detection." IEEE transactions on pattern analysis and machine intelligence (2018).
https://vision.cornell.edu/se3/wp-content/uploads/2017/09/focal_loss.pdf
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8417976
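
For reference, the focal loss reshapes cross entropy so that well-classified examples are down-weighted; with p_t the model's probability for the true class:

FL(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log(p_t)

The paper reports \gamma = 2 and \alpha = 0.25 working well; setting \gamma = 0 and \alpha_t = 1 recovers standard cross entropy.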

# GHM
Li, Buyu, Yu Liu, and Xiaogang Wang. "Gradient harmonized single-stage detector." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.
https://arxiv.org/pdf/1811.05181.pdf

# Libra R-CNN
Pang, Jiangmiao, et al. "Libra r-cnn: Towards balanced learning for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Pang_Libra_R-CNN_Towards_Balanced_Learning_for_Object_Detection_CVPR_2019_paper.pdf

# DCR v1
Cheng, Bowen, et al. "Revisiting rcnn: On awakening the classification power of faster rcnn." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Bowen_Cheng_Revisiting_RCNN_On_ECCV_2018_paper.pdf
 
# DCR v2
Cheng, Bowen, et al. "Decoupled classification refinement: Hard false positive suppression for object detection." arXiv preprint arXiv:1810.04002 (2018).
https://arxiv.org/pdf/1810.04002.pdf

# PISA
Cao, Yuhang, et al. "Prime Sample Attention in Object Detection." arXiv preprint arXiv:1904.04821 (2019).
https://arxiv.org/pdf/1904.04821.pdf

-----

// CornerNet
Law, Hei, and Jia Deng. "Cornernet: Detecting objects as paired keypoints." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Hei_Law_CornerNet_Detecting_Objects_ECCV_2018_paper.pdf
 
// CenterNet
Duan, Kaiwen, et al. "CenterNet: Object Detection with Keypoint Triplets." arXiv preprint arXiv:1904.08189 (2019).
https://arxiv.org/pdf/1904.08189.pdf
 
// SelectNet
Liu, Yunru, Tingran Gao, and Haizhao Yang. "SelectNet: Learning to Sample from the Wild for Imbalanced Data Training." arXiv preprint arXiv:1905.09872 (2019).
https://arxiv.org/pdf/1905.09872.pdf

// Bottom-up
Zhou, Xingyi, Jiacheng Zhuo, and Philipp Krahenbuhl. "Bottom-up object detection by grouping extreme and center points." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhou_Bottom-Up_Object_Detection_by_Grouping_Extreme_and_Center_Points_CVPR_2019_paper.pdf 

-----

Part II: Natural Language Processing

-----

LSTM

-----

// RNN(Recurrent Neural Network)
Elman, Jeffrey L. "Finding structure in time." Cognitive science 14.2 (1990): 179-211.
http://www2.fiit.stuba.sk/~kvasnicka/NeuralNetworks/6.prednaska/Elman_SRNN_paper.pdf

# LSTM(Long Short-Term Memory)
Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.676.4320&rep=rep1&type=pdf
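
Worth writing out once: the LSTM cell in its commonly used form (with the forget gate, a later addition to the 1997 architecture):

f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)

The additive cell-state update for c_t is what lets gradients flow over long spans.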

// BRNN(Bidirectional RNN)
Schuster, Mike, and Kuldip K. Paliwal. "Bidirectional recurrent neural networks." IEEE Transactions on Signal Processing 45.11 (1997): 2673-2681.
http://www.cs.cmu.edu/afs/cs/user/bhiksha/WWW/courses/deeplearning/Fall.2016/pdfs/Schuster97_BRNN.pdf

// BLSTM(Bidirectional LSTM)
Graves, Alex, Navdeep Jaitly, and Abdel-rahman Mohamed. "Hybrid speech recognition with deep bidirectional LSTM." Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013.
http://www.cs.toronto.edu/~graves/asru_2013.pdf

# GRU(Gated Recurrent Unit)
Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
https://arxiv.org/pdf/1406.1078.pdf
 
// MGU(Minimal Gated Unit)
Zhou, Guo-Bing, et al. "Minimal gated unit for recurrent neural networks." International Journal of Automation and Computing 13.3 (2016): 226-234.
https://arxiv.org/pdf/1603.09420.pdf

// SRU(Simple Recurrent Unit)
Lei, Tao, et al. "Simple recurrent units for highly parallelizable recurrence." Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018.
https://arxiv.org/pdf/1709.02755.pdf

// Comparison of LSTM, GRU, MGU, and SRU
Hou, Bo-Jian, and Zhi-Hua Zhou. "Learning with Interpretable Structure from RNN." arXiv preprint arXiv:1810.10708 (2018).
https://arxiv.org/pdf/1810.10708.pdf
 
-----

// EM
Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin. "Maximum likelihood from incomplete data via the EM algorithm." Journal of the Royal Statistical Society: Series B (Methodological) 39.1 (1977): 1-22.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.163.7580&rep=rep1&type=pdf

// STM
Levenberg, Abby, Chris Callison-Burch, and Miles Osborne. "Stream-based translation models for statistical machine translation." Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. 2010.
https://www.aclweb.org/anthology/N10-1062.pdf

// n-gram
Brown, Peter F., et al. "Class-based n-gram models of natural language." Computational linguistics 18.4 (1992): 467-480.
https://www.aclweb.org/anthology/J92-4003.pdf

# NNLM
Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of machine learning research 3.Feb (2003): 1137-1155.
http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf 

// C&W
Collobert, Ronan, and Jason Weston. "A unified architecture for natural language processing: Deep neural networks with multitask learning." Proceedings of the 25th international conference on Machine learning. ACM, 2008.
http://www.thespermwhale.com/jaseweston/papers/unified_nlp.pdf

# C&W
Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of machine learning research 12.ARTICLE (2011): 2493-2537.
http://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf

// RNNLM v1
Mikolov, Tomáš, Martin Karafiát, Lukáš Burget, Jan Černocký, and Sanjeev Khudanpur. "Recurrent neural network based language model." Eleventh Annual Conference of the International Speech Communication Association (INTERSPEECH). 2010.
https://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf

// RNNLM v2
Mikolov, Tomáš, et al. "Extensions of recurrent neural network language model." 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2011.
https://pdfs.semanticscholar.org/bba8/a2c9b9121e7c78e91ea2a68630e77c0ad20f.pdf

// RNNLM v3
Mikolov, Tomas, et al. "Rnnlm-recurrent neural network language modeling toolkit." Proc. of the 2011 ASRU Workshop. 2011.
http://www.fit.vutbr.cz/~imikolov/rnnlm/rnnlm-demo.pdf

-----

# Word2vec v1
Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
https://arxiv.org/pdf/1301.3781.pdf

# Word2vec v2
Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf

# Word2vec v3
Rong, Xin. "word2vec parameter learning explained." arXiv preprint arXiv:1411.2738 (2014).
https://arxiv.org/pdf/1411.2738.pdf
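
The skip-gram objective from Word2vec v1/v2, for a corpus w_1, ..., w_T and window size c: maximize the average log-probability of context words given the center word,

\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t)

with p defined by a softmax over output vectors (approximated in practice by hierarchical softmax or negative sampling, as in v2).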

// GloVe
Pennington, Jeffrey, Richard Socher, and Christopher Manning. "Glove: Global vectors for word representation." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.
https://www.aclweb.org/anthology/D14-1162

// fastText v1
Joulin, Armand, et al. "Bag of tricks for efficient text classification." arXiv preprint arXiv:1607.01759 (2016).
https://arxiv.org/pdf/1607.01759.pdf

// fastText v2
Bojanowski, Piotr, et al. "Enriching word vectors with subword information." Transactions of the Association for Computational Linguistics 5 (2017): 135-146.
https://www.mitpressjournals.org/doi/pdfplus/10.1162/tacl_a_00051

// WordRank
Ji, Shihao, et al. "Wordrank: Learning word embeddings via robust ranking." arXiv preprint arXiv:1506.02761 (2015).
https://arxiv.org/pdf/1506.02761.pdf

-----

Seq2seq

-----

# Seq2seq 1 - using LSTM
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf 

# Seq2seq 2

-----

# Paragraph2vec
Le, Quoc, and Tomas Mikolov. "Distributed representations of sentences and documents." International conference on machine learning. 2014.
http://proceedings.mlr.press/v32/le14.pdf

// Skip-Thought
Kiros, Ryan, et al. "Skip-thought vectors." Advances in neural information processing systems. 2015.
https://papers.nips.cc/paper/5950-skip-thought-vectors.pdf

// Quick-Thought
Logeswaran, Lajanugen, and Honglak Lee. "An efficient framework for learning sentence representations." arXiv preprint arXiv:1803.02893 (2018).
https://arxiv.org/pdf/1803.02893.pdf

// InferSent
Conneau, Alexis, et al. "Supervised learning of universal sentence representations from natural language inference data." arXiv preprint arXiv:1705.02364 (2017).
https://arxiv.org/pdf/1705.02364.pdf

// MILA SE

// Google SE

-----

Attention

-----

# Attention 1 - using GRU
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
https://arxiv.org/pdf/1409.0473.pdf
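
The additive (Bahdanau) attention mechanism, in roughly the paper's notation: for decoder state s_{i-1} and encoder annotations h_j,

e_{ij} = v_a^\top \tanh(W_a s_{i-1} + U_a h_j)
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_k \exp(e_{ik})}
c_i = \sum_j \alpha_{ij} h_j

so each decoding step attends over all source positions instead of compressing the sentence into a single fixed vector.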

# Visual Attention
Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." International conference on machine learning. 2015.
http://proceedings.mlr.press/v37/xuc15.pdf 

# Grad-CAM

# Attention 2 - using LSTM
Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
https://arxiv.org/pdf/1508.04025.pdf

-----

// NTM
Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
https://arxiv.org/pdf/1410.5401.pdf

// DNC(Hybrid Computing)
Graves, Alex, et al. "Hybrid computing using a neural network with dynamic external memory." Nature 538.7626 (2016): 471.
https://campus.swarma.org/public/ueditor/php/upload/file/20170609/1497019302822809.pdf

// MANN

// RL NTM
Zaremba, Wojciech, and Ilya Sutskever. "Reinforcement learning neural turing machines-revised." arXiv preprint arXiv:1505.00521 (2015).
https://arxiv.org/pdf/1505.00521.pdf

// Implementing NTM
Collier, Mark, and Joeran Beel. "Implementing Neural Turing Machines." International Conference on Artificial Neural Networks. Springer, Cham, 2018.
https://arxiv.org/pdf/1807.08518.pdf

-----

// MN
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
https://arxiv.org/abs/1410.3916

// EEMN
Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. "End-to-end memory networks." Advances in neural information processing systems. 2015.
https://papers.nips.cc/paper/5846-end-to-end-memory-networks.pdf

// KVMN
Miller, Alexander, et al. "Key-value memory networks for directly reading documents." arXiv preprint arXiv:1606.03126 (2016).
https://arxiv.org/pdf/1606.03126.pdf

// PN
Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." Advances in Neural Information Processing Systems. 2015.
http://papers.nips.cc/paper/5866-pointer-networks.pdf

// Set2set
Vinyals, Oriol, Samy Bengio, and Manjunath Kudlur. "Order matters: Sequence to sequence for sets." arXiv preprint arXiv:1511.06391 (2015).
https://arxiv.org/pdf/1511.06391.pdf

# FSA
Daniluk, Michał, et al. "Frustratingly short attention spans in neural language modeling." arXiv preprint arXiv:1702.04521 (2017).
https://arxiv.org/pdf/1702.04521.pdf

// MHA
Iida, Shohei, et al. "A Multi-Hop Attention for RNN based Neural Machine Translation." Proceedings of The 8th Workshop on Patent and Scientific Literature Translation. 2019.
https://www.aclweb.org/anthology/W19-7203

// AOH
Iida, Shohei, et al. "Attention over Heads: A Multi-Hop Attention for Neural Machine Translation." Proceedings of the 57th Conference of the Association for Computational Linguistics: Student Research Workshop. 2019.
https://www.aclweb.org/anthology/P19-2030

-----

ConvS2S

-----

// GLU
Dauphin, Yann N., et al. "Language modeling with gated convolutional networks." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
https://arxiv.org/pdf/1612.08083.pdf

# ConvS2S
Gehring, Jonas, et al. "Convolutional sequence to sequence learning." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
https://arxiv.org/pdf/1705.03122.pdf

-----

# Context2vec

# ELMo
Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018).
https://arxiv.org/pdf/1802.05365.pdf 

// ULMFiT
Howard, Jeremy, and Sebastian Ruder. "Universal language model fine-tuning for text classification." arXiv preprint arXiv:1801.06146 (2018).
https://arxiv.org/pdf/1801.06146.pdf

// MultiFiT

-----

Transformer

-----

# Transformer
Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
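
The whole model is built on scaled dot-product attention,

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right) V

run h times in parallel with different learned projections (multi-head attention), plus position-wise feed-forward layers and positional encodings.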

# GPT-1
Radford, Alec, et al. "Improving language understanding by generative pre-training." OpenAI (2018).
https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf 

# GPT-2
Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI Blog 1.8 (2019): 9.
https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

// Visualizing Attention in Transformers
Vig, Jesse. "Visualizing Attention in Transformer-Based Language Models." arXiv preprint arXiv:1904.02679 (2019).
https://arxiv.org/pdf/1904.02679.pdf

# GPT-3

# BERT
Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
https://arxiv.org/pdf/1810.04805.pdf

// MTL
Baxter, Jonathan. "A model of inductive bias learning." Journal of artificial intelligence research 12 (2000): 149-198.
https://arxiv.org/pdf/1106.0245.pdf

// MTL Overview
Ruder, Sebastian. "An overview of multi-task learning in deep neural networks." arXiv preprint arXiv:1706.05098 (2017).
https://arxiv.org/pdf/1706.05098.pdf

-----
 
// Universal Transformers
Dehghani, Mostafa, et al. "Universal transformers." arXiv preprint arXiv:1807.03819 (2018).
https://arxiv.org/pdf/1807.03819.pdf

// Transformer XL
Dai, Zihang, et al. "Transformer-xl: Attentive language models beyond a fixed-length context." arXiv preprint arXiv:1901.02860 (2019).
https://arxiv.org/pdf/1901.02860.pdf
 
// MT-DNN
Liu, Xiaodong, et al. "Multi-Task Deep Neural Networks for Natural Language Understanding." arXiv preprint arXiv:1901.11504 (2019).
https://arxiv.org/pdf/1901.11504.pdf

// ERNIE Baidu
Sun, Yu, et al. "ERNIE: Enhanced Representation through Knowledge Integration." arXiv preprint arXiv:1904.09223 (2019).
https://arxiv.org/pdf/1904.09223.pdf

// ERNIE THU
Zhang, Zhengyan, et al. "ERNIE: Enhanced Language Representation with Informative Entities." arXiv preprint arXiv:1905.07129 (2019).
https://arxiv.org/pdf/1905.07129.pdf

// XLMs Facebook
Lample, Guillaume, and Alexis Conneau. "Cross-lingual Language Model Pretraining." arXiv preprint arXiv:1901.07291 (2019).
https://arxiv.org/pdf/1901.07291.pdf

// LASER Facebook
Artetxe, Mikel, and Holger Schwenk. "Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond." arXiv preprint arXiv:1812.10464 (2018).
https://arxiv.org/pdf/1812.10464.pdf

// MASS Microsoft
Song, Kaitao, et al. "Mass: Masked sequence to sequence pre-training for language generation." arXiv preprint arXiv:1905.02450 (2019).
https://arxiv.org/pdf/1905.02450.pdf

// UNILM Microsoft
Dong, Li, et al. "Unified Language Model Pre-training for Natural Language Understanding and Generation." arXiv preprint arXiv:1905.03197 (2019).
https://arxiv.org/pdf/1905.03197.pdf

// ON-LSTM
Shen, Yikang, et al. "Ordered neurons: Integrating tree structures into recurrent neural networks." arXiv preprint arXiv:1810.09536 (2018).
https://arxiv.org/pdf/1810.09536.pdf

// XLNet
Yang, Zhilin, et al. "XLNet: Generalized Autoregressive Pretraining for Language Understanding." arXiv preprint arXiv:1906.08237 (2019).
https://arxiv.org/pdf/1906.08237.pdf

-----

Part III: Fundamental Topics

-----

Regularization

-----

# Weight Decay
Zhang, Guodong, et al. "Three mechanisms of weight decay regularization." arXiv preprint arXiv:1810.12281 (2018).
https://arxiv.org/pdf/1810.12281.pdf

// WD 1989
Hanson, Stephen José, and Lorien Y. Pratt. "Comparing biases for minimal network construction with back-propagation." Advances in neural information processing systems. 1989.
http://papers.nips.cc/paper/156-comparing-biases-for-minimal-network-construction-with-back-propagation.pdf

// WD 1992
Krogh, Anders, and John A. Hertz. "A simple weight decay can improve generalization." Advances in neural information processing systems. 1992.
http://papers.nips.cc/paper/563-a-simple-weight-decay-can-improve-generalization.pdf

# L2
# Ridge Regression
Hoerl, Arthur E., and Robert W. Kennard. "Ridge regression: Biased estimation for nonorthogonal problems." Technometrics 12.1 (1970): 55-67.
https://amstat.tandfonline.com/doi/pdf/10.1080/00401706.1970.10488634

# L1
# Lasso Regression
Tibshirani, Robert. "Regression shrinkage and selection via the lasso." Journal of the Royal Statistical Society: Series B (Methodological) 58.1 (1996): 267-288.
http://www.stat.ucla.edu/~sczhu/courses/ucla/stat_232b/chapters/LASSO.pdf
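
Side by side, the two penalties (for design matrix X, response y, and coefficients \beta):

\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_2^2
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda \|\beta\|_1

L2 shrinks all coefficients smoothly; L1 can drive some exactly to zero, which is why the lasso also performs variable selection.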

# L0
Louizos, Christos, Max Welling, and Diederik P. Kingma. "Learning Sparse Neural Networks through L_0 Regularization." arXiv preprint arXiv:1712.01312 (2017).
https://arxiv.org/pdf/1712.01312.pdf

-----

# Dropout
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
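
A few lines make the mechanism concrete. This is a sketch of the "inverted" variant (an assumption on my part; the paper instead scales weights by p at test time, which is equivalent in expectation):

import numpy as np

# Inverted dropout (sketch). p is the keep probability; scaling the kept
# units by 1/p at training time means test-time inference is the identity.
def dropout_forward(x, p=0.5, train=True):
    if not train:
        return x
    mask = (np.random.rand(*x.shape) < p) / p
    return x * mask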

# DropConnect
Wan, Li, et al. "Regularization of neural networks using dropconnect." International conference on machine learning. 2013.
http://proceedings.mlr.press/v28/wan13.pdf 

# FractalNet(DropPath)
Larsson, Gustav, Michael Maire, and Gregory Shakhnarovich. "Fractalnet: Ultra-deep neural networks without residuals." arXiv preprint arXiv:1605.07648 (2016).
https://arxiv.org/pdf/1605.07648.pdf

# NASNet(Scheduled DropPath)
Zoph, Barret, et al. "Learning transferable architectures for scalable image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zoph_Learning_Transferable_Architectures_CVPR_2018_paper.pdf

# Shake-Shake
Gastaldi, Xavier. "Shake-shake regularization." arXiv preprint arXiv:1705.07485 (2017).
https://arxiv.org/pdf/1705.07485.pdf

# ShakeDrop
Yamada, Yoshihiro, et al. "Shakedrop regularization for deep residual learning." arXiv preprint arXiv:1802.02375 (2018).
https://arxiv.org/pdf/1802.02375.pdf

# Spatial Dropout
Tompson, Jonathan, et al. "Efficient object localization using convolutional networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Tompson_Efficient_Object_Localization_2015_CVPR_paper.pdf

# Cutout
DeVries, Terrance, and Graham W. Taylor. "Improved regularization of convolutional neural networks with cutout." arXiv preprint arXiv:1708.04552 (2017).
https://arxiv.org/pdf/1708.04552.pdf

# DropBlock
Ghiasi, Golnaz, Tsung-Yi Lin, and Quoc V. Le. "Dropblock: A regularization method for convolutional networks." Advances in Neural Information Processing Systems. 2018.
http://papers.nips.cc/paper/8271-dropblock-a-regularization-method-for-convolutional-networks.pdf

-----

# Fast Dropout
Bayer, Justin, et al. "On fast dropout and its applicability to recurrent networks." arXiv preprint arXiv:1311.0701 (2013).
https://arxiv.org/pdf/1311.0701.pdf

# RNN Regularization
Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. "Recurrent neural network regularization." arXiv preprint arXiv:1409.2329 (2014).
https://arxiv.org/pdf/1409.2329.pdf

# Variational Dropout
Kingma, Durk P., Tim Salimans, and Max Welling. "Variational dropout and the local reparameterization trick." Advances in Neural Information Processing Systems. 2015.
https://papers.nips.cc/paper/5666-variational-dropout-and-the-local-reparameterization-trick.pdf

# Information Dropout
Achille, Alessandro, and Stefano Soatto. "Information dropout: Learning optimal representations through noisy computation." IEEE transactions on pattern analysis and machine intelligence 40.12 (2018): 2897-2905.
http://www.vision.jhu.edu/teaching/learning/deeplearning18/assets/Achille_Soatto-18.pdf

# rnnDrop
Moon, Taesup, et al. "Rnndrop: A novel dropout for rnns in asr." 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, 2015.
http://mind.skku.edu/files/Conference/asru2015.pdf

# DropEmbedding
Gal, Yarin, and Zoubin Ghahramani. "A theoretically grounded application of dropout in recurrent neural networks." Advances in neural information processing systems. 2016.
https://papers.nips.cc/paper/6241-a-theoretically-grounded-application-of-dropout-in-recurrent-neural-networks.pdf
 
# Recurrent Dropout
Semeniuta, Stanislau, Aliaksei Severyn, and Erhardt Barth. "Recurrent dropout without memory loss." arXiv preprint arXiv:1603.05118 (2016).
https://arxiv.org/pdf/1603.05118.pdf

# Zoneout
Krueger, David, et al. "Zoneout: Regularizing rnns by randomly preserving hidden activations." arXiv preprint arXiv:1606.01305 (2016).
https://arxiv.org/pdf/1606.01305.pdf 

# AWD-LSTM
Merity, Stephen, Nitish Shirish Keskar, and Richard Socher. "Regularizing and optimizing LSTM language models." arXiv preprint arXiv:1708.02182 (2017).
https://arxiv.org/pdf/1708.02182.pdf

-----

# DropAttention
Zehui, Lin, et al. "DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks." arXiv preprint arXiv:1907.11065 (2019).
https://arxiv.org/pdf/1907.11065.pdf

-----

# Pairing Samples
Inoue, Hiroshi. "Data augmentation by pairing samples for images classification." arXiv preprint arXiv:1801.02929 (2018).
https://arxiv.org/pdf/1801.02929.pdf

# Mixup
Zhang, Hongyi, et al. "mixup: Beyond empirical risk minimization." arXiv preprint arXiv:1710.09412 (2017).
https://arxiv.org/pdf/1710.09412.pdf
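
Mixup trains on convex combinations of pairs of examples and their labels:

\tilde{x} = \lambda x_i + (1 - \lambda) x_j, \qquad \tilde{y} = \lambda y_i + (1 - \lambda) y_j, \qquad \lambda \sim \mathrm{Beta}(\alpha, \alpha)

with \alpha in roughly [0.1, 0.4] reported to work well for ImageNet-scale models in the paper.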

-----

Normalization 

-----

# BN
Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International conference on machine learning. 2015.
http://proceedings.mlr.press/v37/ioffe15.pdf
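
The batch-norm transform, per activation over a mini-batch B:

\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \qquad y_i = \gamma \hat{x}_i + \beta

where \mu_B and \sigma_B^2 are the mini-batch mean and variance, and \gamma, \beta are learned so the network can undo the normalization if that is optimal.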

# WN
Salimans, Tim, and Durk P. Kingma. "Weight normalization: A simple reparameterization to accelerate training of deep neural networks." Advances in Neural Information Processing Systems. 2016.
https://papers.nips.cc/paper/6114-weight-normalization-a-simple-reparameterization-to-accelerate-training-of-deep-neural-networks.pdf
 
# LN
Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).
https://arxiv.org/pdf/1607.06450.pdf

# IN
Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky. "Instance normalization: The missing ingredient for fast stylization." arXiv preprint arXiv:1607.08022 (2016).
https://arxiv.org/pdf/1607.08022.pdf 

# AIN
Huang, Xun, and Serge Belongie. "Arbitrary style transfer in real-time with adaptive instance normalization." Proceedings of the IEEE International Conference on Computer Vision. 2017.
http://openaccess.thecvf.com/content_ICCV_2017/papers/Huang_Arbitrary_Style_Transfer_ICCV_2017_paper.pdf

# GN
Wu, Yuxin, and Kaiming He. "Group normalization." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Yuxin_Wu_Group_Normalization_ECCV_2018_paper.pdf

# PN
Li, Boyi, et al. "Positional Normalization." Advances in Neural Information Processing Systems. 2019.
http://papers.nips.cc/paper/8440-positional-normalization.pdf 

# UBN
Bjorck, Nils, et al. "Understanding batch normalization." Advances in Neural Information Processing Systems. 2018.
http://papers.nips.cc/paper/7996-understanding-batch-normalization.pdf

# TUBN
Kohler, Jonas, et al. "Towards a theoretical understanding of batch normalization." stat 1050 (2018): 27.
https://arxiv.org/pdf/1805.10694.pdf

# BNHO
Santurkar, Shibani, et al. "How does batch normalization help optimization?." Advances in Neural Information Processing Systems. 2018.
https://papers.nips.cc/paper/7515-how-does-batch-normalization-help-optimization.pdf

# URBN
Luo, Ping, et al. "Understanding regularization in batch normalization." arXiv preprint arXiv:1809.00846 (2018).
https://arxiv.org/pdf/1809.00846.pdf

# NormProp
Arpit, Devansh, et al. "Normalization propagation: A parametric technique for removing internal covariate shift in deep networks." arXiv preprint arXiv:1603.01431 (2016).
https://arxiv.org/pdf/1603.01431.pdf

# Efficient Backprop
LeCun, Yann A., et al. "Efficient backprop." Neural networks: Tricks of the trade. Springer, Berlin, Heidelberg, 2012. 9-48.
http://cseweb.ucsd.edu/classes/wi08/cse253/Handouts/lecun-98b.pdf

# Whitening
Kessy, Agnan, Alex Lewin, and Korbinian Strimmer. "Optimal whitening and decorrelation." The American Statistician 72.4 (2018): 309-314.
https://arxiv.org/pdf/1512.00809.pdf

# CAT
Zuber, Verena, and Korbinian Strimmer. "Gene ranking and biomarker discovery under correlation." Bioinformatics 25.20 (2009): 2700-2707.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.247.8982&rep=rep1&type=pdf

# CAR
Zuber, Verena, and Korbinian Strimmer. "High-dimensional regression and variable selection using CAR scores." Statistical Applications in Genetics and Molecular Biology 10.1 (2011).
https://arxiv.org/pdf/1007.5516.pdf

# GWNN
Luo, Ping. "Learning deep architectures via generalized whitened neural networks." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
http://proceedings.mlr.press/v70/luo17a/luo17a.pdf

# DBN
Huang, Lei, et al. "Decorrelated batch normalization." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Huang_Decorrelated_Batch_Normalization_CVPR_2018_paper.pdf

# KN
Wang, Guangrun, et al. "Kalman normalization: Normalizing internal representations across network layers." Advances in Neural Information Processing Systems. 2018.
https://papers.nips.cc/paper/7288-kalman-normalization-normalizing-internal-representations-across-network-layers.pdf

# IterNorm
Huang, Lei, et al. "Iterative Normalization: Beyond Standardization towards Efficient Whitening." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Huang_Iterative_Normalization_Beyond_Standardization_Towards_Efficient_Whitening_CVPR_2019_paper.pdf
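
The BN/LN/IN/GN family differs mainly in which axes of an (N, C, H, W) activation tensor the statistics are computed over. A minimal NumPy sketch of the normalization cores (learned scale and shift omitted), assuming 32 channels split into 4 groups for GN:

import numpy as np

def normalize(x, axes, eps=1e-5):
    """Standardize x over the given axes (no learned scale/shift)."""
    mean = x.mean(axis=axes, keepdims=True)
    var = x.var(axis=axes, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

x = np.random.randn(8, 32, 16, 16)            # (N, C, H, W)
bn = normalize(x, (0, 2, 3))                  # BatchNorm: per channel, over the batch
ln = normalize(x, (1, 2, 3))                  # LayerNorm: per sample, over C, H, W
inorm = normalize(x, (2, 3))                  # InstanceNorm: per sample and channel
g = x.reshape(8, 4, 8, 16, 16)                # GroupNorm: 4 groups of 8 channels
gn = normalize(g, (2, 3, 4)).reshape(x.shape)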


-----

Optimization

-----

# SGD
Bottou, Léon. "Stochastic gradient descent tricks." Neural networks: Tricks of the trade. Springer, Berlin, Heidelberg, 2012. 421-436.
https://www.microsoft.com/en-us/research/wp-content/uploads/2012/01/tricks-2012.pdf 

# Momentum
Sutskever, Ilya, et al. "On the importance of initialization and momentum in deep learning." International conference on machine learning. 2013.
http://proceedings.mlr.press/v28/sutskever13.pdf

# NAG
Nesterov, Y. "A method of solving a convex programming problem with convergence rate O(1/k^2)." Soviet Mathematics Doklady. Vol. 27. 1983.
http://mpawankumar.info/teaching/cdt-big-data/nesterov83.pdf
 
# AdaGrad
Duchi, John, Elad Hazan, and Yoram Singer. "Adaptive subgradient methods for online learning and stochastic optimization." Journal of Machine Learning Research 12.Jul (2011): 2121-2159.
http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
  
# AdaDelta
Zeiler, Matthew D. "ADADELTA: an adaptive learning rate method." arXiv preprint arXiv:1212.5701 (2012).
https://arxiv.org/pdf/1212.5701.pdf

# RMSProp
Tieleman, Tijmen, and Geoffrey Hinton. "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude." COURSERA: Neural networks for machine learning 4.2 (2012): 26-31.
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

# Adam
Kingma, Diederik P., and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
https://arxiv.org/pdf/1412.6980.pdf

# From Adam to SGD
Keskar, Nitish Shirish, and Richard Socher. "Improving generalization performance by switching from adam to sgd." arXiv preprint arXiv:1712.07628 (2017).
https://arxiv.org/pdf/1712.07628.pdf

# Nadam
Dozat, Timothy. "Incorporating nesterov momentum into adam." (2016).
https://openreview.net/pdf?id=OM0jvwB8jIp57ZJjtNEZ

# AMSGrad
Reddi, Sashank J., Satyen Kale, and Sanjiv Kumar. "On the convergence of adam and beyond." International Conference on Learning Representations. 2018.
http://www.satyenkale.com/papers/amsgrad.pdf 

# RAdam
Liu, Liyuan, et al. "On the variance of the adaptive learning rate and beyond." arXiv preprint arXiv:1908.03265 (2019).
https://arxiv.org/pdf/1908.03265.pdf

# SMA
Nau, Robert. "Forecasting with moving averages." Fuqua School of Business, Duke University (2014): 1-3.
https://people.duke.edu/~rnau/Notes_on_forecasting_with_moving_averages--Robert_Nau.pdf

# Lookahead
Zhang, Michael, et al. "Lookahead Optimizer: k steps forward, 1 step back." Advances in Neural Information Processing Systems. 2019.
http://papers.nips.cc/paper/9155-lookahead-optimizer-k-steps-forward-1-step-back.pdf

# EMA
Hunter, J. Stuart. "The exponentially weighted moving average." Journal of quality technology 18.4 (1986): 203-210.
https://www.researchgate.net/profile/Arumugam_Raman/post/What_kind_of_data_is_usually_considered_in_Construction_of_Shewhart_control_charts/attachment/59d6255579197b8077983a73/AS%3A273836358995969%401442299083585/download/L11-OnEWMA.pdf

# LAMB
You, Yang, et al. "Reducing BERT Pre-Training Time from 3 Days to 76 Minutes." arXiv preprint arXiv:1904.00962 (2019).
https://arxiv.org/pdf/1904.00962.pdf

# CLR
Smith, Leslie N. "Cyclical learning rates for training neural networks." 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2017.
https://arxiv.org/pdf/1506.01186.pdf

# SGDR
Loshchilov, Ilya, and Frank Hutter. "Sgdr: Stochastic gradient descent with warm restarts." arXiv preprint arXiv:1608.03983 (2016).
https://arxiv.org/pdf/1608.03983.pdf

# AdamW
Loshchilov, Ilya, and Frank Hutter. "Decoupled weight decay regularization." arXiv preprint arXiv:1711.05101 (2017).
https://arxiv.org/pdf/1711.05101.pdf

# Super-Convergence
Smith, Leslie N., and Nicholay Topin. "Super-convergence: Very fast training of residual networks using large learning rates." (2018).
https://openreview.net/pdf?id=H1A5ztj3b

# ADMM
Boyd, Stephen, et al. "Distributed optimization and statistical learning via the alternating direction method of multipliers." Foundations and Trends® in Machine learning 3.1 (2011): 1-122.
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.360.1664&rep=rep1&type=pdf

# ADMM-S
Taylor, Gavin, et al. "Training neural networks without gradients: A scalable admm approach." International conference on machine learning. 2016.
http://proceedings.mlr.press/v48/taylor16.pdf

# dlADMM
Wang, Junxiang, et al. "ADMM for Efficient Deep Learning with Global Convergence." arXiv preprint arXiv:1905.13611 (2019).
https://arxiv.org/pdf/1905.13611.pdf
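
For orientation, the Adam update of Kingma & Ba fits in a few lines, and several of the optimizers above (AdamW, RAdam, LAMB, Lookahead) are small modifications of this loop. A minimal NumPy sketch; adam_step is an illustrative name:

import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = b1 * m + (1 - b1) * grad              # first-moment (mean) EMA
    v = b2 * v + (1 - b2) * grad ** 2         # second-moment (uncentered) EMA
    m_hat = m / (1 - b1 ** t)                 # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    # AdamW would additionally apply decoupled decay here: w -= lr * wd * w
    return w, m, v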
 
-----

Activation Function

-----

# Activation Function
Nwankpa, Chigozie, et al. "Activation functions: Comparison of trends in practice and research for deep learning." arXiv preprint arXiv:1811.03378 (2018).
https://arxiv.org/pdf/1811.03378.pdf

# ReLU 2000
Hahnloser, Richard H. R., et al. "Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit." Nature 405.6789 (2000): 947-951.

# ReLU 2009
Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?." 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009.

# Softplus
Dugas, Charles, et al. "Incorporating second-order functional knowledge for better option pricing." Advances in Neural Information Processing Systems. 2001.

# LReLU
Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng. "Rectifier nonlinearities improve neural network acoustic models." Proc. ICML. Vol. 30. No. 1. 2013.

# PReLU
He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification." Proceedings of the IEEE International Conference on Computer Vision. 2015.
https://arxiv.org/pdf/1502.01852.pdf

# ELU
Clevert, Djork-Arné, Thomas Unterthiner, and Sepp Hochreiter. "Fast and accurate deep network learning by exponential linear units (ELUs)." arXiv preprint arXiv:1511.07289 (2015).
https://arxiv.org/pdf/1511.07289.pdf

# SELU
Klambauer, Günter, et al. "Self-normalizing neural networks." Advances in Neural Information Processing Systems. 2017.
https://arxiv.org/pdf/1706.02515.pdf

# GELU
Hendrycks, Dan, and Kevin Gimpel. "Gaussian error linear units (GELUs)." arXiv preprint arXiv:1606.08415 (2016).
https://arxiv.org/pdf/1606.08415.pdf

# Swish
Ramachandran, Prajit, Barret Zoph, and Quoc V. Le. "Searching for activation functions." arXiv preprint arXiv:1710.05941 (2017).
https://arxiv.org/pdf/1710.05941.pdf

# Maxout
Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
http://proceedings.mlr.press/v28/goodfellow13.pdf
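
Most of the rectifier family above reduces to one-line NumPy definitions; a quick-reference sketch (the GELU line is the common tanh approximation, and the SELU constants are rounded):

import numpy as np

def relu(x):     return np.maximum(0.0, x)
def softplus(x): return np.log1p(np.exp(x))                    # smooth ReLU
def lrelu(x, a=0.01): return np.where(x > 0, x, a * x)         # leaky ReLU
def elu(x, a=1.0):    return np.where(x > 0, x, a * (np.exp(x) - 1))
def selu(x):     return 1.0507 * elu(x, 1.67326)               # self-normalizing
def gelu(x):     return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))
def swish(x):    return x / (1 + np.exp(-x))                   # x * sigmoid(x)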

-----

Loss Function

-----

# Loss Function
Barron, Jonathan T. "A general and adaptive robust loss function." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Barron_A_General_and_Adaptive_Robust_Loss_Function_CVPR_2019_paper.pdf
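
Barron's single loss interpolates between L2, pseudo-Huber, Cauchy, and Geman-McClure via one shape parameter alpha. A sketch of the general-alpha branch as I read the paper; the limits alpha -> 0 and alpha -> -inf need separate handling, which this sketch omits:

import numpy as np

def general_robust_loss(x, alpha, c=1.0):
    """General robust loss for alpha away from the special points 0 and 2.
    alpha=2 ~ L2/2, alpha=1 ~ pseudo-Huber, alpha=-2 ~ Geman-McClure."""
    z = (x / c) ** 2                          # squared, scale-normalized residual
    b = abs(alpha - 2.0)
    return (b / alpha) * ((z / b + 1.0) ** (alpha / 2.0) - 1.0)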

-----

Pooling

-----

-----

Convolution

-----

-----

Automatic Differentiation

-----

-----

Back Propagation

-----

# Back Propagation
Alber, Maximilian, et al. "Backprop evolution." arXiv preprint arXiv:1808.02822 (2018).
https://arxiv.org/pdf/1808.02822.pdf
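
The "Backprop evolution" paper searches for alternatives to the standard backward equations; the baseline being varied is the plain chain rule. A minimal NumPy sketch of manual backprop through a two-layer ReLU network with MSE loss (all names illustrative):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 3))               # batch of 5, 3 features
y = rng.standard_normal((5, 1))
W1, W2 = rng.standard_normal((3, 4)), rng.standard_normal((4, 1))

h_pre = x @ W1                                # forward pass
h = np.maximum(0.0, h_pre)                    # ReLU
y_hat = h @ W2
loss = ((y_hat - y) ** 2).mean()

d_yhat = 2 * (y_hat - y) / len(x)             # backward pass (chain rule)
dW2 = h.T @ d_yhat
dh = d_yhat @ W2.T
dW1 = x.T @ (dh * (h_pre > 0))                # gate the gradient through ReLU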

-----

Computational Graph

-----

References

# Reference books
[1] Title: 深度學習: Caffe 之經典模型詳解與實戰. ISBN: 7121301180. Author: 樂毅. Publisher: 電子工業. Publication date: 2016-09-30.

# Supplementary material for the LSTM paper
[2] Understanding LSTM Networks -- colah's blog
http://colah.github.io/posts/2015-08-Understanding-LSTMs/ 

# Three-paper quick version (Spring 2017)
[3] Deep Learning Paper
http://hemingwang.blogspot.com/2019/01/deep-learning-paper.html

# Twenty-paper slow version (Spring 2018)
[4] PyTorch(六):Seminar

# Thirty-paper basic version (Spring 2019)
[5] 30 Topics for Deep Learning
http://hemingwang.blogspot.com/2019/04/30-topics-for-deep-learning.html  

# Ten-paper essentials version (Summer 2019)
[6] AI 三部曲(深度學習:從入門到精通)
https://hemingwang.blogspot.com/2019/05/trilogy.html

# Fifty-paper complete version (Fall 2019)
[7] AI從頭學(三九):Complete Works
http://hemingwang.blogspot.tw/2017/08/aicomplete-works.html