Sunday, August 30, 2020

Deep Learning Highlight

2019/04/25

Notes:

These are introductory recommendations, organized according to the progress of my own self-study in deep learning.

There are several tracks: a three-paper quick track, for a "quick" look at the whole of deep learning; a twenty-paper slow track, focused on computer vision; a thirty-paper foundation track, which adds fundamental topics and natural language processing; a ten-paper essentials track, distilled from the thirty-paper foundation track; and a fifty-paper complete track, covering all of the papers mentioned above.

-----


Fig. 1. 深度學習: Caffe 之經典模型詳解與實戰 (Deep Learning: Classic Caffe Models Explained and Practiced) [1].

-----

The three-paper quick track, for general readers who want a basic understanding of deep learning.

1. Deep Learning
2. LeNet
3. LSTM

-----

The twenty-paper slow track, the syllabus of PyTorch Taipei and PyTorch Hsinchu.

1. LeNet
2. LSTM

3. AlexNet
4. ZFNet
5. NIN
6. GoogLeNet
7. VGGNet
8. SqueezeNet

9. PreVGGNet
10. SVM
11. SMO
12. DPM
13. SS
14. FCN

15. R-CNN
16. SPPNet
17. Fast R-CNN
18. Faster R-CNN
19. YOLO
20. SSD

-----

The thirty-paper foundation track, the syllabus of PyTorch New Taipei.

1. LeNet (AlexNet, ZFNet)
2. NIN + SENet (GoogLeNet, VGGNet, PreVGGNet, Highway)
3. ResNet
4. FCN (Mask R-CNN, YOLACT)
5. YOLOv1 (Faster R-CNN, YOLOv3)

6. LSTM (Weight Decay, Dropout)
7. Seq2seq (Batch Normalization, Layer Normalization)
8. Attention (RAdam, Lookahead)
9. ConvS2S (ULMFiT, ELMo)
10. Transformer (GPT-1, BERT)

-----

The ten-paper essentials track, the syllabus of the paid paper-reading seminar.

1. LeNet
2. NIN
3. ResNet
4. FCN
5. YOLOv1

6. LSTM
7. Seq2seq
8. Attention
9. ConvS2S
10. Transformer

-----

The forty-eight-paper complete track, the syllabus of the all-around AI course.

1. LeNet. AlexNet, ZFNet.
2. NIN, SENet. GoogLeNet, VGGNet, PreVGGNet, Highway v1 v2, Inception v3 v4.
3. ResNet v1, ResNet-D, ResNet v2, ResNet-V. ResNeXt, DenseNet.
4. FCN. Faster R-CNN, Mask R-CNN.
5. YOLOv1. SSD, FPN, YOLOv3, YOLOv4.

6. LSTM. NNLM, C&W, Word2vec. Weight Decay, Dropout.
7. Seq2seq. Paragraph2vec. Batch Normalization, Layer Normalization.
8. Attention. FSA. Adam, Ranger.
9. ConvS2S. Context2vec, ELMo.
10. Transformer. GPT-1, GPT-2, GPT-3, BERT.

-----

Legend:

# basic
// advanced

-----

Paper

# Deep Learning
LeCun, Yann, Yoshua Bengio, and Geoffrey Hinton. "Deep learning." Nature 521.7553 (2015): 436-444.
https://creativecoding.soe.ucsc.edu/courses/cs523/slides/week3/DeepLearning_LeCun.pdf

// History of Deep Learning
Alom, Md Zahangir, et al. "The history began from alexnet: A comprehensive survey on deep learning approaches." arXiv preprint arXiv:1803.01164 (2018).
https://arxiv.org/ftp/arxiv/papers/1803/1803.01164.pdf

// Recent Advances in CNN
Gu, Jiuxiang, et al. "Recent advances in convolutional neural networks." Pattern Recognition 77 (2018): 354-377.
https://arxiv.org/pdf/1512.07108.pdf

// GPU
Raina, Rajat, Anand Madhavan, and Andrew Y. Ng. "Large-scale deep unsupervised learning using graphics processors." Proceedings of the 26th annual international conference on machine learning. ACM, 2009.
http://robotics.stanford.edu/~ang/papers/icml09-LargeScaleUnsupervisedDeepLearningGPU.pdf

// Difficult 1994
Bengio, Yoshua, Patrice Simard, and Paolo Frasconi. "Learning long-term dependencies with gradient descent is difficult." IEEE transactions on neural networks 5.2 (1994): 157-166.
https://pdfs.semanticscholar.org/d0be/39ee052d246ae99c082a565aba25b811be2d.pdf

// Difficult 2010
Glorot, Xavier, and Yoshua Bengio. "Understanding the difficulty of training deep feedforward neural networks." Proceedings of the thirteenth international conference on artificial intelligence and statistics. 2010.
http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf 

// Difficult 2013
Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio. "On the difficulty of training recurrent neural networks." International conference on machine learning. 2013.
http://proceedings.mlr.press/v28/pascanu13.pdf
 
-----

Part I: Computer Vision

-----

◎ Image Classification

-----

# LeNet
LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf
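For a concrete point of reference, here is a minimal PyTorch sketch of a LeNet-5-style network. Layer sizes follow the paper's 32x32 grayscale input; the original subsampling and RBF output layers are simplified to average pooling and a linear classifier.

import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # C1: 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # S2: 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # C3: 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # S4: 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),       # C5
            nn.Tanh(),
            nn.Linear(120, 84),               # F6
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])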

# AlexNet
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf

# ZFNet
Zeiler, Matthew D., and Rob Fergus. "Visualizing and understanding convolutional networks." European conference on computer vision. Springer, Cham, 2014.
https://arxiv.org/pdf/1311.2901.pdf

// PreAlexNet

// PreZFNet

// Deconv

-----

# NIN
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." arXiv preprint arXiv:1312.4400 (2013).
https://arxiv.org/pdf/1312.4400.pdf

# SENet
Hu, Jie, Li Shen, and Gang Sun. "Squeeze-and-excitation networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Hu_Squeeze-and-Excitation_Networks_CVPR_2018_paper.pdf
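The SE block is compact enough to sketch in full. A minimal PyTorch version, assuming the paper's reduction ratio of 16:

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze: global average pooling. Excitation: a two-layer bottleneck
    # with a sigmoid gate. Scale: reweight each channel of the input.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))           # squeeze: (B, C)
        w = self.fc(s).view(b, c, 1, 1)  # excitation: per-channel weights
        return x * w                     # scale

print(SEBlock(64)(torch.randn(2, 64, 8, 8)).shape)  # torch.Size([2, 64, 8, 8])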

// SKNet

// STNet

// RANet

// BAM

// CBAM

// RASNet

-----

# GoogLeNet
Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
http://openaccess.thecvf.com/content_cvpr_2015/papers/Szegedy_Going_Deeper_With_2015_CVPR_paper.pdf

# VGGNet
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
https://arxiv.org/pdf/1409.1556/

# PreVGGNet
Ciresan, Dan C., et al. "Flexible, high performance convolutional neural networks for image classification." IJCAI Proceedings-International Joint Conference on Artificial Intelligence. Vol. 22. No. 1. 2011.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.481.4406&rep=rep1&type=pdf
  
# Highway v1
Srivastava, Rupesh Kumar, Klaus Greff, and Jürgen Schmidhuber. "Highway networks." arXiv preprint arXiv:1505.00387 (2015).
https://arxiv.org/pdf/1505.00387.pdf

# Highway v2
Srivastava, Rupesh K., Klaus Greff, and Jürgen Schmidhuber. "Training very deep networks." Advances in neural information processing systems. 2015.
https://papers.nips.cc/paper/5850-training-very-deep-networks.pdf

# Inception v3
Szegedy, Christian, et al. "Rethinking the inception architecture for computer vision." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Szegedy_Rethinking_the_Inception_CVPR_2016_paper.pdf

# Inception v4
Szegedy, Christian, et al. "Inception-v4, inception-resnet and the impact of residual connections on learning." AAAI. Vol. 4. 2017.
http://www.aaai.org/ocs/index.php/AAAI/AAAI17/paper/download/14806/14311

// CapsNet v0

// CapsNet v1
Sabour, Sara, Nicholas Frosst, and Geoffrey E. Hinton. "Dynamic routing between capsules." Advances in neural information processing systems. 2017.
http://papers.nips.cc/paper/6975-dynamic-routing-between-capsules.pdf

# CapsNet v2

# CapsNet v3

-----
 
# ResNet v1
He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
http://openaccess.thecvf.com/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf
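The key idea is the identity shortcut: each block learns a residual F(x) and outputs F(x) + x. A minimal PyTorch sketch of the basic block (stride-1, equal-channel case; the projection shortcut used when dimensions change is omitted):

import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # residual F(x) plus identity shortcut

print(BasicBlock(64)(torch.randn(1, 64, 16, 16)).shape)  # torch.Size([1, 64, 16, 16])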

# ResNet-D
Huang, Gao, et al. "Deep networks with stochastic depth." European conference on computer vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.09382.pdf

# ResNet v2
He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.05027.pdf  

# ResNet-E
Veit, Andreas, Michael J. Wilber, and Serge Belongie. "Residual networks behave like ensembles of relatively shallow networks." Advances in neural information processing systems. 2016.
https://papers.nips.cc/paper/6556-residual-networks-behave-like-ensembles-of-relatively-shallow-networks.pdf 

# ResNet-V
Li, Hao, et al. "Visualizing the loss landscape of neural nets." Advances in Neural Information Processing Systems. 2018.
https://papers.nips.cc/paper/7875-visualizing-the-loss-landscape-of-neural-nets.pdf

-----

// ResNet-F
Zhang, Hongyi, Yann N. Dauphin, and Tengyu Ma. "Fixup initialization: Residual learning without normalization." arXiv preprint arXiv:1901.09321 (2019).
https://arxiv.org/pdf/1901.09321.pdf

// ResNet-I
Shwartz-Ziv, Ravid, and Naftali Tishby. "Opening the black box of deep neural networks via information." arXiv preprint arXiv:1703.00810 (2017).
https://arxiv.org/pdf/1703.00810.pdf

// ResNet-Q
Balduzzi, David, et al. "The shattered gradients problem: If resnets are the answer, then what is the question?." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
https://arxiv.org/pdf/1702.08591.pdf

// ResNet-S
Orhan, A. Emin, and Xaq Pitkow. "Skip connections eliminate singularities." arXiv preprint arXiv:1701.09175 (2017).
https://arxiv.org/pdf/1701.09175.pdf 

// ResNet-U
Liu, Tianyi, et al. "Towards Understanding the Importance of Shortcut Connections in Residual Networks." Advances in Neural Information Processing Systems. 2019.
http://papers.nips.cc/paper/9003-towards-understanding-the-importance-of-shortcut-connections-in-residual-networks.pdf 

// ResNet-W
He, Fengxiang, Tongliang Liu, and Dacheng Tao. "Why resnet works? residuals generalize." arXiv preprint arXiv:1904.01367 (2019).
https://arxiv.org/pdf/1904.01367.pdf 

// WRN
Zagoruyko, Sergey, and Nikos Komodakis. "Wide residual networks." arXiv preprint arXiv:1605.07146 (2016).
https://arxiv.org/pdf/1605.07146.pdf

# ResNeXt
Xie, Saining, et al. "Aggregated residual transformations for deep neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Xie_Aggregated_Residual_Transformations_CVPR_2017_paper.pdf 

# DenseNet
Huang, Gao, et al. "Densely connected convolutional networks." Proceedings of the IEEE conference on computer vision and pattern recognition. Vol. 1. No. 2. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_Densely_Connected_Convolutional_CVPR_2017_paper.pdf

// DPN
Chen, Yunpeng, et al. "Dual path networks." Advances in Neural Information Processing Systems. 2017.
https://papers.nips.cc/paper/7033-dual-path-networks.pdf

// DLA
Yu, Fisher, et al. "Deep layer aggregation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Yu_Deep_Layer_Aggregation_CVPR_2018_paper.pdf

// Res2Net
Gao, Shang-Hua, et al. "Res2Net: A New Multi-scale Backbone Architecture." arXiv preprint arXiv:1904.01169 (2019).
https://arxiv.org/pdf/1904.01169.pdf 

// PolyNet
Zhang, Xingcheng, et al. "Polynet: A pursuit of structural diversity in very deep networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhang_PolyNet_A_Pursuit_CVPR_2017_paper.pdf

// FractalNet(DropPath)
Larsson, Gustav, Michael Maire, and Gregory Shakhnarovich. "Fractalnet: Ultra-deep neural networks without residuals." arXiv preprint arXiv:1605.07648 (2016).
https://arxiv.org/pdf/1605.07648.pdf

-----

Mobile

-----

# SqueezeNet
Iandola, Forrest N., et al. "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size." arXiv preprint arXiv:1602.07360 (2016).
https://arxiv.org/pdf/1602.07360.pdf

# MobileNet v1
Howard, Andrew G., et al. "Mobilenets: Efficient convolutional neural networks for mobile vision applications." arXiv preprint arXiv:1704.04861 (2017).
https://arxiv.org/pdf/1704.04861.pdf
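MobileNet v1's building block is the depthwise separable convolution: a per-channel 3x3 depthwise convolution followed by a 1x1 pointwise convolution that mixes channels. A minimal PyTorch sketch:

import torch
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, stride=1):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, 3, stride=stride, padding=1,
                  groups=in_ch, bias=False),      # depthwise: one filter per channel
        nn.BatchNorm2d(in_ch),
        nn.ReLU(inplace=True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),  # pointwise: mix channels
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

print(depthwise_separable(32, 64)(torch.randn(1, 32, 56, 56)).shape)
# torch.Size([1, 64, 56, 56])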

# MobileNet v2
Sandler, Mark, et al. "Mobilenetv2: Inverted residuals and linear bottlenecks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Sandler_MobileNetV2_Inverted_Residuals_CVPR_2018_paper.pdf

# MobileNet v3
Howard, Andrew, et al. "Searching for mobilenetv3." arXiv preprint arXiv:1905.02244 (2019).
https://arxiv.org/pdf/1905.02244.pdf

# ShuffleNet v1
Zhang, Xiangyu, et al. "Shufflenet: An extremely efficient convolutional neural network for mobile devices." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_ShuffleNet_An_Extremely_CVPR_2018_paper.pdf

# ShuffleNet v2
Ma, Ningning, et al. "Shufflenet v2: Practical guidelines for efficient cnn architecture design." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Ningning_Light-weight_CNN_Architecture_ECCV_2018_paper.pdf

# Xception
Chollet, François. "Xception: Deep learning with depthwise separable convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Chollet_Xception_Deep_Learning_CVPR_2017_paper.pdf

# ESPNet v1
Mehta, Sachin, et al. "Espnet: Efficient spatial pyramid of dilated convolutions for semantic segmentation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Sachin_Mehta_ESPNet_Efficient_Spatial_ECCV_2018_paper.pdf

# ESPNet v2
Mehta, Sachin, et al. "Espnetv2: A light-weight, power efficient, and general purpose convolutional neural network." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Mehta_ESPNetv2_A_Light-Weight_Power_Efficient_and_General_Purpose_Convolutional_Neural_CVPR_2019_paper.pdf

-----

# NAS-RL
Zoph, Barret, and Quoc V. Le. "Neural architecture search with reinforcement learning." arXiv preprint arXiv:1611.01578 (2016).
https://arxiv.org/pdf/1611.01578.pdf

# NASNet(Scheduled DropPath)
Zoph, Barret, et al. "Learning transferable architectures for scalable image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zoph_Learning_Transferable_Architectures_CVPR_2018_paper.pdf

// pNASNet
Liu, Chenxi, et al. "Progressive neural architecture search." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Chenxi_Liu_Progressive_Neural_Architecture_ECCV_2018_paper.pdf 

// AmoebaNet
Real, Esteban, et al. "Regularized evolution for image classifier architecture search." Proceedings of the aaai conference on artificial intelligence. Vol. 33. 2019.
https://arxiv.org/pdf/1802.01548.pdf

// mNASNet
Tan, Mingxing, et al. "Mnasnet: Platform-aware neural architecture search for mobile." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Tan_MnasNet_Platform-Aware_Neural_Architecture_Search_for_Mobile_CVPR_2019_paper.pdf

# Auto-DeepLab
Liu, Chenxi, et al. "Auto-deeplab: Hierarchical neural architecture search for semantic image segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_Auto-DeepLab_Hierarchical_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2019_paper.pdf

# NAS-FPN
Ghiasi, Golnaz, Tsung-Yi Lin, and Quoc V. Le. "Nas-fpn: Learning scalable feature pyramid architecture for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Ghiasi_NAS-FPN_Learning_Scalable_Feature_Pyramid_Architecture_for_Object_Detection_CVPR_2019_paper.pdf

# AutoAugment
Cubuk, Ekin D., et al. "Autoaugment: Learning augmentation policies from data." arXiv preprint arXiv:1805.09501 (2018).
https://arxiv.org/pdf/1805.09501.pdf

# EfficientNet
Tan, Mingxing, and Quoc V. Le. "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks." arXiv preprint arXiv:1905.11946 (2019).
https://arxiv.org/pdf/1905.11946.pdf

# EfficientDet
Tan, Mingxing, Ruoming Pang, and Quoc V. Le. "Efficientdet: Scalable and efficient object detection." arXiv preprint arXiv:1911.09070 (2019).
https://arxiv.org/pdf/1911.09070.pdf

-----

Semantic Segmentation

-----

// SDS
Hariharan, Bharath, et al. "Simultaneous detection and segmentation." European Conference on Computer Vision. Springer, Cham, 2014.
https://arxiv.org/pdf/1407.1808.pdf

# FCN
Long, Jonathan, Evan Shelhamer, and Trevor Darrell. "Fully convolutional networks for semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Long_Fully_Convolutional_Networks_2015_CVPR_paper.pdf
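In spirit, FCN replaces the fully connected classifier with 1x1 convolutions and upsamples the coarse score map back to input resolution. A minimal FCN-32s-style PyTorch sketch (the paper's learned deconvolutions and skip fusions are omitted; the backbone feature map here is hypothetical):

import torch
import torch.nn as nn
import torch.nn.functional as F

class FCNHead(nn.Module):
    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.score = nn.Conv2d(in_ch, num_classes, kernel_size=1)  # per-pixel class scores

    def forward(self, feats, out_size):
        # upsample the coarse score map back to the input resolution
        return F.interpolate(self.score(feats), size=out_size,
                             mode='bilinear', align_corners=False)

feats = torch.randn(1, 512, 7, 7)  # hypothetical backbone output
print(FCNHead(512, 21)(feats, (224, 224)).shape)  # torch.Size([1, 21, 224, 224])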

# DeconvNet
Noh, Hyeonwoo, Seunghoon Hong, and Bohyung Han. "Learning deconvolution network for semantic segmentation." Proceedings of the IEEE international conference on computer vision. 2015.
https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/Noh_Learning_Deconvolution_Network_ICCV_2015_paper.pdf

# SegNet
Badrinarayanan, Vijay, Alex Kendall, and Roberto Cipolla. "Segnet: A deep convolutional encoder-decoder architecture for image segmentation." IEEE transactions on pattern analysis and machine intelligence 39.12 (2017): 2481-2495.
https://arxiv.org/pdf/1511.00561.pdf

# U-Net
Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.
https://arxiv.org/pdf/1505.04597.pdf

# U-Net++
Zhou, Zongwei, et al. "Unet++: A nested u-net architecture for medical image segmentation." Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. Springer, Cham, 2018. 3-11.
https://arxiv.org/pdf/1807.10165.pdf

-----

# DilatedNet
Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions." arXiv preprint arXiv:1511.07122 (2015).
https://arxiv.org/pdf/1511.07122.pdf 

# ENet
Paszke, Adam, et al. "Enet: A deep neural network architecture for real-time semantic segmentation." arXiv preprint arXiv:1606.02147 (2016).
https://arxiv.org/pdf/1606.02147.pdf
 
# DRN
Yu, Fisher, Vladlen Koltun, and Thomas Funkhouser. "Dilated residual networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Yu_Dilated_Residual_Networks_CVPR_2017_paper.pdf

# FastFCN
Wu, Huikai, et al. "FastFCN: Rethinking Dilated Convolution in the Backbone for Semantic Segmentation." arXiv preprint arXiv:1903.11816 (2019).
https://arxiv.org/pdf/1903.11816.pdf 

-----

# FC-CRF
Krähenbühl, Philipp, and Vladlen Koltun. "Efficient inference in fully connected crfs with gaussian edge potentials." Advances in neural information processing systems. 2011.
http://papers.nips.cc/paper/4296-efficient-inference-in-fully-connected-crfs-with-gaussian-edge-potentials.pdf

# DeepLab v1
Chen, Liang-Chieh, et al. "Semantic image segmentation with deep convolutional nets and fully connected crfs." arXiv preprint arXiv:1412.7062 (2014).
https://arxiv.org/pdf/1412.7062.pdf

# DeepLab v2
Chen, Liang-Chieh, et al. "Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs." arXiv preprint arXiv:1606.00915 (2016).
https://arxiv.org/pdf/1606.00915.pdf 

# DeepLab v3
Chen, Liang-Chieh, et al. "Rethinking atrous convolution for semantic image segmentation." arXiv preprint arXiv:1706.05587 (2017).
https://arxiv.org/pdf/1706.05587.pdf  

# DeepLab v3+
Chen, Liang-Chieh, et al. "Encoder-decoder with atrous separable convolution for semantic image segmentation." Proceedings of the European conference on computer vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Liang-Chieh_Chen_Encoder-Decoder_with_Atrous_ECCV_2018_paper.pdf 

# Gated-SCNN
Takikawa, Towaki, et al. "Gated-scnn: Gated shape cnns for semantic segmentation." Proceedings of the IEEE International Conference on Computer Vision. 2019.
http://openaccess.thecvf.com/content_ICCV_2019/papers/Takikawa_Gated-SCNN_Gated_Shape_CNNs_for_Semantic_Segmentation_ICCV_2019_paper.pdf

-----

# ResNet-38
Wu, Zifeng, Chunhua Shen, and Anton Van Den Hengel. "Wider or deeper: Revisiting the resnet model for visual recognition." Pattern Recognition 90 (2019): 119-133.
https://arxiv.org/pdf/1611.10080.pdf 

# Tiramisu
Jégou, Simon, et al. "The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2017.
http://openaccess.thecvf.com/content_cvpr_2017_workshops/w13/papers/Jegou_The_One_Hundred_CVPR_2017_paper.pdf

# RefineNet
Lin, Guosheng, et al. "Refinenet: Multi-path refinement networks for high-resolution semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Lin_RefineNet_Multi-Path_Refinement_CVPR_2017_paper.pdf 

# RefineNet-LW
Nekrasov, Vladimir, Chunhua Shen, and Ian Reid. "Light-weight refinenet for real-time semantic segmentation." arXiv preprint arXiv:1810.03272 (2018).
https://arxiv.org/pdf/1810.03272.pdf

# RefineNet-AA
Nekrasov, Vladimir, et al. "Real-time joint semantic segmentation and depth estimation using asymmetric annotations." 2019 International Conference on Robotics and Automation (ICRA). IEEE, 2019.
https://arxiv.org/pdf/1809.04766.pdf

# VPLR
Zhu, Yi, et al. "Improving Semantic Segmentation via Video Propagation and Label Relaxation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhu_Improving_Semantic_Segmentation_via_Video_Propagation_and_Label_Relaxation_CVPR_2019_paper.pdf

# PSPNet
Zhao, Hengshuang, et al. "Pyramid scene parsing network." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Zhao_Pyramid_Scene_Parsing_CVPR_2017_paper.pdf

# ICNet
Zhao, Hengshuang, et al. "Icnet for real-time semantic segmentation on high-resolution images." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Hengshuang_Zhao_ICNet_for_Real-Time_ECCV_2018_paper.pdf

# BiSeNet
Yu, Changqian, et al. "Bisenet: Bilateral segmentation network for real-time semantic segmentation." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Changqian_Yu_BiSeNet_Bilateral_Segmentation_ECCV_2018_paper.pdf

# Fast-SCNN
Poudel, Rudra PK, Stephan Liwicki, and Roberto Cipolla. "Fast-SCNN: fast semantic segmentation network." arXiv preprint arXiv:1902.04502 (2019).
https://arxiv.org/pdf/1902.04502.pdf

# BlitzNet
Dvornik, Nikita, et al. "Blitznet: A real-time deep network for scene understanding." Proceedings of the IEEE international conference on computer vision. 2017.
http://openaccess.thecvf.com/content_ICCV_2017/papers/Dvornik_BlitzNet_A_Real-Time_ICCV_2017_paper.pdf

// SA-GAN

// DANet

// OCNet
  
-----


Instance Segmentation 

-----

// MNC
Dai, Jifeng, Kaiming He, and Jian Sun. "Instance-aware semantic segmentation via multi-task network cascades." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Dai_Instance-Aware_Semantic_Segmentation_CVPR_2016_paper.pdf

// DeepMask
Pinheiro, Pedro O., Ronan Collobert, and Piotr Dollár. "Learning to segment object candidates." Advances in Neural Information Processing Systems. 2015.
https://papers.nips.cc/paper/5852-learning-to-segment-object-candidates.pdf

// SharpMask
Pinheiro, Pedro O., et al. "Learning to refine object segments." European Conference on Computer Vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.08695.pdf

// MultiPathNet
Zagoruyko, Sergey, et al. "A multipath network for object detection." arXiv preprint arXiv:1604.02135 (2016).
https://arxiv.org/pdf/1604.02135.pdf

// InstanceFCN
Dai, Jifeng, et al. "Instance-sensitive fully convolutional networks." European Conference on Computer Vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1603.08678.pdf

// FCIS
Li, Yi, et al. "Fully convolutional instance-aware semantic segmentation." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Li_Fully_Convolutional_Instance-Aware_CVPR_2017_paper.pdf

# Mask R-CNN
He, Kaiming, et al. "Mask r-cnn." Computer Vision (ICCV), 2017 IEEE International Conference on. IEEE, 2017.
http://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf

# YOLACT++
Bolya, Daniel, et al. "YOLACT++: Better Real-time Instance Segmentation." arXiv preprint arXiv:1912.06218 (2019).
https://arxiv.org/pdf/1912.06218.pdf

-----

◎ Object Detection

-----

// SVM

// SMO
Platt, John. "Sequential minimal optimization: A fast algorithm for training support vector machines." (1998).
https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-98-14.pdf

-----

// SIFT

// HOG
 
// DPM

-----

# DPM
Felzenszwalb, Pedro F., et al. "Object detection with discriminatively trained part-based models." IEEE transactions on pattern analysis and machine intelligence 32.9 (2010): 1627-1645.
https://ttic.uchicago.edu/~dmcallester/lsvm-pami.pdf

# SS
Uijlings, Jasper RR, et al. "Selective search for object recognition." International journal of computer vision 104.2 (2013): 154-171.
https://ivi.fnwi.uva.nl/isis/publications/2013/UijlingsIJCV2013/UijlingsIJCV2013.pdf

# R-CNN
Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.
https://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Girshick_Rich_Feature_Hierarchies_2014_CVPR_paper.pdf

# SPPNet
He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." european conference on computer vision. Springer, Cham, 2014.
https://arxiv.org/pdf/1406.4729.pdf
 
# Fast R-CNN
Girshick, Ross. "Fast R-CNN." Proceedings of the IEEE international conference on computer vision. 2015.
http://openaccess.thecvf.com/content_iccv_2015/papers/Girshick_Fast_R-CNN_ICCV_2015_paper.pdf

# Faster R-CNN
Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.
http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.pdf

-----

# OverFeat
Sermanet, Pierre, et al. "Overfeat: Integrated recognition, localization and detection using convolutional networks." arXiv preprint arXiv:1312.6229 (2013).
https://arxiv.org/pdf/1312.6229.pdf
 
# YOLO v1
Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Redmon_You_Only_Look_CVPR_2016_paper.pdf
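For reference, YOLO v1 divides the image into an S x S grid and predicts, per cell, B boxes (x, y, w, h, confidence) plus C class probabilities; with the paper's S=7, B=2, C=20 the output is a 7x7x30 tensor:

S, B, C = 7, 2, 20
print((S, S, B * 5 + C))  # (7, 7, 30)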

# SSD
Liu, Wei, et al. "SSD: Single shot multibox detector." European conference on computer vision. Springer, Cham, 2016.
https://arxiv.org/pdf/1512.02325.pdf

# DSSD
Fu, Cheng-Yang, et al. "Dssd: Deconvolutional single shot detector." arXiv preprint arXiv:1701.06659 (2017).
https://arxiv.org/pdf/1701.06659.pdf

# YOLO v2
Redmon, Joseph, and Ali Farhadi. "YOLO9000: better, faster, stronger." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
https://arxiv.org/pdf/1612.08242.pdf

-----

# ION
Bell, Sean, et al. "Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
http://openaccess.thecvf.com/content_cvpr_2016/papers/Bell_Inside-Outside_Net_Detecting_CVPR_2016_paper.pdf

# R-FCN
Dai, Jifeng, et al. "R-fcn: Object detection via region-based fully convolutional networks." Advances in neural information processing systems. 2016.
https://papers.nips.cc/paper/6465-r-fcn-object-detection-via-region-based-fully-convolutional-networks.pdf

# SATO
Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object detectors." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Huang_SpeedAccuracy_Trade-Offs_for_CVPR_2017_paper.pdf

# DCN v1
Dai, Jifeng, et al. "Deformable convolutional networks." Proceedings of the IEEE international conference on computer vision. 2017.
http://openaccess.thecvf.com/content_ICCV_2017/papers/Dai_Deformable_Convolutional_Networks_ICCV_2017_paper.pdf

# DCN v2
Zhu, Xizhou, et al. "Deformable convnets v2: More deformable, better results." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhu_Deformable_ConvNets_V2_More_Deformable_Better_Results_CVPR_2019_paper.pdf

# Cascade R-CNN
Cai, Zhaowei, and Nuno Vasconcelos. "Cascade r-cnn: Delving into high quality object detection." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Cai_Cascade_R-CNN_Delving_CVPR_2018_paper.pdf   

# FPN
Lin, Tsung-Yi, et al. "Feature pyramid networks for object detection." CVPR. Vol. 1. No. 2. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Lin_Feature_Pyramid_Networks_CVPR_2017_paper.pdf 

# STDN
Zhou, Peng, et al. "Scale-transferrable object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhou_Scale-Transferrable_Object_Detection_CVPR_2018_paper.pdf

# YOLO v3
Redmon, Joseph, and Ali Farhadi. "YOLOv3: An incremental improvement." arXiv preprint arXiv:1804.02767 (2018).
https://pjreddie.com/media/files/papers/YOLOv3.pdf 

# RON
Kong, Tao, et al. "Ron: Reverse connection with objectness prior networks for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017.
http://openaccess.thecvf.com/content_cvpr_2017/papers/Kong_RON_Reverse_Connection_CVPR_2017_paper.pdf 

# RefineDet
Zhang, Shifeng, et al. "Single-shot refinement neural network for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zhang_Single-Shot_Refinement_Neural_CVPR_2018_paper.pdf
 
# M2Det
Zhao, Qijie, et al. "M2det: A single-shot object detector based on multi-level feature pyramid network." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.
https://arxiv.org/pdf/1811.04533.pdf

# SNIP
Singh, Bharat, and Larry S. Davis. "An analysis of scale invariance in object detection snip." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Singh_An_Analysis_of_CVPR_2018_paper.pdf

# SNIPER
Singh, Bharat, Mahyar Najibi, and Larry S. Davis. "SNIPER: Efficient multi-scale training." Advances in Neural Information Processing Systems. 2018.
http://papers.nips.cc/paper/8143-sniper-efficient-multi-scale-training.pdf

# AutoFocus
Najibi, Mahyar, Bharat Singh, and Larry S. Davis. "Autofocus: Efficient multi-scale inference." Proceedings of the IEEE International Conference on Computer Vision. 2019.
http://openaccess.thecvf.com/content_ICCV_2019/papers/Najibi_AutoFocus_Efficient_Multi-Scale_Inference_ICCV_2019_paper.pdf

# DetNet
Li, Zeming, et al. "Detnet: Design backbone for object detection." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Zeming_Li_DetNet_Design_Backbone_ECCV_2018_paper.pdf

# TridentNet
Li, Yanghao, et al. "Scale-aware trident networks for object detection." arXiv preprint arXiv:1901.01892 (2019).
https://arxiv.org/pdf/1901.01892.pdf

-----

# OHEM
Shrivastava, Abhinav, Abhinav Gupta, and Ross Girshick. "Training region-based object detectors with online hard example mining." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Shrivastava_Training_Region-Based_Object_CVPR_2016_paper.pdf

# RetinaNet(Focal Loss)
Lin, Tsung-Yi, et al. "Focal loss for dense object detection." IEEE transactions on pattern analysis and machine intelligence (2018).
https://vision.cornell.edu/se3/wp-content/uploads/2017/09/focal_loss.pdf
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8417976

# GHM
Li, Buyu, Yu Liu, and Xiaogang Wang. "Gradient harmonized single-stage detector." Proceedings of the AAAI Conference on Artificial Intelligence. Vol. 33. 2019.
https://arxiv.org/pdf/1811.05181.pdf

# Libra R-CNN
Pang, Jiangmiao, et al. "Libra r-cnn: Towards balanced learning for object detection." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Pang_Libra_R-CNN_Towards_Balanced_Learning_for_Object_Detection_CVPR_2019_paper.pdf

# DCR v1
Cheng, Bowen, et al. "Revisiting rcnn: On awakening the classification power of faster rcnn." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Bowen_Cheng_Revisiting_RCNN_On_ECCV_2018_paper.pdf
 
# DCR v2
Cheng, Bowen, et al. "Decoupled classification refinement: Hard false positive suppression for object detection." arXiv preprint arXiv:1810.04002 (2018).
https://arxiv.org/pdf/1810.04002.pdf

# PISA
Cao, Yuhang, et al. "Prime Sample Attention in Object Detection." arXiv preprint arXiv:1904.04821 (2019).
https://arxiv.org/pdf/1904.04821.pdf

-----

// CornerNet
Law, Hei, and Jia Deng. "Cornernet: Detecting objects as paired keypoints." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Hei_Law_CornerNet_Detecting_Objects_ECCV_2018_paper.pdf
 
// CenterNet
Duan, Kaiwen, et al. "CenterNet: Object Detection with Keypoint Triplets." arXiv preprint arXiv:1904.08189 (2019).
https://arxiv.org/pdf/1904.08189.pdf
 
// SelectNet
Liu, Yunru, Tingran Gao, and Haizhao Yang. "SelectNet: Learning to Sample from the Wild for Imbalanced Data Training." arXiv preprint arXiv:1905.09872 (2019).
https://arxiv.org/pdf/1905.09872.pdf

// Bottom-up
Zhou, Xingyi, Jiacheng Zhuo, and Philipp Krahenbuhl. "Bottom-up object detection by grouping extreme and center points." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Zhou_Bottom-Up_Object_Detection_by_Grouping_Extreme_and_Center_Points_CVPR_2019_paper.pdf 

-----

Part II: Natural Language Processing

-----

LSTM

-----

// RNN(Recurrent Neural Network)
Elman, Jeffrey L. "Finding structure in time." Cognitive science 14.2 (1990): 179-211.
http://www2.fiit.stuba.sk/~kvasnicka/NeuralNetworks/6.prednaska/Elman_SRNN_paper.pdf

# LSTM(Long Short-Term Memory)
Hochreiter, Sepp, and Jürgen Schmidhuber. "Long short-term memory." Neural computation 9.8 (1997): 1735-1780.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.676.4320&rep=rep1&type=pdf
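In its modern form (the forget gate was added after the 1997 paper, by Gers et al. 2000), the LSTM cell computes:

i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)
f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)
o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)
\tilde{c}_t = \tanh(W_c x_t + U_c h_{t-1} + b_c)
c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t
h_t = o_t \odot \tanh(c_t)

The additive update of the cell state c_t is what lets gradients flow across long time spans without vanishing.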

// BRNN(Bidirectional RNN)
Schuster, Mike, and Kuldip K. Paliwal. "Bidirectional recurrent neural networks." IEEE Transactions on Signal Processing 45.11 (1997): 2673-2681.
http://www.cs.cmu.edu/afs/cs/user/bhiksha/WWW/courses/deeplearning/Fall.2016/pdfs/Schuster97_BRNN.pdf

// BLSTM(Bidirectional LSTM)
Graves, Alex, Navdeep Jaitly, and Abdel-rahman Mohamed. "Hybrid speech recognition with deep bidirectional LSTM." Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on. IEEE, 2013.
http://www.cs.toronto.edu/~graves/asru_2013.pdf

# GRU(Gated Recurrent Unit)
Cho, Kyunghyun, et al. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." arXiv preprint arXiv:1406.1078 (2014).
https://arxiv.org/pdf/1406.1078.pdf
 
// MGU(Minimal Gated Unit)
Zhou, Guo-Bing, et al. "Minimal gated unit for recurrent neural networks." International Journal of Automation and Computing 13.3 (2016): 226-234.
https://arxiv.org/pdf/1603.09420.pdf

// SRU(Simple Recurrent Unit)
Lei, Tao, et al. "Simple recurrent units for highly parallelizable recurrence." Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018.
https://arxiv.org/pdf/1709.02755.pdf

// Comparison of LSTM, GRU, MGU, and SRU
Hou, Bo-Jian, and Zhi-Hua Zhou. "Learning with Interpretable Structure from RNN." arXiv preprint arXiv:1810.10708 (2018).
https://arxiv.org/pdf/1810.10708.pdf
 
-----

// EM
Dempster, Arthur P., Nan M. Laird, and Donald B. Rubin. "Maximum likelihood from incomplete data via the EM algorithm." Journal of the Royal Statistical Society: Series B (Methodological) 39.1 (1977): 1-22.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.163.7580&rep=rep1&type=pdf

// STM
Levenberg, Abby, Chris Callison-Burch, and Miles Osborne. "Stream-based translation models for statistical machine translation." Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. 2010.
https://www.aclweb.org/anthology/N10-1062.pdf

// n-gram
Brown, Peter F., et al. "Class-based n-gram models of natural language." Computational linguistics 18.4 (1992): 467-480.
https://www.aclweb.org/anthology/J92-4003.pdf

# NNLM
Bengio, Yoshua, et al. "A neural probabilistic language model." Journal of machine learning research 3.Feb (2003): 1137-1155.
http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf 

// C&W
Collobert, Ronan, and Jason Weston. "A unified architecture for natural language processing: Deep neural networks with multitask learning." Proceedings of the 25th international conference on Machine learning. ACM, 2008.
http://www.thespermwhale.com/jaseweston/papers/unified_nlp.pdf

# C&W
Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of machine learning research 12 (2011): 2493-2537.
http://www.jmlr.org/papers/volume12/collobert11a/collobert11a.pdf

// RNNLM v1
Mikolov, Tomáš, et al. "Recurrent neural network based language model." Eleventh annual conference of the international speech communication association. 2010.
https://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf

// RNNLM v2
Mikolov, Tomáš, et al. "Extensions of recurrent neural network language model." 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP). IEEE, 2011.
https://pdfs.semanticscholar.org/bba8/a2c9b9121e7c78e91ea2a68630e77c0ad20f.pdf

// RNNLM v3
Mikolov, Tomas, et al. "Rnnlm-recurrent neural network language modeling toolkit." Proc. of the 2011 ASRU Workshop. 2011.
http://www.fit.vutbr.cz/~imikolov/rnnlm/rnnlm-demo.pdf

-----

# Word2vec v1
Mikolov, Tomas, et al. "Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781 (2013).
https://arxiv.org/pdf/1301.3781.pdf

# Word2vec v2
Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.
https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
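The v2 paper's negative-sampling objective replaces the full softmax: for a center word w_I and an observed context word w_O, each training pair maximizes

\log \sigma({v'_{w_O}}^{\top} v_{w_I}) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \left[ \log \sigma(-{v'_{w_i}}^{\top} v_{w_I}) \right]

with k noise words drawn from the unigram distribution raised to the 3/4 power.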

# Word2vec v3
Rong, Xin. "word2vec parameter learning explained." arXiv preprint arXiv:1411.2738 (2014).
https://arxiv.org/pdf/1411.2738.pdf

// GloVe
Pennington, Jeffrey, Richard Socher, and Christopher Manning. "Glove: Global vectors for word representation." Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP). 2014.
https://www.aclweb.org/anthology/D14-1162

// fastText v1
Joulin, Armand, et al. "Bag of tricks for efficient text classification." arXiv preprint arXiv:1607.01759 (2016).
https://arxiv.org/pdf/1607.01759.pdf

// fastText v2
Bojanowski, Piotr, et al. "Enriching word vectors with subword information." Transactions of the Association for Computational Linguistics 5 (2017): 135-146.
https://www.mitpressjournals.org/doi/pdfplus/10.1162/tacl_a_00051

// WordRank
Ji, Shihao, et al. "Wordrank: Learning word embeddings via robust ranking." arXiv preprint arXiv:1506.02761 (2015).
https://arxiv.org/pdf/1506.02761.pdf

-----

Seq2seq

-----

# Seq2seq 1 - using LSTM
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." Advances in neural information processing systems. 2014.
http://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf 
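The model factorizes translation into encoding the source into a fixed-length vector v (the encoder LSTM's final state) and then decoding one token at a time:

p(y_1, \ldots, y_{T'} \mid x_1, \ldots, x_T) = \prod_{t=1}^{T'} p(y_t \mid v, y_1, \ldots, y_{t-1})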

# Seq2seq 2

-----

# Paragraph2vec
Le, Quoc, and Tomas Mikolov. "Distributed representations of sentences and documents." International conference on machine learning. 2014.
http://proceedings.mlr.press/v32/le14.pdf

// Skip-Thought
Kiros, Ryan, et al. "Skip-thought vectors." Advances in neural information processing systems. 2015.
https://papers.nips.cc/paper/5950-skip-thought-vectors.pdf

// Quick-Thought
Logeswaran, Lajanugen, and Honglak Lee. "An efficient framework for learning sentence representations." arXiv preprint arXiv:1803.02893 (2018).
https://arxiv.org/pdf/1803.02893.pdf

// InferSent
Conneau, Alexis, et al. "Supervised learning of universal sentence representations from natural language inference data." arXiv preprint arXiv:1705.02364 (2017).
https://arxiv.org/pdf/1705.02364.pdf

// MILA SE

// Google SE

-----

Attention

-----

# Attention 1 - using GRU
Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
https://arxiv.org/pdf/1409.0473.pdf
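Instead of a single fixed vector, Bahdanau attention builds a fresh context vector for every decoder step t: an alignment model a scores each encoder state h_j against the previous decoder state s_{t-1}, the scores are softmax-normalized, and the context is their weighted sum:

e_{tj} = a(s_{t-1}, h_j), \qquad \alpha_{tj} = \frac{\exp(e_{tj})}{\sum_k \exp(e_{tk})}, \qquad c_t = \sum_j \alpha_{tj} h_j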

# Visual Attention
Xu, Kelvin, et al. "Show, attend and tell: Neural image caption generation with visual attention." International conference on machine learning. 2015.
http://proceedings.mlr.press/v37/xuc15.pdf 

# Grad-CAM

# Attention 2 - using LSTM
Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
https://arxiv.org/pdf/1508.04025.pdf

-----

// NTM
Graves, Alex, Greg Wayne, and Ivo Danihelka. "Neural turing machines." arXiv preprint arXiv:1410.5401 (2014).
https://arxiv.org/pdf/1410.5401.pdf

// DNC
// Hybrid Computing
Graves, Alex, et al. "Hybrid computing using a neural network with dynamic external memory." Nature 538.7626 (2016): 471.
https://campus.swarma.org/public/ueditor/php/upload/file/20170609/1497019302822809.pdf

// MANN

// RL NTM
Zaremba, Wojciech, and Ilya Sutskever. "Reinforcement learning neural turing machines-revised." arXiv preprint arXiv:1505.00521 (2015).
https://arxiv.org/pdf/1505.00521.pdf

// Implementing NTM
Collier, Mark, and Joeran Beel. "Implementing Neural Turing Machines." International Conference on Artificial Neural Networks. Springer, Cham, 2018.
https://arxiv.org/pdf/1807.08518.pdf

-----

// MN
Weston, Jason, Sumit Chopra, and Antoine Bordes. "Memory networks." arXiv preprint arXiv:1410.3916 (2014).
https://arxiv.org/abs/1410.3916

// EEMN
Sukhbaatar, Sainbayar, Jason Weston, and Rob Fergus. "End-to-end memory networks." Advances in neural information processing systems. 2015.
https://papers.nips.cc/paper/5846-end-to-end-memory-networks.pdf

// KVMN
Miller, Alexander, et al. "Key-value memory networks for directly reading documents." arXiv preprint arXiv:1606.03126 (2016).
https://arxiv.org/pdf/1606.03126.pdf

// PN
Vinyals, Oriol, Meire Fortunato, and Navdeep Jaitly. "Pointer networks." Advances in Neural Information Processing Systems. 2015.
http://papers.nips.cc/paper/5866-pointer-networks.pdf

// Set2set
Vinyals, Oriol, Samy Bengio, and Manjunath Kudlur. "Order matters: Sequence to sequence for sets." arXiv preprint arXiv:1511.06391 (2015).
https://arxiv.org/pdf/1511.06391.pdf

# FSA
Daniluk, Michał, et al. "Frustratingly short attention spans in neural language modeling." arXiv preprint arXiv:1702.04521 (2017).
https://arxiv.org/pdf/1702.04521.pdf

// MHA
Iida, Shohei, et al. "A Multi-Hop Attention for RNN based Neural Machine Translation." Proceedings of The 8th Workshop on Patent and Scientific Literature Translation. 2019.
https://www.aclweb.org/anthology/W19-7203

// AOH
Iida, Shohei, et al. "Attention over Heads: A Multi-Hop Attention for Neural Machine Translation." Proceedings of the 57th Conference of the Association for Computational Linguistics: Student Research Workshop. 2019.
https://www.aclweb.org/anthology/P19-2030

-----

ConvS2S

-----

// GLU
Dauphin, Yann N., et al. "Language modeling with gated convolutional networks." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
https://arxiv.org/pdf/1612.08083.pdf

# ConvS2S
Gehring, Jonas, et al. "Convolutional sequence to sequence learning." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
https://arxiv.org/pdf/1705.03122.pdf
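ConvS2S stacks the gated convolutions introduced in the GLU paper above: each layer's output is split in two along the channel dimension, with one half gating the other,

h(X) = (XW + b) \otimes \sigma(XV + c)

so the network keeps a linear path for gradients while still gating the flow of information.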

-----

# Context2vec

# ELMo
Peters, Matthew E., et al. "Deep contextualized word representations." arXiv preprint arXiv:1802.05365 (2018).
https://arxiv.org/pdf/1802.05365.pdf 

// ULMFiT
Howard, Jeremy, and Sebastian Ruder. "Universal language model fine-tuning for text classification." arXiv preprint arXiv:1801.06146 (2018).
https://arxiv.org/pdf/1801.06146.pdf

// MultiFiT

-----

Transformer

-----

# Transformer
Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems. 2017.
https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf
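The paper's core operation is scaled dot-product attention over queries Q, keys K, and values V with key dimension d_k:

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V

Multi-head attention runs h of these in parallel on learned projections and concatenates the results.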

# GPT-1
Radford, Alec, et al. "Improving language understanding by generative pre-training." OpenAI (2018).
https://www.cs.ubc.ca/~amuham01/LING530/papers/radford2018improving.pdf 

# GPT-2
Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

// Visualizing Attention
Vig, Jesse. "Visualizing Attention in Transformer-Based Language Models." arXiv preprint arXiv:1904.02679 (2019).
https://arxiv.org/pdf/1904.02679.pdf

# GPT-3

# BERT
Devlin, Jacob, et al. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).
https://arxiv.org/pdf/1810.04805.pdf

// MTL
Baxter, Jonathan. "A model of inductive bias learning." Journal of artificial intelligence research 12 (2000): 149-198.
https://arxiv.org/pdf/1106.0245.pdf

// MTL Overview
Ruder, Sebastian. "An overview of multi-task learning in deep neural networks." arXiv preprint arXiv:1706.05098 (2017).
https://arxiv.org/pdf/1706.05098.pdf

-----
 
// Universal Transformers
Dehghani, Mostafa, et al. "Universal transformers." arXiv preprint arXiv:1807.03819 (2018).
https://arxiv.org/pdf/1807.03819.pdf

// Transformer XL
Dai, Zihang, et al. "Transformer-xl: Attentive language models beyond a fixed-length context." arXiv preprint arXiv:1901.02860 (2019).
https://arxiv.org/pdf/1901.02860.pdf
 
// MT-DNN
Liu, Xiaodong, et al. "Multi-Task Deep Neural Networks for Natural Language Understanding." arXiv preprint arXiv:1901.11504 (2019).
https://arxiv.org/pdf/1901.11504.pdf

// ERNIE Baidu
Sun, Yu, et al. "ERNIE: Enhanced Representation through Knowledge Integration." arXiv preprint arXiv:1904.09223 (2019).
https://arxiv.org/pdf/1904.09223.pdf

// ERNIE THU
Zhang, Zhengyan, et al. "ERNIE: Enhanced Language Representation with Informative Entities." arXiv preprint arXiv:1905.07129 (2019).
https://arxiv.org/pdf/1905.07129.pdf

// XLMs Facebook
Lample, Guillaume, and Alexis Conneau. "Cross-lingual Language Model Pretraining." arXiv preprint arXiv:1901.07291 (2019).
https://arxiv.org/pdf/1901.07291.pdf

// LASER Facebook
Artetxe, Mikel, and Holger Schwenk. "Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond." arXiv preprint arXiv:1812.10464 (2018).
https://arxiv.org/pdf/1812.10464.pdf

// MASS Microsoft
Song, Kaitao, et al. "Mass: Masked sequence to sequence pre-training for language generation." arXiv preprint arXiv:1905.02450 (2019).
https://arxiv.org/pdf/1905.02450.pdf

// UNILM Microsoft
Dong, Li, et al. "Unified Language Model Pre-training for Natural Language Understanding and Generation." arXiv preprint arXiv:1905.03197 (2019).
https://arxiv.org/pdf/1905.03197.pdf

// ON-LSTM
Shen, Yikang, et al. "Ordered neurons: Integrating tree structures into recurrent neural networks." arXiv preprint arXiv:1810.09536 (2018).
https://arxiv.org/pdf/1810.09536.pdf

// XLNet
Yang, Zhilin, et al. "XLNet: Generalized Autoregressive Pretraining for Language Understanding." arXiv preprint arXiv:1906.08237 (2019).
https://arxiv.org/pdf/1906.08237.pdf

-----

Part III: Fundamental Topics

-----

Regularization

-----

# Weight Decay
Zhang, Guodong, et al. "Three mechanisms of weight decay regularization." arXiv preprint arXiv:1810.12281 (2018).
https://arxiv.org/pdf/1810.12281.pdf

// WD 1989
Hanson, Stephen José, and Lorien Y. Pratt. "Comparing biases for minimal network construction with back-propagation." Advances in neural information processing systems. 1989.
http://papers.nips.cc/paper/156-comparing-biases-for-minimal-network-construction-with-back-propagation.pdf

// WD 1992
Krogh, Anders, and John A. Hertz. "A simple weight decay can improve generalization." Advances in neural information processing systems. 1992.
http://papers.nips.cc/paper/563-a-simple-weight-decay-can-improve-generalization.pdf

# L2
# Ridge Regression
Hoerl, Arthur E., and Robert W. Kennard. "Ridge regression: Biased estimation for nonorthogonal problems." Technometrics 12.1 (1970): 55-67.
https://amstat.tandfonline.com/doi/pdf/10.1080/00401706.1970.10488634
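In neural-network terms, the ridge penalty is L2 regularization: a quadratic term is added to the training loss,

\tilde{L}(w) = L(w) + \frac{\lambda}{2} \lVert w \rVert_2^2

which under plain SGD shrinks the weights by a constant factor each step, hence the name weight decay (the equivalence breaks for adaptive optimizers such as Adam).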

# L1
# Lasso Regression
Tibshirani, Robert. "Regression shrinkage and selection via the lasso." Journal of the Royal Statistical Society: Series B (Methodological) 58.1 (1996): 267-288.
http://www.stat.ucla.edu/~sczhu/courses/ucla/stat_232b/chapters/LASSO.pdf

# L0
Louizos, Christos, Max Welling, and Diederik P. Kingma. "Learning Sparse Neural Networks through $ L_0 $ Regularization." arXiv preprint arXiv:1712.01312 (2017).
https://arxiv.org/pdf/1712.01312.pdf

-----

# Dropout
Srivastava, Nitish, et al. "Dropout: a simple way to prevent neural networks from overfitting." The Journal of Machine Learning Research 15.1 (2014): 1929-1958.
http://www.jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf
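In the paper's formulation, each unit's output y is retained with probability p during training, and the trained weights are scaled by p at test time:

r_j \sim \mathrm{Bernoulli}(p), \qquad \tilde{y} = r \odot y, \qquad W_{\text{test}} = pW

Modern implementations usually use the equivalent inverted dropout, dividing by p at training time instead.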

# DropConnect
Wan, Li, et al. "Regularization of neural networks using dropconnect." International conference on machine learning. 2013.
http://proceedings.mlr.press/v28/wan13.pdf 

# FractalNet(DropPath)
Larsson, Gustav, Michael Maire, and Gregory Shakhnarovich. "Fractalnet: Ultra-deep neural networks without residuals." arXiv preprint arXiv:1605.07648 (2016).
https://arxiv.org/pdf/1605.07648.pdf

# NASNet(Scheduled DropPath)
Zoph, Barret, et al. "Learning transferable architectures for scalable image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Zoph_Learning_Transferable_Architectures_CVPR_2018_paper.pdf

# Shake-Shake
Gastaldi, Xavier. "Shake-shake regularization." arXiv preprint arXiv:1705.07485 (2017).
https://arxiv.org/pdf/1705.07485.pdf

# ShakeDrop
Yamada, Yoshihiro, et al. "Shakedrop regularization for deep residual learning." arXiv preprint arXiv:1802.02375 (2018).
https://arxiv.org/pdf/1802.02375.pdf

# Spatial Dropout
Tompson, Jonathan, et al. "Efficient object localization using convolutional networks." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.
https://www.cv-foundation.org/openaccess/content_cvpr_2015/papers/Tompson_Efficient_Object_Localization_2015_CVPR_paper.pdf

# Cutout
DeVries, Terrance, and Graham W. Taylor. "Improved regularization of convolutional neural networks with cutout." arXiv preprint arXiv:1708.04552 (2017).
https://arxiv.org/pdf/1708.04552.pdf

# DropBlock
Ghiasi, Golnaz, Tsung-Yi Lin, and Quoc V. Le. "Dropblock: A regularization method for convolutional networks." Advances in Neural Information Processing Systems. 2018.
http://papers.nips.cc/paper/8271-dropblock-a-regularization-method-for-convolutional-networks.pdf

-----

# Fast Dropout
Bayer, Justin, et al. "On fast dropout and its applicability to recurrent networks." arXiv preprint arXiv:1311.0701 (2013).
https://arxiv.org/pdf/1311.0701.pdf

# RNN Regularization
Zaremba, Wojciech, Ilya Sutskever, and Oriol Vinyals. "Recurrent neural network regularization." arXiv preprint arXiv:1409.2329 (2014).
https://arxiv.org/pdf/1409.2329.pdf

# Variational Dropout
Kingma, Durk P., Tim Salimans, and Max Welling. "Variational dropout and the local reparameterization trick." Advances in Neural Information Processing Systems. 2015.
https://papers.nips.cc/paper/5666-variational-dropout-and-the-local-reparameterization-trick.pdf

# Information Dropout
Achille, Alessandro, and Stefano Soatto. "Information dropout: Learning optimal representations through noisy computation." IEEE transactions on pattern analysis and machine intelligence 40.12 (2018): 2897-2905.
http://www.vision.jhu.edu/teaching/learning/deeplearning18/assets/Achille_Soatto-18.pdf

# rnnDrop
Moon, Taesup, et al. "Rnndrop: A novel dropout for rnns in asr." 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, 2015.
http://mind.skku.edu/files/Conference/asru2015.pdf

# DropEmbedding
Gal, Yarin, and Zoubin Ghahramani. "A theoretically grounded application of dropout in recurrent neural networks." Advances in neural information processing systems. 2016.
https://papers.nips.cc/paper/6241-a-theoretically-grounded-application-of-dropout-in-recurrent-neural-networks.pdf
 
# Recurrent Dropout
Semeniuta, Stanislau, Aliaksei Severyn, and Erhardt Barth. "Recurrent dropout without memory loss." arXiv preprint arXiv:1603.05118 (2016).
https://arxiv.org/pdf/1603.05118.pdf

# Zoneout
Krueger, David, et al. "Zoneout: Regularizing rnns by randomly preserving hidden activations." arXiv preprint arXiv:1606.01305 (2016).
https://arxiv.org/pdf/1606.01305.pdf 

# AWD-LSTM
Merity, Stephen, Nitish Shirish Keskar, and Richard Socher. "Regularizing and optimizing LSTM language models." arXiv preprint arXiv:1708.02182 (2017).
https://arxiv.org/pdf/1708.02182.pdf

-----

# DropAttention
Zehui, Lin, et al. "DropAttention: A Regularization Method for Fully-Connected Self-Attention Networks." arXiv preprint arXiv:1907.11065 (2019).
https://arxiv.org/pdf/1907.11065.pdf

-----

# Pairing Samples
Inoue, Hiroshi. "Data augmentation by pairing samples for images classification." arXiv preprint arXiv:1801.02929 (2018).
https://arxiv.org/pdf/1801.02929.pdf

# Mixup
Zhang, Hongyi, et al. "mixup: Beyond empirical risk minimization." arXiv preprint arXiv:1710.09412 (2017).
https://arxiv.org/pdf/1710.09412.pdf

-----

Normalization 

-----

# BN
Ioffe, Sergey, and Christian Szegedy. "Batch normalization: Accelerating deep network training by reducing internal covariate shift." International conference on machine learning. 2015.
http://proceedings.mlr.press/v37/ioffe15.pdf
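For each mini-batch {x_1, ..., x_m}, batch normalization standardizes each activation and then restores expressiveness with learned scale and shift parameters gamma and beta:

\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \quad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m}(x_i - \mu_B)^2, \quad \hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}}, \quad y_i = \gamma \hat{x}_i + \beta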

# WN
Salimans, Tim, and Durk P. Kingma. "Weight normalization: A simple reparameterization to accelerate training of deep neural networks." Advances in Neural Information Processing Systems. 2016.
https://papers.nips.cc/paper/6114-weight-normalization-a-simple-reparameterization-to-accelerate-training-of-deep-neural-networks.pdf
 
# LN
Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).
https://arxiv.org/pdf/1607.06450.pdf

# IN
Ulyanov, Dmitry, Andrea Vedaldi, and Victor Lempitsky. "Instance normalization: The missing ingredient for fast stylization." arXiv preprint arXiv:1607.08022 (2016).
https://arxiv.org/pdf/1607.08022.pdf 

# AIN
Huang, Xun, and Serge Belongie. "Arbitrary style transfer in real-time with adaptive instance normalization." Proceedings of the IEEE International Conference on Computer Vision. 2017.
http://openaccess.thecvf.com/content_ICCV_2017/papers/Huang_Arbitrary_Style_Transfer_ICCV_2017_paper.pdf

# GN
Wu, Yuxin, and Kaiming He. "Group normalization." Proceedings of the European Conference on Computer Vision (ECCV). 2018.
http://openaccess.thecvf.com/content_ECCV_2018/papers/Yuxin_Wu_Group_Normalization_ECCV_2018_paper.pdf

# PN
Li, Boyi, et al. "Positional Normalization." Advances in Neural Information Processing Systems. 2019.
http://papers.nips.cc/paper/8440-positional-normalization.pdf 

# UBN
Bjorck, Nils, et al. "Understanding batch normalization." Advances in Neural Information Processing Systems. 2018.
http://papers.nips.cc/paper/7996-understanding-batch-normalization.pdf

# TUBN
Kohler, Jonas, et al. "Towards a theoretical understanding of batch normalization." arXiv preprint arXiv:1805.10694 (2018).
https://arxiv.org/pdf/1805.10694.pdf

# BNHO
Santurkar, Shibani, et al. "How does batch normalization help optimization?." Advances in Neural Information Processing Systems. 2018.
https://papers.nips.cc/paper/7515-how-does-batch-normalization-help-optimization.pdf

# URBN
Luo, Ping, et al. "Understanding regularization in batch normalization." arXiv preprint arXiv:1809.00846 (2018).
https://arxiv.org/pdf/1809.00846.pdf

# NormProp
Arpit, Devansh, et al. "Normalization propagation: A parametric technique for removing internal covariate shift in deep networks." arXiv preprint arXiv:1603.01431 (2016).
https://arxiv.org/pdf/1603.01431.pdf

# Efficient Backprop
LeCun, Yann A., et al. "Efficient backprop." Neural networks: Tricks of the trade. Springer, Berlin, Heidelberg, 2012. 9-48.
http://cseweb.ucsd.edu/classes/wi08/cse253/Handouts/lecun-98b.pdf

# Whitening
Kessy, Agnan, Alex Lewin, and Korbinian Strimmer. "Optimal whitening and decorrelation." The American Statistician 72.4 (2018): 309-314.
https://arxiv.org/pdf/1512.00809.pdf

# CAT
Zuber, Verena, and Korbinian Strimmer. "Gene ranking and biomarker discovery under correlation." Bioinformatics 25.20 (2009): 2700-2707.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.247.8982&rep=rep1&type=pdf

# CAR
Zuber, Verena, and Korbinian Strimmer. "High-dimensional regression and variable selection using CAR scores." Statistical Applications in Genetics and Molecular Biology 10.1 (2011).
https://arxiv.org/pdf/1007.5516.pdf

# GWNN
Luo, Ping. "Learning deep architectures via generalized whitened neural networks." Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR. org, 2017.
http://proceedings.mlr.press/v70/luo17a/luo17a.pdf

# DBN
Huang, Lei, et al. "Decorrelated batch normalization." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018.
http://openaccess.thecvf.com/content_cvpr_2018/papers/Huang_Decorrelated_Batch_Normalization_CVPR_2018_paper.pdf

# KN
Wang, Guangrun, et al. "Kalman normalization: Normalizing internal representations across network layers." Advances in Neural Information Processing Systems. 2018.
https://papers.nips.cc/paper/7288-kalman-normalization-normalizing-internal-representations-across-network-layers.pdf

# IterNorm
Huang, Lei, et al. "Iterative Normalization: Beyond Standardization towards Efficient Whitening." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Huang_Iterative_Normalization_Beyond_Standardization_Towards_Efficient_Whitening_CVPR_2019_paper.pdf
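
Most of the normalizers above (BN, LN, IN, GN) apply the same standardize-by-mean-and-variance transform and differ mainly in which axes of an (N, C, H, W) activation tensor the statistics are pooled over. The sketch below shows only that difference, omitting the learned scale/shift parameters each method also carries; standardize is an illustrative helper, not a library call.

```python
import torch

def standardize(x, dims, eps=1e-5):
    # Subtract the mean and divide by the standard deviation computed
    # over the given axes; the choice of 'dims' is what separates
    # BN / LN / IN / GN (learned affine parameters omitted here).
    mean = x.mean(dim=dims, keepdim=True)
    var = x.var(dim=dims, unbiased=False, keepdim=True)
    return (x - mean) / torch.sqrt(var + eps)

x = torch.randn(8, 32, 14, 14)                  # (N, C, H, W)
bn = standardize(x, (0, 2, 3))                  # BN: per channel, over batch + space
ln = standardize(x, (1, 2, 3))                  # LN: per sample, over all features
inn = standardize(x, (2, 3))                    # IN: per sample and channel
groups = 8                                      # GN: per sample, per channel group
gn = standardize(x.view(x.size(0), groups, -1), (2,)).view_as(x)
```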


-----

Optimization

-----

# SGD
Bottou, Léon. "Stochastic gradient descent tricks." Neural networks: Tricks of the trade. Springer, Berlin, Heidelberg, 2012. 421-436.
https://www.microsoft.com/en-us/research/wp-content/uploads/2012/01/tricks-2012.pdf 

# Momentum
Sutskever, Ilya, et al. "On the importance of initialization and momentum in deep learning." International conference on machine learning. 2013.
http://proceedings.mlr.press/v28/sutskever13.pdf

# NAG
Nesterov, Y. "A method of solving a convex programming problem with convergence rate O(1/k^2)." Soviet Math. Dokl. Vol. 27. 1983.
http://mpawankumar.info/teaching/cdt-big-data/nesterov83.pdf
 
# AdaGrad
Duchi, John, Elad Hazan, and Yoram Singer. "Adaptive subgradient methods for online learning and stochastic optimization." Journal of Machine Learning Research 12.Jul (2011): 2121-2159.
http://www.jmlr.org/papers/volume12/duchi11a/duchi11a.pdf
  
# AdaDelta
Zeiler, Matthew D. "ADADELTA: an adaptive learning rate method." arXiv preprint arXiv:1212.5701 (2012).
https://arxiv.org/pdf/1212.5701.pdf

# RMSProp
Tieleman, Tijmen, and Geoffrey Hinton. "Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude." COURSERA: Neural networks for machine learning 4.2 (2012): 26-31.
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

# Adam
Kingma, Diederik P., and Jimmy Ba. "Adam: A method for stochastic optimization." arXiv preprint arXiv:1412.6980 (2014).
https://arxiv.org/pdf/1412.6980.pdf
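
For reference, Adam keeps exponential moving averages of the gradient and its square, bias-corrects both, and scales the step elementwise. A minimal NumPy sketch of one update, written functionally for clarity; adam_step is an illustrative name.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    # Exponential moving averages of the gradient and squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction counteracts the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Per-parameter step, scaled by the estimated second moment.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```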

# From Adam to SGD
Keskar, Nitish Shirish, and Richard Socher. "Improving generalization performance by switching from adam to sgd." arXiv preprint arXiv:1712.07628 (2017).
https://arxiv.org/pdf/1712.07628.pdf

# Nadam
Dozat, Timothy. "Incorporating nesterov momentum into adam." (2016).
https://openreview.net/pdf?id=OM0jvwB8jIp57ZJjtNEZ

# AMSGrad
Reddi, Sashank J., Satyen Kale, and Sanjiv Kumar. "On the convergence of adam and beyond." International Conference on Learning Representations. 2018.
http://www.satyenkale.com/papers/amsgrad.pdf 

# RAdam
Liu, Liyuan, et al. "On the variance of the adaptive learning rate and beyond." arXiv preprint arXiv:1908.03265 (2019).
https://arxiv.org/pdf/1908.03265.pdf

# SMA
Nau, Robert. "Forecasting with moving averages." Fuqua School of Business, Duke University (2014): 1-3.
https://people.duke.edu/~rnau/Notes_on_forecasting_with_moving_averages--Robert_Nau.pdf

# Lookahead
Zhang, Michael, et al. "Lookahead Optimizer: k steps forward, 1 step back." Advances in Neural Information Processing Systems. 2019.
http://papers.nips.cc/paper/9155-lookahead-optimizer-k-steps-forward-1-step-back.pdf
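
Lookahead wraps any inner ("fast") optimizer: after every k fast steps, the slow weights move a fraction alpha toward the fast weights, and the fast weights restart from the slow ones. A minimal PyTorch sketch of that wrapper, assuming the inner optimizer is built over the same parameters; the class and argument names are illustrative.

```python
import torch

class Lookahead:
    def __init__(self, params, inner_opt, k=5, alpha=0.5):
        self.params = list(params)
        self.inner = inner_opt        # any optimizer over the same params
        self.k, self.alpha, self.steps = k, alpha, 0
        self.slow = [p.detach().clone() for p in self.params]

    def step(self):
        self.inner.step()             # one fast-optimizer step
        self.steps += 1
        if self.steps % self.k == 0:  # every k steps: sync slow and fast
            with torch.no_grad():
                for p, s in zip(self.params, self.slow):
                    s.add_(p - s, alpha=self.alpha)  # slow += a * (fast - slow)
                    p.copy_(s)                       # fast restarts at slow

# usage sketch: opt = Lookahead(model.parameters(),
#                               torch.optim.SGD(model.parameters(), lr=0.1))
```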

# EMA
Hunter, J. Stuart. "The exponentially weighted moving average." Journal of quality technology 18.4 (1986): 203-210.
https://www.researchgate.net/profile/Arumugam_Raman/post/What_kind_of_data_is_usually_considered_in_Construction_of_Shewhart_control_charts/attachment/59d6255579197b8077983a73/AS%3A273836358995969%401442299083585/download/L11-OnEWMA.pdf

# LAMB
You, Yang, et al. "Reducing BERT Pre-Training Time from 3 Days to 76 Minutes." arXiv preprint arXiv:1904.00962 (2019).
https://arxiv.org/pdf/1904.00962.pdf

# CLR
Smith, Leslie N. "Cyclical learning rates for training neural networks." 2017 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2017.
https://arxiv.org/pdf/1506.01186.pdf

# SGDR
Loshchilov, Ilya, and Frank Hutter. "Sgdr: Stochastic gradient descent with warm restarts." arXiv preprint arXiv:1608.03983 (2016).
https://arxiv.org/pdf/1608.03983.pdf
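
CLR and SGDR both replace a monotone learning-rate decay with a cyclical schedule; SGDR anneals the rate along a cosine within each cycle and then warm-restarts. A sketch of the cosine schedule within one cycle, assuming t steps since the last restart and cycle length T; sgdr_lr is an illustrative name.

```python
import math

def sgdr_lr(t, T, lr_min=1e-5, lr_max=1e-1):
    # Cosine annealing from lr_max down to lr_min over a cycle of
    # length T; at t = 0 (a warm restart) the rate jumps back to lr_max.
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / T))
```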

# AdamW
Loshchilov, Ilya, and Frank Hutter. "Decoupled weight decay regularization (2019)." arXiv preprint arXiv:1711.05101.
https://arxiv.org/pdf/1711.05101.pdf
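
The AdamW point is where weight decay enters: folding an L2 penalty into the gradient lets Adam's adaptive rescaling distort it, while decoupled decay shrinks the weights directly, outside the adaptive step. A NumPy sketch mirroring the adam_step function above; adamw_step is an illustrative name.

```python
import numpy as np

def adamw_step(param, grad, m, v, t, lr=1e-3, wd=1e-2,
               beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * wd * param   # decoupled decay: not rescaled by v_hat
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v
```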

# Super-Convergence
Smith, Leslie N., and Nicholay Topin. "Super-convergence: Very fast training of residual networks using large learning rates." (2018).
https://openreview.net/pdf?id=H1A5ztj3b

# ADMM
Boyd, Stephen, et al. "Distributed optimization and statistical learning via the alternating direction method of multipliers." Foundations and Trends® in Machine learning 3.1 (2011): 1-122.
https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.360.1664&rep=rep1&type=pdf

# ADMM-S
Taylor, Gavin, et al. "Training neural networks without gradients: A scalable admm approach." International conference on machine learning. 2016.
http://proceedings.mlr.press/v48/taylor16.pdf

# dlADMM
Wang, Junxiang, et al. "ADMM for Efficient Deep Learning with Global Convergence." arXiv preprint arXiv:1905.13611 (2019).
https://arxiv.org/pdf/1905.13611.pdf
 
-----

Activation Function

-----

# Activation Function
Nwankpa, Chigozie, et al. "Activation functions: Comparison of trends in practice and research for deep learning." arXiv preprint arXiv:1811.03378 (2018).
https://arxiv.org/pdf/1811.03378.pdf

# ReLU 2000
Hahnloser, Richard H. R., et al. "Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit." Nature 405.6789 (2000): 947-951.
https://www.nature.com/articles/35016072

# ReLU 2009
Jarrett, Kevin, et al. "What is the best multi-stage architecture for object recognition?." 2009 IEEE 12th International Conference on Computer Vision. IEEE, 2009.
http://yann.lecun.com/exdb/publis/pdf/jarrett-iccv-09.pdf

# Softplus
Dugas, Charles, et al. "Incorporating second-order functional knowledge for better option pricing." Advances in Neural Information Processing Systems. 2001.

# LReLU
Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng. "Rectifier nonlinearities improve neural network acoustic models." Proc. ICML. Vol. 30. No. 1. 2013.
https://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf

# PReLU
He, Kaiming, et al. "Delving deep into rectifiers: Surpassing human-level performance on imagenet classification." Proceedings of the IEEE International Conference on Computer Vision. 2015.
https://arxiv.org/pdf/1502.01852.pdf

# ELU
Clevert, Djork-Arné, Thomas Unterthiner, and Sepp Hochreiter. "Fast and accurate deep network learning by exponential linear units (ELUs)." arXiv preprint arXiv:1511.07289 (2015).
https://arxiv.org/pdf/1511.07289.pdf

# SELU
Klambauer, Günter, et al. "Self-normalizing neural networks." Advances in Neural Information Processing Systems. 2017.
https://arxiv.org/pdf/1706.02515.pdf

# GELU
Hendrycks, Dan, and Kevin Gimpel. "Gaussian error linear units (GELUs)." arXiv preprint arXiv:1606.08415 (2016).
https://arxiv.org/pdf/1606.08415.pdf

# Swish
Ramachandran, Prajit, Barret Zoph, and Quoc V. Le. "Searching for activation functions." arXiv preprint arXiv:1710.05941 (2017).
https://arxiv.org/pdf/1710.05941.pdf

# Maxout
Goodfellow, Ian J., et al. "Maxout networks." arXiv preprint arXiv:1302.4389 (2013).
http://proceedings.mlr.press/v28/goodfellow13.pdf
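
Most of the activations above are one-liners; writing them out side by side makes the comparison in the survey easier to follow. A PyTorch sketch using the standard definitions (ELU shown with alpha = 1; GELU in its exact erf form); variable names are illustrative.

```python
import torch

x = torch.linspace(-3.0, 3.0, steps=7)

relu     = torch.clamp(x, min=0.0)                      # max(0, x)
lrelu    = torch.where(x > 0, x, 0.01 * x)              # leaky ReLU, slope 0.01
softplus = torch.log1p(torch.exp(x))                    # log(1 + e^x)
elu      = torch.where(x > 0, x, torch.exp(x) - 1.0)    # ELU with alpha = 1
selu_s, selu_a = 1.0507, 1.67326                        # SELU's fixed constants
selu     = selu_s * torch.where(x > 0, x, selu_a * (torch.exp(x) - 1.0))
gelu     = 0.5 * x * (1.0 + torch.erf(x / 2.0 ** 0.5))  # exact GELU
swish    = x * torch.sigmoid(x)                         # x * sigmoid(x)
```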

-----

Loss Function

-----

# Loss Function
Barron, Jonathan T. "A general and adaptive robust loss function." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2019.
http://openaccess.thecvf.com/content_CVPR_2019/papers/Barron_A_General_and_Adaptive_Robust_Loss_Function_CVPR_2019_paper.pdf
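
Barron's loss is a single family rho(x, alpha, c) that interpolates between familiar robust losses as the shape parameter alpha varies (alpha = 2 behaves like L2, alpha = 1 like smooth L1, alpha = -2 like Geman-McClure). A sketch of the general form, valid away from the special cases alpha = 0, alpha = 2, and alpha = -inf, which the paper defines as limits; robust_loss is an illustrative name.

```python
import numpy as np

def robust_loss(x, alpha, c=1.0):
    # General form of the loss for alpha not in {0, 2, -inf};
    # those values are limits and need their own branches.
    b = abs(alpha - 2.0)
    return (b / alpha) * (((x / c) ** 2 / b + 1.0) ** (alpha / 2.0) - 1.0)
```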

-----

Pooling

-----

-----

Convolution

-----

-----

Automatic Differentiation

-----

-----

Back Propagation

-----

# Back Propagation
Alber, Maximilian, et al. "Backprop evolution." arXiv preprint arXiv:1808.02822 (2018).
https://arxiv.org/pdf/1808.02822.pdf
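
Vanilla backprop, the baseline this paper searches around, is just the chain rule applied layer by layer while reusing the forward pass's intermediates. A self-contained NumPy sketch for a two-layer ReLU network with squared loss; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x, target = rng.normal(size=3), 1.0
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))

# Forward pass: keep the intermediates needed by the backward pass.
z = W1 @ x                        # pre-activation, shape (4,)
h = np.maximum(z, 0.0)            # ReLU
y = W2 @ h                        # output, shape (1,)
loss = 0.5 * (y - target) ** 2    # squared loss, shape (1,)

# Backward pass: chain rule from the loss back to each weight matrix.
dy = y - target                   # dL/dy
dW2 = np.outer(dy, h)             # dL/dW2
dh = W2.T @ dy                    # dL/dh
dz = dh * (z > 0)                 # gradient gated by the ReLU
dW1 = np.outer(dz, x)             # dL/dW1
```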

-----

Computational Graph

-----

References

# Reference book
[1] Title: 深度學習: Caffe 之經典模型詳解與實戰. ISBN: 7121301180. Author: 樂毅. Publisher: 電子工業. Publication date: 2016-09-30.

# Supplementary material for the LSTM paper
[2] Understanding LSTM Networks -- colah's blog
http://colah.github.io/posts/2015-08-Understanding-LSTMs/ 

# Three-paper quick track (Spring 2017)
[3] Deep Learning Paper
http://hemingwang.blogspot.com/2019/01/deep-learning-paper.html

# Twenty-paper slow track (Spring 2018)
[4] PyTorch (6): Seminar

# Thirty-paper foundation track (Spring 2019)
[5] 30 Topics for Deep Learning
http://hemingwang.blogspot.com/2019/04/30-topics-for-deep-learning.html  

# Ten-paper essentials track (Summer 2019)
[6] The AI Trilogy (Deep Learning: From Beginner to Mastery)
https://hemingwang.blogspot.com/2019/05/trilogy.html

# Fifty-paper complete track (Fall 2019)
[7] AI from Scratch (39): Complete Works
http://hemingwang.blogspot.tw/2017/08/aicomplete-works.html
