Monday, October 19, 2020

Regularization

Regularization

2020/10/19

-----


https://pixabay.com/zh/photos/health-fitness-health-is-wealth-4861815/

----- 

Regularization
https://hemingwang.blogspot.com/2019/10/an-overview-of-regularization.html 
https://hemingwang.blogspot.com/2019/10/regularization.html

-----


The term regularization can be hard to grasp; for now, readers can simply think of it as "avoiding overfitting."

So what is overfitting? In short, machine learning (or statistics) induces a model from a finite set of samples and uses it to predict the effectively unlimited data of the real world. If the model is fit too closely to the dataset, it will very likely fail to predict data outside that dataset. Regularization is the collective name for techniques that avoid overfitting.
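
As a minimal illustration (not part of the original post), the sketch below adds an L2 penalty, i.e. weight decay, to a plain squared-error loss for linear regression; all names (X, y, w, lam, lr) are made up for the example.

# Minimal sketch: L2 (weight decay) regularization added to a squared-error loss.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))                       # 100 samples, 10 features
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=100)

w = np.zeros(10)
lam, lr = 1e-2, 1e-2                                 # regularization strength, learning rate

for _ in range(1000):
    err = X @ w - y
    grad = X.T @ err / len(y) + lam * w              # data gradient + L2 penalty gradient
    w -= lr * grad                                   # the lam * w term shrinks the weights each step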

-----


◎ Weight Decay、L2、L1、L0、

◎ Early Stopping、

◎ Fully Connected Networks(Dropout、DropConnect、)

CNN(DropPath、Scheduled DropPath、
Shake-Shake、ShakeDrop、
Spatial Dropout、Cutout、DropBlock、)

RNN(Fast Dropout、RNN Regularization、
Variational Dropout、Information Dropout、
rnnDrop、DropEmbedding、Recurrent Dropout、Zoneout、AWD-LSTM、)

Attention(DropAttention、)

◎ Label Smoothing、

◎ Pairing Samples、Mixup。 

----- 

Weight Decay
https://hemingwang.blogspot.com/2019/12/weight-decay.html

L2
https://hemingwang.blogspot.com/2019/12/l2.html

L1
https://hemingwang.blogspot.com/2019/12/l1.html

L0
https://hemingwang.blogspot.com/2019/12/l0.html

-----

Early Stopping 
https://hemingwang.blogspot.com/2019/12/early-stopping.html 

----- 

// FNN 

Dropout
https://hemingwang.blogspot.com/2020/09/dropout.html
https://hemingwang.blogspot.com/2019/12/dropout.html 

Dropconnect
https://hemingwang.blogspot.com/2019/12/dropconnect.html

----- 

// CNN

DropPath(FractalNet)
https://hemingwang.blogspot.com/2019/12/droppath.html

ResNet-D
https://hemingwang.blogspot.com/2019/11/resnet-d.html 

NASNet(Scheduled DropPath)
https://hemingwang.blogspot.com/2019/12/scheduled-droppath.html

Shake-Shake
https://hemingwang.blogspot.com/2019/12/shake-shake.html

ShakeDrop
https://hemingwang.blogspot.com/2019/12/shakedrop.html

Spatial Dropout
https://hemingwang.blogspot.com/2019/12/spatial-dropout.html 

Cutout
https://hemingwang.blogspot.com/2019/12/cutout.html

DropBlock
https://hemingwang.blogspot.com/2019/12/dropblock.html

-----

// RNN

Fast Dropout 
https://hemingwang.blogspot.com/2019/12/fast-dropout.html 

RNN Regularization 
https://hemingwang.blogspot.com/2019/12/rnn-regularization.html 

Variational Dropout
https://hemingwang.blogspot.com/2019/12/variational-dropout.html

Information Dropout
https://hemingwang.blogspot.com/2019/12/information-dropout.html

rnnDrop
https://hemingwang.blogspot.com/2019/12/rnndrop.html

DropEmbedding
https://hemingwang.blogspot.com/2019/12/dropembbeding.html 

Recurrent Dropout
https://hemingwang.blogspot.com/2019/12/recurrent-dropout.html

Zoneout
https://hemingwang.blogspot.com/2019/12/zoneout.html

AWD-LSTM
https://hemingwang.blogspot.com/2019/12/awd-lstm.html 

-----

// Self Attention

DropAttention 
https://hemingwang.blogspot.com/2019/12/dropattention.html

-----

// Data Augmentation

Pairing Samples
http://hemingwang.blogspot.com/2019/12/pairing-samples.html

Mixup
http://hemingwang.blogspot.com/2019/12/mixup.html

----- 

AI 從頭學(2021 年版)

AI 從頭學(2021 年版)

2020/01/01

全方位 AI 課程(精華篇)
http://hemingwang.blogspot.com/2020/01/all-round-ai-lectures-highlight.html

-----


Fig. 2021(Image source: Pixabay)。

-----

一、LeNet - Bill

LeNet

LeNet Lab

AlexNet、(PreAlexNet)

ZFNet、(PreZFNet、Deconv)

-----

二、NIN - Sky

NIN

NIN Lab

SENet、(SKNet、STNet、RANet、BAM、CBAM、RASNet)

GoogLeNet、(VGGNet、PreVGGNet、Highway v1 v2、Inception v3 v4)

(CapsNet v0 v1 v2 v3)

-----

三、ResNet - Leo

ResNet v1、(ResNet-D、ResNet v2、ResNet-E、ResNet-V)

ResNet Lab

WRN、(PyramidNet、ResNeXt)

DenseNet、(DPN、DLA、Res2Net)

-----

四、FCN - 聖耀

FCN、(U-Net、V-Net、3D U-Net、Attention U-Net、Skip Connections、U-Net++、MultiResUNet、DC-UNet)

FCN Lab、(3D U-Net Lab)

Faster R-CNN

Mask R-CNN、(MS R-CNN)

-----

五、YOLO - Sam

YOLO

YOLO Lab

SSD、(DSSD、YOLO v2、FPN、RetinaNet、YOLOv3)

YOLO v4

-----

六、LSTM - Kiki

LSTM

LSTM Lab

NNLM、(C&W、RNNLM)

Word2vec、(LSA、GloVe、FastText v1 v2、WordRank)


-----

七、Seq2seq - 銘文

Seq2seq

Seq2seq Lab

Paragraph2vec

C&W

-----

八、Attention - Chris

Attention

Attention Lab

NTM

FSA

-----

九、ConvS2S - Ian

ConvS2S

ConvS2S Lab

Context2vec

ELMo、(ULMFiT、MultiFiT)

-----

十、Transformer - 英宗

Transformer

Transformer Lab

GPT-1、(GPT-2、GPT-3)

BERT

-----

一一、Regularization - 佳晏


一二、Normalization


一三、Optimization - 田青


一四、Activation Function - 柏翰


一五、Loss Function - 建傑

-----

全方位 AI 課程(六十小時搞定深度學習)
https://hemingwang.blogspot.com/2020/01/all-round-ai-lectures.html

全方位 AI 課程(介紹篇)
https://hemingwang.blogspot.com/2020/01/all-round-ai-lectures-introduction.html

AI Seminar 2020 Taipei
https://hemingwang.blogspot.com/2019/12/ai-seminar-2020-taipei.html

-----

Python
https://hemingwang.blogspot.com/2019/02/python.html

-----

Part I:Computer Vision

◎ Image Classification

Stage 01:LeNet、(AlexNet、ZFNet)

Stage 02:NIN、(SENet、GoogLeNet、VGGNet、PreVGGNet、Highway v1 v2)、(Inception v3 v4、PolyNet)

Stage 03:ResNet v1 v2、(ResNet-D、ResNet-E、ResNet-I、ResNet-Q、ResNet-S、ResNet-W、WRN、ResNeXt、DenseNet、DPN、DLA、Res2Net)

◎ Semantic Segmentation

Stage 04:FCN、(DeconvNet、SegNet、U-Net、U-Net++、DilatedNet、ENet、DRN、FC-CRF、DeepLab v1 v2 v3 v3+、ResNet-38、RefineNet、RefineNet-LW、RefineNet-AA、PSPNet、ICNet、BiSeNet、Fast-SCNN、BlitzNet)

◎ Object Detection

Stage 05:(DPM、SS、R-CNN、SPPNet、Fast R-CNN、OHEM、Faster R-CNN、OverFeat)、YOLOv1、(SSD、DSSD、YOLOv2、ION、R-FCN、SATO、DCNv1、DCNv2、Cascade R-CNN、FPN、STDN、YOLOv3、RON、RefineDet、M2Det、DetNet、TridentNet、OHEM、Focal Loss、GHM、Libra R-CNN、DCRv1、DCRv2、PISA)

// ◎ Dataset

-----

Part II:Natural Language Processing

◎ LSTM
Stage 06:LSTM、(NNLM、Word2vec)

◎ Seq2seq
Stage 07:Seq2seq、(GloVe、fastText)

◎ Attention
Stage 08:Attention、(NTM、KVMN)

◎ ConvS2S
Stage 09:ConvS2S、(ELMo、ULMFiT)

◎ Transformer
Stage 10:Transformer、(GPT-1、BERT、GPT-2)

----- 

Part III:Fundamental Topics

◎ Regularization
Stage 11:(Weight Decay、L2、L1、L0、Dropout、DropConnect、DropPath、Scheduled DropPath、Shake-Shake、ShakeDrop、Spatial Dropout、Cutout、DropBlock、Fast Dropout、RNN Regularization、Variational Dropout、Information Dropout、DropEmbedding、Recurrent Dropout、Zoneout、AWD-LSTM、DropAttention、Mixup、Pairing Samples、AutoAugment)

◎ Normalization
Stage 12:(Batch、Weight、Layer、Instance、Group、Positional)

◎ Optimization
Stage 13:(SGD、Momentum、NAG、AdaGrad、AdaDelta、RMSProp、Adam、AdaMax、Nadam、AMSGrad、Lookahead、RAdam、LAMB、CLR、SGDR、AdamW、Super-Convergence、ADMM、ADMM-S、dlADMM)

◎ Activation Function
Stage 14:(sigmoid、tanh、ReLU、Softplus、LReLU、PReLU、ELU、SELU、GELU、Swish)

◎ Loss Function
Stage 15:

// ◎ Pooling
// ◎ Convolution
// ◎ Automatic Differentiation
// ◎ Back Propagation
// ◎ Computational Graph

-----

Part IV:Advanced Topics

◎ Instance Segmentation
Stage16:(Hypercolumn、MNC、DeepMask、SharpMask、MultiPathNet、InstanceFCN、FCIS)、Mask R-CNN、(MaskX R-CNN、MaskLab、PANet、HTC、RetinaMask、MS R-CNN、YOLACT)

◎ Mobile
Stage17:SqueezeNet、(MobileNet v1 v2 v3、ShuffleNet v1 v2、Xception)

◎ NAS
Stage18:NAS-RL、NASNet(Scheduled DropPath)、EfficientNet、Auto-DeepLab、NAS-FPN、 AutoAugment。

◎ GAN
Stage19:

◎ BERT
Stage20:

-----

Intelligence Science

-----

Intelligence Science
http://hemingwang.blogspot.com/2019/09/intelligence-science.html

-----


Part I:Computer Vision

-----

Computer Vision
https://hemingwang.blogspot.com/2019/10/computer-vision.html

https://hemingwang.blogspot.com/2019/10/gaussiansmooth.html

https://hemingwang.blogspot.com/2019/10/sobeledgedetection.html

https://hemingwang.blogspot.com/2019/10/structuretensor.html

https://hemingwang.blogspot.com/2019/10/nms.html

-----

◎ Image Classification

-----

Image Classification
https://hemingwang.blogspot.com/2019/10/image-classification.html

-----

◎ 1. LeNet(Image Classification)

-----

LeNet
https://hemingwang.blogspot.com/2019/05/trilogy.html
http://hemingwang.blogspot.com/2018/02/deep-learninglenet-bp.html
http://hemingwang.blogspot.com/2017/03/ailenet.html
http://hemingwang.blogspot.com/2017/03/ailenet-f6.html

AlexNet
http://hemingwang.blogspot.com/2017/05/aialexnet.html

PreAlexNet

ZFNet
http://hemingwang.blogspot.com/2017/05/aikernel-visualizing.html

PreZFNet

Deconv

-----

◎ 2. NIN(Image Classification)

-----

NIN
http://hemingwang.blogspot.com/2017/06/ainetwork-in-network.html

SENet
https://hemingwang.blogspot.com/2019/10/senet.html

SKNet
https://hemingwang.blogspot.com/2020/08/sknet.html

STNet
http://hemingwang.blogspot.com/2020/04/stnet.html

RANet

BAM

CBAM

RASNet

GoogLeNet
http://hemingwang.blogspot.com/2017/06/aigooglenet.html
http://hemingwang.blogspot.com/2017/06/aiconv1.html
http://hemingwang.blogspot.com/2017/08/aiinception.html

VGGNet
http://hemingwang.blogspot.com/2018/09/aivggnet.html 

PreVGGNet
https://hemingwang.blogspot.com/2019/11/prevggnet.html

Highway v1
http://hemingwang.blogspot.com/2019/11/highway.html

Highway v2

Inception v3
https://hemingwang.blogspot.com/2019/11/inception-v3.html

Inception v4
https://hemingwang.blogspot.com/2019/11/inception-v4.html

CapsNet v0

CapsNet v1
https://hemingwang.blogspot.com/2019/12/capsnet.html

CapsNet v2

CapsNet v3

-----

◎ 3. ResNet(Image Classification)

-----

ResNet
https://hemingwang.blogspot.com/2019/05/vanishing-gradient.html
https://hemingwang.blogspot.com/2019/05/exploding-gradient.html
http://hemingwang.blogspot.com/2019/10/an-overview-of-resnet-and-its-variants.html
https://hemingwang.blogspot.com/2019/10/universal-approximation-theorem.html 
https://hemingwang.blogspot.com/2019/10/understanding-boxplots.html
https://hemingwang.blogspot.com/2019/10/ensemble-learning.html
http://hemingwang.blogspot.com/2018/09/airesnet.html



ResNet v1

ResNet-D
https://hemingwang.blogspot.com/2019/11/resnet-d.html

ResNet v2

ResNet-E
https://hemingwang.blogspot.com/2019/11/resnet-e.html

ResNet-V
https://hemingwang.blogspot.com/2019/12/resnet-v.html

-----

ResNet-F
https://hemingwang.blogspot.com/2019/12/resnet-f.html

ResNet-I
https://hemingwang.blogspot.com/2019/12/resnet-i.html

ResNet-Q
https://hemingwang.blogspot.com/2019/12/resnet-q.html

ResNet-S
https://hemingwang.blogspot.com/2019/11/resnet-s.html

ResNet-U
https://hemingwang.blogspot.com/2019/12/resnet-u.html 

ResNet-W
https://hemingwang.blogspot.com/2019/12/resnet-w.html

WRN
https://hemingwang.blogspot.com/2019/11/wrn.html

ResNeXt
https://hemingwang.blogspot.com/2019/10/resnext.html

DenseNet
https://hemingwang.blogspot.com/2019/11/densenet.html

DPN
https://hemingwang.blogspot.com/2019/11/dpn.html

DLA
https://hemingwang.blogspot.com/2019/11/dla.html

Res2Net
https://hemingwang.blogspot.com/2019/11/res2net.html

-----

PolyNet
https://hemingwang.blogspot.com/2019/11/polynet.html

FractalNet
https://hemingwang.blogspot.com/2019/12/fractalnet.html

RevNet

----- 

◎ 3.1. SqueezeNet(Image Classification - Mobile)

-----

Mobile
https://hemingwang.blogspot.com/2019/10/mobile.html

SqueezeNet
https://hemingwang.blogspot.com/2019/10/squeezenet.html

MobileNet v1
https://hemingwang.blogspot.com/2019/10/mobilenet-v1.html

ShuffleNet

Xception
https://hemingwang.blogspot.com/2019/10/xception.html

-----

NAS-RL
https://hemingwang.blogspot.com/2019/12/nas-rl.html 

NASNet(Scheduled DropPath)
https://hemingwang.blogspot.com/2019/12/scheduled-droppath.html

EfficientNet 
https://hemingwang.blogspot.com/2019/12/efficientnet.html

Auto-DeepLab
https://hemingwang.blogspot.com/2019/12/auto-deeplab.html

NAS-FPN
https://hemingwang.blogspot.com/2019/12/nas-fpn.html 

AutoAugment
http://hemingwang.blogspot.com/2019/12/autoaugment.html

-----

◎ 4.1. FCN(Semantic Segmentation)

-----

https://hemingwang.blogspot.com/2020/08/semantic-segmentation.html

-----

◎ 4.2. Instance Segmentation

-----

https://hemingwang.blogspot.com/2020/08/instance-segmentation.html

-----

◎ 4.3. Panoptic Segmentation

-----

https://hemingwang.blogspot.com/2020/08/panoptic-segmentation.html

-----

◎ 5. YOLOv1(Object Detection)

-----
 
Object Detection
https://hemingwang.blogspot.com/2019/10/object-detection.html 

-----

一、Accuracy

DPM、SS、R-CNN、
SPPNet、Fast R-CNN、
Faster R-CNN、

-----

二、Speed

OverFeat、YOLOv1、SSD、DSSD、YOLOv2、

-----

三、Accuracy and speed

ION、R-FCN、SATO、DCNv1、DCNv2、Cascade R-CNN、
FPN、STDN、YOLOv3、RON、RefineDet、M2Det、
SNIP、SNIPER、AutoFocus、
DetNet、TridentNet、

四、Hard examples

OHEM、Focal Loss、GHM、
Libra R-CNN、DCRv1、DCRv2、
PISA。

-----

DPM
https://hemingwang.blogspot.com/2019/11/dpm.html

SS
https://hemingwang.blogspot.com/2019/11/ss.html

R-CNN
https://hemingwang.blogspot.com/2019/11/r-cnn.html

SPPNet
https://hemingwang.blogspot.com/2019/11/sppnet.html

Fast R-CNN
https://hemingwang.blogspot.com/2019/11/fast-r-cnn.html

Faster R-CNN
https://hemingwang.blogspot.com/2019/09/faster-r-cnn.html 

-----

OverFeat
https://hemingwang.blogspot.com/2019/11/overfeat.html

YOLOv1
http://hemingwang.blogspot.com/2018/04/deep-learningyolo-v1.html
http://hemingwang.blogspot.com/2018/04/machine-learning-conceptmean-average.html
http://hemingwang.blogspot.com/2018/04/machine-learning-conceptnon-maximum.html
https://hemingwang.blogspot.com/2019/11/yolo-v1.html

SSD
https://hemingwang.blogspot.com/2019/09/ssd.html 

DSSD
https://hemingwang.blogspot.com/2019/11/dssd.html 

YOLOv2
https://hemingwang.blogspot.com/2019/11/yolo-v2.html

-----

ION
https://hemingwang.blogspot.com/2019/11/ion.html

R-FCN
https://hemingwang.blogspot.com/2019/11/r-fcn.html

SATO
https://hemingwang.blogspot.com/2019/10/sato.html 

DCNv1
https://hemingwang.blogspot.com/2019/12/dcn-v1.html

DCNv2
https://hemingwang.blogspot.com/2019/12/dcn-v2.html

Cascade R-CNN
https://hemingwang.blogspot.com/2019/12/cascade-r-cnn.html

FPN
https://hemingwang.blogspot.com/2019/11/fpn.html

STDN
https://hemingwang.blogspot.com/2019/12/stdn.html

YOLOv3
https://hemingwang.blogspot.com/2019/11/yolo-v3.html

RON
https://hemingwang.blogspot.com/2019/12/ron.html

RefineDet
https://hemingwang.blogspot.com/2019/11/refinedet.html

M2Det
https://hemingwang.blogspot.com/2019/10/m2det.html

SNIP
https://hemingwang.blogspot.com/2019/12/snip.html 

SNIPER
https://hemingwang.blogspot.com/2019/12/sniper.html

AutoFocus
https://hemingwang.blogspot.com/2019/12/autofocus.html

DetNet
https://hemingwang.blogspot.com/2019/12/detnet.html

TridentNet
https://hemingwang.blogspot.com/2019/12/tridentnet.html

-----

OHEM
https://hemingwang.blogspot.com/2019/11/ohem.html

Focal Loss
https://hemingwang.blogspot.com/2019/10/retinanet.html

GHM
https://hemingwang.blogspot.com/2019/12/ghm.html

Libra R-CNN
https://hemingwang.blogspot.com/2019/12/libra-r-cnn.html

DCRv1
https://hemingwang.blogspot.com/2019/12/dcr-v1.html

DCRv2
https://hemingwang.blogspot.com/2019/12/dcr-v2.html

PISA
https://hemingwang.blogspot.com/2019/12/pisa.html

-----

◎ 5.1. Dataset

-----

Dataset
https://hemingwang.blogspot.com/2019/10/dataset.html

CALTECH
CIFAR-10
PASCAL VOC
COCO
MNIST
ILSVRC 14
Cityscapes

-----
 
Part II:Natural Language Processing
 
-----

◎ 6. LSTM(NLP)

-----

LSTM
http://hemingwang.blogspot.com/2019/09/understanding-lstm-networks.html
https://hemingwang.blogspot.com/2019/09/lstm.html

-----

◎ NNLM

-----

MLE
EM
STM
n-gram

NNLM
https://hemingwang.blogspot.com/2019/04/nnlm.html

C&W
https://hemingwang.blogspot.com/2020/07/c.html

RNNLM
https://hemingwang.blogspot.com/2020/07/rnnlm.html

-----

◎ Word2vec 

-----

Word2vec
https://hemingwang.blogspot.com/2019/04/word2vec.html

Word2vec v1:CBOW and Skip-gram
https://hemingwang.blogspot.com/2020/07/word2vec-v1.html

Word2vec v2:Hierarchical Softmax and Negative Sampling
https://hemingwang.blogspot.com/2020/07/word2vec-v2.html

Word2vec v3:Simplified Word2vec v1 and v2
https://hemingwang.blogspot.com/2020/08/word2vec-v3.html

LSA
https://hemingwang.blogspot.com/2020/07/lsa.html

GloVe
https://hemingwang.blogspot.com/2020/07/glove.html

fastText v1
https://hemingwang.blogspot.com/2020/07/fasttext-v1.html

fastText v2
https://hemingwang.blogspot.com/2020/07/fasttext-v2.html

WordRank
https://hemingwang.blogspot.com/2020/07/wordrank.html

-----

◎ 7. Seq2seq(NLP)

-----

Seq2seq
http://hemingwang.blogspot.com/2019/10/word-level-english-to-marathi-neural.html
https://hemingwang.blogspot.com/2019/09/seq2seq.html

RNN Encoder-Decoder 1
http://hemingwang.blogspot.com/2020/08/rnn-encoder-decoder-1.html

RNN Encoder-Decoder 2
http://hemingwang.blogspot.com/2020/08/rnn-encoder-decoder-2.html

Teacher Forcing 1
http://hemingwang.blogspot.com/2020/08/teacher-forcing.html

Beam Search
http://hemingwang.blogspot.com/2020/08/beam-search.html

Curriculum Learning
http://hemingwang.blogspot.com/2020/08/curriculum-learning.html

-----

BoW 1-gram

Paragraph2vec
https://hemingwang.blogspot.com/2020/08/paragraph2vec.html

B4SE

PSE

-----

Skip-Thought

Quick-Thought

InferSent

MILA SE

Google SE

-----

S3E

SASE

SBERT

RRSE

-----

◎ 8. Attention(NLP)

-----

Attention
http://hemingwang.blogspot.com/2019/10/attention-in-nlp.html
http://hemingwang.blogspot.com/2019/01/attention.html

Attention 1
https://hemingwang.blogspot.com/2020/08/attention-1.html

Visual Attention
https://hemingwang.blogspot.com/2020/08/visual-attention.html

Grad-CAM
https://hemingwang.blogspot.com/2020/08/grad-cam.html

Attention 2
https://hemingwang.blogspot.com/2020/08/attention-2.html

GNMT
https://hemingwang.blogspot.com/2020/08/gnmt.html

-----

NTM

DNC

One Shot MANN

SMA

INTM

-----

MN

DMN

EEMN

KVMN

PN

Set2set

One Shot MN

FSA

-----

◎ 9. ConvS2S(NLP)

-----

ConvS2S
http://hemingwang.blogspot.com/2019/10/understanding-incremental-decoding-in.html 
https://hemingwang.blogspot.com/2019/04/convs2s.html

Key-Value
http://hemingwang.blogspot.com/2019/09/key-value.html

-----

S3L

Context2vec
https://hemingwang.blogspot.com/2020/08/context2vec.html

CoVe

ELLM

ELMo
https://hemingwang.blogspot.com/2019/04/elmo.html

ULMFiT

MultiFiT

-----

◎ 10. Transformer(NLP)

-----

Transformer
http://hemingwang.blogspot.com/2019/10/the-illustrated-transformer.html 
http://hemingwang.blogspot.com/2019/01/transformer.html

GPT-1
https://hemingwang.blogspot.com/2020/01/gpt-1.html

GPT-2

GPT-3

Grover

BERT
https://hemingwang.blogspot.com/2019/01/bert.html

-----

Reformer

LSH

RevNet

Adafactor

-----

Longformer

Synthesizer

Linformer

-----

Part III:Fundamental Topics

-----

Hyper-parameters
https://hemingwang.blogspot.com/2020/09/hyper-parameters.html


-----

◎ 11. Regularization

-----

Regularization

-----

◎ 12. Normalization

-----

PCA
https://hemingwang.blogspot.com/2020/10/understanding-principal-components.html

Normalization
http://hemingwang.blogspot.com/2019/10/an-overview-of-normalization-methods-in.html
https://hemingwang.blogspot.com/2019/10/normalization.html

BN
http://hemingwang.blogspot.com/2019/12/bn.html 

WN
http://hemingwang.blogspot.com/2019/12/wn.html 

LN
http://hemingwang.blogspot.com/2019/12/ln.html 

IN
http://hemingwang.blogspot.com/2019/12/in.html 

AIN
https://hemingwang.blogspot.com/2019/12/ain.html

GN
http://hemingwang.blogspot.com/2019/12/gn.html

PN
http://hemingwang.blogspot.com/2019/12/pn.html

UBN
http://hemingwang.blogspot.com/2019/12/ubn.html 

TUBN
https://hemingwang.blogspot.com/2019/12/tubn.html

ResNet-V
https://hemingwang.blogspot.com/2019/12/resnet-v.html

BNHO
http://hemingwang.blogspot.com/2019/12/bnho.html

URBN
http://hemingwang.blogspot.com/2019/12/urbn.html

NormProp
https://hemingwang.blogspot.com/2019/12/normprop.html 

Efficient Backprop
https://hemingwang.blogspot.com/2019/12/efficient-backprop.html

Whitening
https://hemingwang.blogspot.com/2019/12/whitening.html

GWNN
https://hemingwang.blogspot.com/2019/12/gwnn.html

DBN
https://hemingwang.blogspot.com/2019/12/dbn.html

KN
https://hemingwang.blogspot.com/2019/12/kn.html

IterNorm
https://hemingwang.blogspot.com/2019/12/iternorm.html

-----

-----

◎ 13. Optimization

-----

Optimization(first-order)
http://hemingwang.blogspot.com/2019/10/an-overview-of-gradient-descent.html
https://hemingwang.blogspot.com/2018/03/deep-learningoptimization.html
https://hemingwang.blogspot.com/2019/01/optimization.html


Optimization(first- and second-order)
https://hemingwang.blogspot.com/2020/10/5-algorithms-to-train-neural-network.html
https://hemingwang.blogspot.com/2020/10/optimization.html

-----

First-order:

SGD、Momentum、NAG、
AdaGrad、AdaDelta、RMSProp、
Adam、AdaMax、 Nadam、AMSGrad、
RAdam、SMA、Lookahead、EMA、
AdaBound、SWATS、

LAMB、
CLR、SGDR、AdamW、Super-Convergence、

Second-order:

1. Gradient Descent, Jacobian, and Hessian、Taylor series and Maclaurin series、
2. Newton's Method、Gauss-Newton Method(Gauss-Newton Matrix)、
3. Conjugate Gradient(Gradient Descent + Newton's Method)、
4. Quasi-Newton(Template)、SR1、Broyden(Family)、DFP、BFGS、L-BFGS、
5. Levenberg-Marquardt Algorithm(Gradient Descent + Gauss-Newton Method)
6. Natural Gradient Method(Fisher Information Matrix)、
7. K-FAC-G(Gauss-Newton Matrix)、K-FAC-F(Fisher Information Matrix)、
8. Shampoo v1、Shampoo v2、

Third:

ADMM、ADMM-S、dlADMM。

-----

SGD
http://hemingwang.blogspot.com/2019/12/sgd.html 

Momentum
https://hemingwang.blogspot.com/2019/12/momentum.html 

NAG
https://hemingwang.blogspot.com/2019/12/nag.html

-----

AdaGrad
https://hemingwang.blogspot.com/2019/12/adagrad.html

AdaDelta
https://hemingwang.blogspot.com/2019/12/adadelta.html

RMSProp
https://hemingwang.blogspot.com/2019/12/rmsprop.html

-----

Adam
https://hemingwang.blogspot.com/2019/12/adam.html

AdaMax
https://hemingwang.blogspot.com/2019/12/adamax.html

Nadam
https://hemingwang.blogspot.com/2019/12/nadam.html

AMSGrad
https://hemingwang.blogspot.com/2019/12/amsgrad.html

-----

RAdam
https://hemingwang.blogspot.com/2019/12/radam.html

SMA
https://hemingwang.blogspot.com/2019/12/sma.html

Lookahead
https://hemingwang.blogspot.com/2019/12/lookahead.html

EMA
http://hemingwang.blogspot.com/2019/12/ema.html

AdaBound


1. First-order algorithms develop from gradient descent

Gradient Descent, Jacobian, and Hessian

Taylor series and Maclaurin series

2. Second-order algorithms start from Newton's method and the Gauss-Newton method (a small code sketch follows this outline)

Newton's Method

Gauss-Newton Method(Gauss-Newton Matrix)

3. The conjugate gradient method sits between gradient descent and Newton's method

Conjugate Gradient(Gradient Descent + Newton's Method)

4. Quasi-Newton methods are computational simplifications of Newton's method

Quasi-Newton(Template)

Broyden(Family)

SR1

DFP

BFGS

L-BFGS

5. The Levenberg-Marquardt algorithm combines gradient descent with the Gauss-Newton method

Levenberg-Marquardt Algorithm(Gradient Descent + Gauss-Newton Method)


6. Natural gradient method

Natural Gradient Method(Fisher Information Matrix)

7. Kronecker-Factored Approximate Curvature

K-FAC-G(Gauss-Newton Matrix)

K-FAC-F(Fisher Information Matrix)

8. Newer and faster second-order algorithms

Shampoo v1

Shampoo v2
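
To make the contrast above concrete, here is a minimal sketch (an illustration, not code from any of the linked posts) of one gradient-descent step versus one Newton step on a 2-D test function; the test function and step size are arbitrary choices.

# Minimal sketch: first-order (gradient) step vs. second-order (Newton) step.
import numpy as np

def f(w):
    # Rosenbrock test function (arbitrary choice for illustration)
    return (1 - w[0])**2 + 100 * (w[1] - w[0]**2)**2

def grad(w):
    return np.array([-2 * (1 - w[0]) - 400 * w[0] * (w[1] - w[0]**2),
                     200 * (w[1] - w[0]**2)])

def hess(w):
    return np.array([[2 - 400 * (w[1] - 3 * w[0]**2), -400 * w[0]],
                     [-400 * w[0], 200.0]])

w = np.array([-1.0, 1.0])
gd_step = w - 1e-3 * grad(w)                         # first-order: gradient only
newton_step = w - np.linalg.solve(hess(w), grad(w))  # second-order: uses the Hessian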

-----

ADMM
https://hemingwang.blogspot.com/2019/12/admm.html

ADMM-S
https://hemingwang.blogspot.com/2019/12/admm-s.html

dlADMM
https://hemingwang.blogspot.com/2019/12/dladmm.html

-----

◎ 14. Activation Function

-----

Activation Function
https://hemingwang.blogspot.com/2019/10/understanding-activation-functions-in.html

Maxout
https://hemingwang.blogspot.com/2019/12/maxout.html

-----

◎ 15. Loss Function

-----

Loss Function
https://hemingwang.blogspot.com/2019/10/a-brief-overview-of-loss-functions-in.html
http://hemingwang.blogspot.com/2019/05/loss-function.html

-----

◎ Pooling

-----


◎ Convolution

-----
 
Convolution
https://hemingwang.blogspot.com/2019/11/convolution.html

-----

◎ Automatic Differentiation

-----

Automatic differentiation

-----

◎ Back Propagation

-----

Backpropagation

-----

◎ Computational Graph

-----

Computational graph

-----

Optimization

Optimization

2020/10/12

-----


https://pixabay.com/zh/photos/stopwatch-gears-work-working-time-3699314/

-----


https://www.neuraldesigner.com/blog/5_algorithms_to_train_a_neural_network

-----



https://blog.slinuxer.com/2016/09/sgd-comparison


Fig. Optimization。

-----





http://www.stat.cmu.edu/~ryantibs/convexopt-F18/lectures/quasi-newton.pdf

-----


https://en.wikipedia.org/wiki/Quasi-Newton_method

-----


https://zh.wikipedia.org/wiki/%E8%8E%B1%E6%96%87%E8%B4%9D%E6%A0%BC%EF%BC%8D%E9%A9%AC%E5%A4%B8%E7%89%B9%E6%96%B9%E6%B3%95

-----

References

◎ The big picture

5 algorithms to train a neural network

https://www.neuraldesigner.com/blog/5_algorithms_to_train_a_neural_network


◎ 一、SGD

SGD算法比较 – Slinuxer

https://blog.slinuxer.com/2016/09/sgd-comparison


An overview of gradient descent optimization algorithms

https://ruder.io/optimizing-gradient-descent/


从 SGD 到 Adam —— 深度学习优化算法概览(一) - 知乎

https://zhuanlan.zhihu.com/p/32626442 


◎ 二、Newton's Method and the Gauss-Newton Method

Gauss-Newton algorithm for solving non linear least squares explained - YouTube

https://www.youtube.com/watch?v=CjrRFbQwKLA

4.3 Newton's Method

https://jermwatt.github.io/machine_learning_refined/notes/4_Second_order_methods/4_4_Newtons.html

Hessian Matrix vs. Gauss-Newton Hessian Matrix | Semantic Scholar

https://www.semanticscholar.org/paper/Hessian-Matrix-vs.-Gauss-Newton-Hessian-Matrix-Chen/a8921166af9d21cdb8886ddb9a80c703abe3dde5

牛顿法 高斯牛顿法 | Cheng Wei's Blog

https://scm_mos.gitlab.io/algorithm/newton-and-gauss-newton/

◎ 三、Conjugate Gradient Method

Deep Learning Book

https://www.deeplearningbook.org/contents/optimization.html

Blog - Conjugate Gradient 1 | Pattarawat Chormai

https://pat.chormai.org/blog/2020-conjugate-gradient-1

linear algebra - Why is the conjugate direction better than the negative of gradient, when minimizing a function - Mathematics Stack Exchange

https://math.stackexchange.com/questions/1020008/why-is-the-conjugate-direction-better-than-the-negative-of-gradient-when-minimi

◎ 四、Quasi-Newton Methods

quasi-newton.pdf

http://www.stat.cmu.edu/~ryantibs/convexopt-F18/lectures/quasi-newton.pdf

Quasi-Newton method - Wikipedia

https://en.wikipedia.org/wiki/Quasi-Newton_method

# A very well-structured overview

梯度下降法、牛顿法和拟牛顿法 - 知乎

https://zhuanlan.zhihu.com/p/37524275

◎ 五、Levenberg-Marquardt Algorithm

Optimization for Least Square Problems

https://zlthinker.github.io/optimization-for-least-square-problem

萊文貝格-馬夸特方法 - 維基百科,自由的百科全書

https://zh.wikipedia.org/wiki/%E8%8E%B1%E6%96%87%E8%B4%9D%E6%A0%BC%EF%BC%8D%E9%A9%AC%E5%A4%B8%E7%89%B9%E6%96%B9%E6%B3%95

◎ 六、Natural Gradient Method


◎ 七、K-FAC


◎ 八、Shampoo

-----

Thursday, October 15, 2020

[翻譯] 5 algorithms to train a neural network

[翻譯] 5 algorithms to train a neural network

2020/10/02

-----


https://www.neuraldesigner.com/blog/5_algorithms_to_train_a_neural_network

Fig. 1. 5 algorithms to train a neural network。

-----

The procedure used to carry out the learning process in a neural network is called the optimization algorithm (or optimizer).

用於在神經網路中執行學習過程的過程稱為優化算法(或優化器)。

There are many different optimization algorithms. All have different characteristics and performance in terms of memory requirements, processing speed, and numerical precision.

有許多不同的優化算法。 就內存需求,處理速度和數值精度而言,所有算法都有不同的特性和表現。

In this post, we formulate the learning problem for neural networks. Then, some important optimization algorithms are described. Finally, the memory, speed, and precision of those algorithms are compared.

在這篇文章中,我們制定了神經網路的學習問題。 然後,描述了一些重要的優化算法。 最後,比較了這些算法的內存,速度和精度。

-----


https://www.neuraldesigner.com/blog/5_algorithms_to_train_a_neural_network

Fig. 2. 5 algorithms to train a neural network。

-----

Learning problem.

1. Gradient descent.

2. Newton method.

3. Conjugate gradient.

4. Quasi-Newton method.

5. Levenberg-Marquardt algorithm.

Performance comparison.

Conclusions.

學習問題。

1. 梯度下降。

2. 牛頓法。

3. 共軛梯度。

4. 擬牛頓法。

5. 萊文貝格-馬夸特方法。

性能比較。

結論。

Neural Designer implements a great variety of optimization algorithms to ensure that you always achieve the best models from your data. You can download a free trial here.

Neural Designer 實現了各種各樣的優化算法,以確保您始終從數據中獲得最佳模型。 您可以在此處下載免費試用版。

-----

Learning Problem

-----

The learning problem is formulated in terms of the minimization of a loss index, f. It is a function that measures the performance of a neural network on a data set.

學習問題是根據損耗指數 f  的最小化來表述的。 它是一項測量數據集上神經網路性能的函數。

The loss index is, in general, composed of an error term and a regularization term. The error term evaluates how a neural network fits the data set. The regularization term is used to prevent overfitting by controlling the effective complexity of the neural network.

損耗指數通常由誤差項和正則化項組成。 誤差項評估神經網路如何擬合數據集。 正則化項用於通過控制神經網路的有效複雜度來防止過度擬合。
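
Written out in standard notation (a reconstruction; the symbols λ and Ω are not from the original article), the loss index described above can be expressed as

f(\mathbf{w}) \;=\; E(\mathbf{w}) \;+\; \lambda\,\Omega(\mathbf{w})

where E is the error term, Ω is the regularization term, and λ controls the strength of regularization.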

The loss function depends on the adaptative parameters (biases and synaptic weights) in the neural network. We can conveniently group them into a single n-dimensional weight vector w.

損失函數取決於神經網路中的自適應參數(偏置和突觸權重)。 我們可以方便地將它們分組為單個 n 維權重向量 w。

The picture below represents the loss function f(w).

下圖顯示了損失函數 f(w)。

-----


Fig. 3. Loss Function。

-----


Fig. 4. The first derivatives。

-----

As we can see in the previous picture, the minimum of the loss function occurs at the point w∗. At any point A, we can calculate the first and second derivatives of the loss function.

如上圖所示,損失函數的最小值出現在 w * 點。 在任何一點 A,我們都可以計算損失函數的一階和二階導數。

The first derivatives are grouped in the gradient vector, whose elements can be written as ... for i = 1, … , n.

一階導數在梯度向量中分組,對於 i = 1,…,n,其元素可以寫成 ...。

Similarly, the second derivatives of the loss function can be grouped in the Hessian matrix, for i, j = 0, 1, … .

類似地,對於 i,j = 0,1,…,損失函數的二階導數可以分組在 Hessian 矩陣中。
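
In standard notation (a reconstruction, since the article elides the formulas), the gradient elements and Hessian entries referred to above are

\nabla f(\mathbf{w}) = \left( \frac{\partial f}{\partial w_1}, \ldots, \frac{\partial f}{\partial w_n} \right),
\qquad
H_{ij} = \frac{\partial^2 f}{\partial w_i\,\partial w_j}.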

The problem of minimizing the continuous and differentiable functions of many variables has been widely studied. Many of the conventional approaches to this problem are directly applicable to that of training neural networks.

最小化多變量的連續和可微函數的問題已被廣泛研究。 解決該問題的許多常規方法可直接應用於訓練神經網路。

-----

One-dimensional optimization

-----

Although the loss function depends on many parameters, one-dimensional optimization methods are of great importance here. Indeed, they are very often used in the training process of a neural network.

儘管損失函數取決於多參數,但是一維優化方法在這裡非常重要。 確實,它們經常在神經網路的訓練過程中使用。

Many training algorithms first compute a training direction d and then a training rate η that minimizes the loss in that direction, f(η). The next picture illustrates this one-dimensional function.

許多訓練算法首先計算訓練方向 d,然後計算訓練速率 η; 從而使該方向上的損耗 f(η)最小。 下一張圖片說明了此一維函數。

-----


Fig. 5. Interval。

-----

The points η1 and η2 define an interval that contains the minimum of f, η∗.

點 η1 和 η2 定義了一個包含 f 的最小值 η∗ 的區間。

In this regard, one-dimensional optimization methods search for the minimum of a given one-dimensional function. Some of the algorithms which are widely used are the golden section method and Brent's method. Both reduce the bracket of a minimum until the distance between the two outer points in the bracket is less than a defined tolerance.

在這方面,一維優化方法搜索給定一維函數的最小值。 廣泛使用的一些算法是黃金分割法和布倫特法。 兩者都會減小區間中的最小值,直到區間中兩個外部點之間的距離小於定義的公差為止。
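
A minimal sketch of golden-section search in Python (an illustration of the bracketing idea described above, not code from the article; the example function is arbitrary):

# Minimal sketch: golden-section search for the minimum of a 1-D function f(eta) on [a, b].
import math

def golden_section(f, a, b, tol=1e-6):
    inv_phi = (math.sqrt(5) - 1) / 2                 # 1/phi, about 0.618
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while abs(b - a) > tol:                          # shrink the bracket until it is narrow enough
        if f(c) < f(d):
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:
            a, c = c, d
            d = a + inv_phi * (b - a)
    return (a + b) / 2

eta_star = golden_section(lambda eta: (eta - 0.3)**2, 0.0, 1.0)  # returns approximately 0.3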

-----

Multidimensional optimization

-----

The learning problem for neural networks is formulated as searching of a parameter vector w∗ at which the loss function f takes a minimum value. The necessary condition states that if the neural network is at a minimum of the loss function, then the gradient is the zero vector.

神經網路的學習問題被表述為搜索參數向量 w∗,其中損失函數 f 取最小值。 必要條件表明,如果神經網路處於損失函數的最小值,則梯度為零向量。

The loss function is, in general, a non-linear function of the parameters. As a consequence, it is not possible to find closed training algorithms for the minima. Instead, we consider a search through the parameter space consisting of a succession of steps. At each step, the loss will decrease by adjusting the neural network parameters.

損失函數通常是參數的非線性函數。 結果,不可能找到針對最小值的封閉訓練算法。 相反,我們考慮在由一系列步驟組成的參數空間中進行搜索。 在每一步,通過調整神經網路參數,損耗將減少。

In this way, to train a neural network, we start with some parameter vector (often chosen at random). Then, we generate a sequence of parameters, so that the loss function is reduced at each iteration of the algorithm. The change of loss between two steps is called the loss decrement. The training algorithm stops when a specified condition, or stopping criterion, is satisfied.

這樣,為了訓練神經網路,我們從一些參數向量開始(通常是隨機選擇)。 然後,我們生成一系列參數,以便在算法的每次迭代中減少損失函數。 兩步之間的損耗變化稱為損耗減量。 當滿足指定條件或停止標準時,訓練算法停止。

-----

1. Gradient descent

-----

Gradient descent, also known as steepest descent, is the most straightforward training algorithm. It requires information from the gradient vector, and hence it is a first-order method.

梯度下降,也稱為最速下降,是最直接的訓練算法。 它需要來自梯度向量的信息,因此它是一階方法。

-----


Fig.

-----

The parameter η is the training rate. This value can either be set to a fixed value or found by one-dimensional optimization along the training direction at each step. An optimal value for the training rate obtained by line minimization at each successive step is generally preferable. However, there are still many software tools that only use a fixed value for the training rate.

參數 η 是訓練率。 該值可以設置為固定值,也可以在每一步沿訓練方向通過一維優化找到。 通常優選在每個連續步驟通過線最小化獲得的訓練速率的最佳值。 但是,仍然有許多軟體工具僅將固定值用於訓練率。

The next picture is an activity diagram of the training process with gradient descent. As we can see, the parameter vector is improved in two steps: First, the gradient descent training direction is computed. Second, a suitable training rate is found.

下一張圖片是梯度下降訓練過程的活動圖。 可以看到,參數向量在兩個步驟中得到了改進:首先,計算梯度下降訓練方向。 第二,找到合適的訓練率。
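
A minimal sketch of this two-step loop (an illustration, not the article's code): the direction is the negative gradient, and the training rate η is found by simple backtracking with a sufficient-decrease (Armijo-style) test; the toy loss and all constants are arbitrary.

# Minimal sketch: gradient descent with a backtracking search for the training rate.
import numpy as np

def f(w):
    return np.sum((w - 3.0)**2)                      # toy quadratic loss (illustrative)

def grad(w):
    return 2.0 * (w - 3.0)

w = np.zeros(5)
for _ in range(50):
    d = -grad(w)                                     # step 1: training direction (negative gradient)
    eta = 1.0
    # step 2: shrink eta until it gives a sufficient decrease of the loss
    while f(w + eta * d) > f(w) + 1e-4 * eta * grad(w) @ d and eta > 1e-8:
        eta *= 0.5
    w = w + eta * d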

-----


Fig.

-----

The gradient descent training algorithm has the severe drawback of requiring many iterations for functions which have long, narrow valley structures. Indeed, the downhill gradient is the direction in which the loss function decreases the most rapidly, but this does not necessarily produce the fastest convergence. The following picture illustrates this issue.

梯度下降訓練算法具有嚴重的缺點,即對於具有長而窄的谷底結構的函數,需要進行多次迭代。 確實,下坡是損失函數下降最快的方向,但這並不一定會產生最快的收斂。 下圖說明了此問題。

-----


-----

References

[1] 5 algorithms to train a neural network

https://www.neuraldesigner.com/blog/5_algorithms_to_train_a_neural_network

[2] 從梯度下降到擬牛頓法:詳解訓練神經網絡的五大學習算法_機器之心 - 微文庫

https://www.luoow.com/dc_hk/108919053

[3] 從梯度下降到擬牛頓法:詳解訓練神經網絡的五大學習算法 - 每日頭條

https://kknews.cc/zh-tw/tech/p8nq8x8.html

-----

Wednesday, October 14, 2020

Shampoo v2

Shampoo v2

2020/09/25

-----


https://pixabay.com/zh/photos/shampoo-shampoo-bottle-hand-3378336/

Fig. Shampoo.

-----

References

[1] Shampoo

Anil, Rohan, et al. "Second Order Optimization Made Practical." arXiv preprint arXiv:2002.09018 (2020).

https://arxiv.org/pdf/2002.09018.pdf


实用二阶优化 | cf020031308.github.io

https://cf020031308.github.io/papers/2020-second-order-optimization-made-practical/

-----
Shampoo v1

 Shampoo v1

2020/09/28

-----


https://pixabay.com/zh/photos/child-girl-face-bath-wash-foam-645451/

Fig. Shampoo.

-----

References

[1] Shampoo v1

Gupta, Vineet, Tomer Koren, and Yoram Singer. "Shampoo: Preconditioned stochastic tensor optimization." arXiv preprint arXiv:1802.09568 (2018).

https://arxiv.org/pdf/1802.09568.pdf






-----

K-FAC-F

K-FAC-F

2020/09/25

-----


https://pixabay.com/zh/photos/boats-fishers-black-white-fisher-4209935/

Fig. Fisher.

-----

References

[1] K-FAC-F

Martens, James, and Roger Grosse. "Optimizing neural networks with kronecker-factored approximate curvature." International conference on machine learning. 2015.

http://proceedings.mlr.press/v37/martens15.pdf


[2] Introducing K-FAC. A Second-Order Optimization Method for… | by Kazuki Osawa | Towards Data Science

https://towardsdatascience.com/introducing-k-fac-and-its-application-for-large-scale-deep-learning-4e3f9b443414


[3] 入门神经网络优化算法(六):二阶优化算法K-FAC_Bin 的专栏-CSDN博客

https://blog.csdn.net/xbinworld/article/details/105184601


[4] 为什么K-FAC这种二阶优化方法没有得到广泛的应用? - 知乎

https://www.zhihu.com/question/305694880

-----

K-FAC-G

 K-FAC-G

2020/09/28

-----


https://pixabay.com/zh/photos/apple-red-delicious-fruit-256263/

Fig. Gauss-Newton。

-----

References

[1] K-FAC-G

Martens, James, and Ilya Sutskever. "Training deep and recurrent networks with hessian-free optimization." Neural networks: Tricks of the trade. Springer, Berlin, Heidelberg, 2012. 479-535.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.296.4704&rep=rep1&type=pdf

-----


Natural Gradient Method - Fisher Information Matrix

Natural Gradient Method - Fisher Information Matrix

2020/09/25

-----


https://pixabay.com/zh/photos/layer-layered-mountain-bliss-170971/

Fig. Natural Gradient Method

-----

References

◎ Papers

[1]  Natural Gradient Method

Martens, James. "New insights and perspectives on the natural gradient method." arXiv preprint arXiv:1412.1193 (2014).

https://arxiv.org/pdf/1412.1193.pdf

[2] Fisher Information

Ly, Alexander, et al. "A tutorial on Fisher information." Journal of Mathematical Psychology 80 (2017): 40-55.

https://arxiv.org/pdf/1705.01064.pdf

[3] Revisiting natural gradient

Pascanu, Razvan, and Yoshua Bengio. "Revisiting natural gradient for deep networks." arXiv preprint arXiv:1301.3584 (2013).

https://arxiv.org/pdf/1301.3584.pdf

-----

◎ English

[4] Natural Gradient Descent - Agustinus Kristiadi's Blog

https://wiseodd.github.io/techblog/2018/03/14/natural-gradient/

[5] Fisher Information Matrix - Agustinus Kristiadi's Blog

https://wiseodd.github.io/techblog/2018/03/11/fisher-information/

-----

◎ Simplified Chinese

[6] 自然梯度下降(Natural Gradient Descent) - graycastle - 博客园

https://www.cnblogs.com/zzy-tf/articles/12392507.html

[7] 入门神经网络优化算法(五):一文看懂二阶优化算法Natural Gradient Descent(Fisher Information)_Bin 的专栏-CSDN博客

https://blog.csdn.net/xbinworld/article/details/104591706

[8] 多角度理解自然梯度 - 知乎

https://zhuanlan.zhihu.com/p/82934100

[9] 如何理解 natural gradient descent? - 知乎

https://www.zhihu.com/question/266846405

-----

Levenberg-Marquardt Algorithm

Levenberg-Marquardt Algorithm

2020/10/14

-----


https://pixabay.com/zh/photos/castle-lebenberg-cermes-south-tyrol-1151289/

-----

References

[优化]Levenberg-Marquardt 最小二乘优化 - 知乎

https://zhuanlan.zhihu.com/p/42415718

-----

L-BFGS

L-BFGS

2020/09/30

-----


https://pixabay.com/zh/photos/censorship-limitations-610101/

-----

References

Papers

Liu, Dong C., and Jorge Nocedal. "On the limited memory BFGS method for large scale optimization." Mathematical programming 45.1-3 (1989): 503-528.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.110.6443&rep=rep1&type=pdf

English

Limited-Memory Broyden-Fletcher-Goldfarb-Shanno Algorithm in ML.NET | by Robert Krzaczyński | Towards Data Science

https://towardsdatascience.com/limited-memory-broyden-fletcher-goldfarb-shanno-algorithm-in-ml-net-118dec066ba

Simplified Chinese

深入机器学习系列17-BFGS & L-BFGS - 知乎

https://zhuanlan.zhihu.com/p/29672873

-----

Quasi Newton

Quasi Newton

2020/10/07

-----


-----

"Quasi-Newton methods include many concrete algorithms. This class of algorithms was first proposed by Davidon (W. C. Davidon) in 1959; Fletcher (R. Fletcher) and Powell (M. J. D. Powell) gave the rank-2 quasi-Newton method later known as DFP in 1963, and Broyden (C. G. Broyden) gave a rank-1 quasi-Newton method in 1965. The convergence of these methods was only gradually proved from the late 1960s through the 1970s. Because this family of methods attracted wide attention, more than a thousand papers were published over the roughly twenty years spanning the 1960s and 1970s, proposing many different algorithms and convergence proofs.

Unlike the DFP and BFGS methods, SR1 is a rank-1 update. Its formula is: ... The SR1 formula does not require the matrix B_k to remain positive definite, so it can approximate the true Hessian matrix more closely, which makes it suitable for trust-region methods (Trust Region Methods).

The Broyden family is a broader class of update formulas, of the form: ... When ..., the Broyden family formula becomes the BFGS formula; when ..., it becomes the DFP formula. Therefore BFGS and DFP can both be regarded as special cases, or members, of the Broyden family." [2]
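
For reference, the formulas elided in the quotation above have the following standard textbook forms (a reconstruction, not taken from the quoted source), with s_k = w_{k+1} - w_k and y_k = \nabla f(w_{k+1}) - \nabla f(w_k):

Secant (quasi-Newton) condition: B_{k+1} s_k = y_k

SR1 (rank-1 update): B_{k+1} = B_k + \frac{(y_k - B_k s_k)(y_k - B_k s_k)^{\top}}{(y_k - B_k s_k)^{\top} s_k}

Broyden family: B_{k+1} = (1 - \phi_k)\, B_{k+1}^{\mathrm{BFGS}} + \phi_k\, B_{k+1}^{\mathrm{DFP}}, where \phi_k = 0 gives BFGS and \phi_k = 1 gives DFP.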

-----

1. Quasi Newton Condition

2. Rank-1 Correction

3. DFP

4. BFGS

5. L-BFGS

[3]。

-----


https://en.wikipedia.org/wiki/Quasi-Newton_method

1. Broyden 1965

2. SR-1

3. DFP 1959 - 1963

4. BFGS

5. L-BFGS

[1]。

-----

DFP

"The DFP algorithm is named after the initials of William C. Davidon, Roger Fletcher, and Michael J. D. Powell. It was first proposed by Davidon in 1959, then developed and refined by Fletcher and Powell, and is the earliest quasi-Newton method."

"The DFP variable-metric method combines the advantages of gradient descent and Newton's method while avoiding their respective drawbacks: it only requires first-order partial derivatives, needs neither second-order partial derivatives nor their inverse matrix, places no strict requirements on the choice of the initial point of the objective function, and converges quickly; these good properties have been described above. For high-dimensional problems (dimension greater than 50) it is regarded as one of the best optimization methods for unconstrained extremum problems. 1. The DFP formula always has an exact solution. 2. The conjugacy of the DFP search directions. 3. The stability of the DFP algorithm."

https://baike.baidu.com/item/DFP%E6%B3%95/22735477

-----

References

Hennig, Philipp, and Martin Kiefel. "Quasi-Newton method: A new direction." Journal of Machine Learning Research 14.Mar (2013): 843-865.

https://www.jmlr.org/papers/volume14/hennig13a/hennig13a.pdf


quasi-newton.pdf

http://www.stat.cmu.edu/~ryantibs/convexopt-F18/lectures/quasi-newton.pdf



[1] Quasi-Newton method - Wikipedia

https://en.wikipedia.org/wiki/Quasi-Newton_method

[2] 拟牛顿法_百度百科

https://baike.baidu.com/item/%E6%8B%9F%E7%89%9B%E9%A1%BF%E6%B3%95

DFP法_百度百科

https://baike.baidu.com/item/DFP%E6%B3%95/22735477

[3] 深入机器学习系列17-BFGS & L-BFGS - 知乎

https://zhuanlan.zhihu.com/p/29672873

-----

# Well-organized material

牛顿法和拟牛顿法 - 知乎

https://zhuanlan.zhihu.com/p/46536960


# A very well-structured overview

梯度下降法、牛顿法和拟牛顿法 - 知乎

https://zhuanlan.zhihu.com/p/37524275



# To the point

牛顿法与拟牛顿法 · Qiyexuxu

http://blog.keeplearning.group/2019/08/16/2019/08-16-newton-method/


# Clear presentation

机器学习 · 总览篇 IX - 蔡康的博客 | Kang's Blog

https://kangcai.github.io/2018/12/17/ml-overall-9-algorithm-QNM/


# Damped Newton's method; well-organized material

最优化六:牛顿法(牛顿法、拟牛顿法、阻尼牛顿法)_LittleEmperor的博客-CSDN博客

https://blog.csdn.net/LittleEmperor/article/details/105112516