The Star Also Rises

Tuesday, May 23, 2017

AI from Scratch (): Generative Adversarial Nets

2017/05/23

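Preface:

Under construction...

Summary:

Since Generative Adversarial Nets (GAN) [1] were introduced in 2014, they have stirred up great excitement in the AI community. The idea of GAN is that a generative net (GN) and a discriminative net (DN) compete against each other; once the DN can no longer tell whether the images produced by the GN are real or fake, the GN has succeeded (it can produce images that pass for real ones). The adversarial idea is new, whereas the generative and discriminative concepts have been around for more than a decade [2].

For a simple introduction to GAN, see [3], [4]; for more in-depth discussion, see [5]-[10]. [11], [12] offer visualized training to explore.

Log likelihood is the foundation for learning GAN [13]-[16]. We can also consult other papers to understand the GN [17]-[22]. Finally, [23]-[26] provide the materials needed to master GAN thoroughly.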

In fact, the materials above are not enough to master GAN thoroughly. Wasserstein GAN [28]-[32] is what makes GAN complete, and Kullback–Leibler divergence [33] and Jensen–Shannon divergence [34] are the groundwork for it.

-----

Outline：

1. Formula
2. Generative Net
3. Deep Generative Models

This post has three main points:

1. The GAN formula
2. The structure of the generative network
3. The materials needed to understand GAN

-----

Fig. 1.1a. Backpropagate derivatives through generative processes, p. 2 [1].

Fig. 1.1b. Random variable and probability distribution, p. 57 [23].

Fig. 1.1c. Expectation, p. 60 [23].

Fig. 1.1d. Normal distribution, also known as the Gaussian distribution, p. 63 [23].

-----

Fig. 1.2a. D and G play the following two-player minimax game with value function V(G, D), p. 3 [1].

Fig. 1.2b. The model can then be trained by maximizing the log likelihood, p. 2 [1].

Fig. 1.2c. Decomposition into the positive phase and negative phase of learning, p. 608 [23].

Fig. 1.3. Generative adversarial nets are trained by simultaneously updating the discriminative distribution, p. 4 [1].

Fig. 1.4. Minibatch stochastic gradient descent training of generative adversarial nets, p. 4 [1].

Fig. 2.1a. DCGAN generator used for LSUN scene modeling, p. 4 [17].

Fig. 2.1b. A 100 dimensional uniform distribution Z, p. 4 [17].

Fig. 2.2. The architecture of the generator in Style-GAN, p. 324 [18].

Fig. 2.3. Text-conditional convolutional GAN architecture, p. 4 [19].

Fig. 2.4. A deconvnet layer (left) attached to a convnet layer (right), p. 822 [20].

Fig. 3.1. Deep generative models, p. vi [23].

Fig. 3.2. Deep learning taxonomy, p. 492 [24].

Fig. 3.3. Chapters 16-19, p. 671 [23].

Fig. 3.4. From section 3.14 to chapter 16, p. 560 [23].

Fig. 4.1. Fully-observed models [6].

Fig. 4.2. Transformation models [6].

Fig. 4.3. Latent variable models [6].

Fig. 5.1. Probabilistic modeling of natural images, p. 563 [23], p. 8 [26].

Fig. 5.2. An illustration of the slow mixing problem in deep probabilistic models, p. 604 [23].

Fig. 5.3. Positive phase and negative phase, p. 611 [23].

Fig. 5.4. The KL divergence is asymmetric, p. 76 [23].
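
The asymmetry in Fig. 5.4 is easy to check numerically. The following is a minimal sketch in plain Python, not code taken from [23]; the two distributions p and q are made-up examples. It computes the KL divergence in both directions, plus the Jensen–Shannon divergence, which is built from two KL terms against the mixture of p and q and is therefore symmetric.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions given as lists."""
    # Terms with p_i = 0 contribute nothing, so they are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q against their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

# Hypothetical example distributions over three outcomes.
p = [0.8, 0.1, 0.1]
q = [0.4, 0.3, 0.3]

print("KL(p||q) =", kl_divergence(p, q))
print("KL(q||p) =", kl_divergence(q, p))   # differs from KL(p||q)
print("JS(p,q)  =", js_divergence(p, q))
print("JS(q,p)  =", js_divergence(q, p))   # same as JS(p,q)
```

Swapping the arguments changes the KL value but not the JS value, and the JS value never exceeds log 2.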

-----

References

1 GAN

[1] 2014_Generative adversarial nets

[2] 2007_Generative or discriminative, getting the best of both worlds

-----

2 GAN Internet

[3] 生成对抗式网络(Generative Adversarial Networks) – LHY's World
http://closure11.com/%E7%94%9F%E6%88%90%E5%AF%B9%E6%8A%97%E5%BC%8F%E7%BD%91%E7%BB%9Cgenerative-adversarial-networks/

[4] 能根據文字生成圖片的GAN，深度學習領域的又一新星 GigCasa 激趣網
http://www.gigcasa.com/articles/465963

[5] 深度学习与生成式模型 - Solomon1558的专栏 - 博客频道 - CSDN.NET
http://blog.csdn.net/solomon1558/article/details/52512459

[6] 生成式对抗网络GAN研究进展（一） - Solomon1558的专栏 - 博客频道 - CSDN.NET
http://blog.csdn.net/solomon1558/article/details/52537114

[7] 生成式对抗网络GAN研究进展（二）——原始GAN - Solomon1558的专栏 - 博客频道 - CSDN.NET
http://blog.csdn.net/solomon1558/article/details/52549409

[8] 生成式对抗网络GAN研究进展（三）——条件GAN - Solomon1558的专栏 - 博客频道 - CSDN.NET
http://blog.csdn.net/solomon1558/article/details/52555083

[9] 生成式对抗网络GAN研究进展（四）——Laplacian Pyramid of Adversarial Networks，LAPGAN - Solomon1558的专栏 - 博客频道 - CSDN.NET
http://blog.csdn.net/solomon1558/article/details/52562851

[10] 生成式对抗网络GAN研究进展（五）——Deep Convolutional Generative Adversarial Nerworks，DCGAN - Solomon1558的专栏 - 博客频道 - CSDN.NET
http://blog.csdn.net/solomon1558/article/details/52573596

[11] An introduction to Generative Adversarial Networks (with code in TensorFlow) – AYLIEN
http://blog.aylien.com/introduction-generative-adversarial-networks-code-tensorflow/

[12] Adverarial Nets
http://cs.stanford.edu/people/karpathy/gan/

-----

3 log likelihood

[13] 2009_Deep Boltzmann machines

[14] Likelihood function - Wikipedia
https://en.wikipedia.org/wiki/Likelihood_function

[15] 1.4 - Likelihood & LogLikelihood _ STAT 504
https://onlinecourses.science.psu.edu/stat504/node/27

[16] Chapter 18 Confronting the Partition Function
http://www.deeplearningbook.org/contents/partition.html

-----

4 Generator

[17] 2016_Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

[18] 2016_Generative image modeling using style and structure adversarial networks

[19] 2016_Generative adversarial text to image synthesis

[20] 2014_Visualizing and understanding convolutional networks

[21] 2011_Adaptive deconvolutional networks for mid and high level feature learning

[22] 2016_A guide to convolution arithmetic for deep learning

-----

5 Goodfellow

[23] 2016_Deep Learning
https://github.com/HFTrader/DeepLearningBook/raw/master/DeepLearningBook.pdf

[24] 2016_Practical Machine Learning

[25] 2009_Learning multiple layers of features from tiny images

[26] 2011_Unsupervised models of images by spike-and-slab RBMs

-----

6 Goodfellow

[27] 2016_NIPS 2016 Tutorial, Generative Adversarial Networks

-----

7 Wasserstein GAN

[28] 令人拍案叫绝的Wasserstein GAN - 知乎专栏
https://zhuanlan.zhihu.com/p/25071913

[29] 生成式对抗网络GAN有哪些最新的发展，可以实际应用到哪些场景中？ - 知乎
https://www.zhihu.com/question/52602529/answer/158727900

[30] 2017_Towards principled methods for training generative adversarial networks

[31] 2017_Wasserstein GAN

[32] 2017_Improved training of Wasserstein GANs

[33] Kullback–Leibler divergence - Wikipedia
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

[34] Jensen–Shannon divergence - Wikipedia
https://en.wikipedia.org/wiki/Jensen%E2%80%93Shannon_divergence

GAN Warm Up 1 - Log Likelihood

2017/03/30

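Preface: a full write-up of GAN would be very long, so let's warm up first to avoid a sports injury!

-----

I see that the fellow members of this group are quite interested in GAN [1], and of course so am I. The paper itself is not easy to read, but after consulting some explanations online, the principle is not hard to grasp: the discriminative network D has to judge whether an image from the generative network G belongs to the database of real images.

To use a very nerdy analogy: Piccolo and Kami were originally one and the same. For a while Piccolo attacks and Kami spars back; for a while Kami attacks and Piccolo spars back. After a long series of training rounds, Kami can no longer tell whether the images Piccolo generates are real or fake. Piccolo wins, Goodfellow achieves his goal of turning Piccolo into a master counterfeiter, and Kami leaves the stage.

The discriminative network D can be an ordinary CNN; the generative network G is conceptually harder. Intuitively, though, it is just D trained in reverse: feed in random numbers and generate an image. In the literature the input is usually a 100-dimensional random number generator. So the key is still the training.

[1] contains two formulas, shown in Fig. 1.1a and Fig. 1.2a. These two formulas are the core of the paper. Heeding Mark's advice, do not skip a formula when you see one; go straight to the core, and only then does understanding bloom. So, with the rough concept above in place, we can look closely at what these two formulas are.

I opened a statistics textbook, but its notation does not match the paper's. Then I remembered that [2] has a chapter on probability, which settles the statistical definitions. For limits and gradients, please consult a calculus textbook. Fig. 1.1a can be broken down and fully understood using Figs. 1.1b, 1.1c, and 1.1d, which cover random variables, the expectation, and the Gaussian (normal) distribution respectively.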
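
Since the figures are not reproduced in this text, here is what I take Figs. 1.1a, 1.1c, and 1.1d to show, written out in LaTeX. This is a reconstruction under that assumption rather than a copy of the figures: the first line is the observation stated on p. 2 of [1], the second is the expectation of f(x) under a continuous distribution p, and the third is the univariate Gaussian density with mean \mu and variance \sigma^2.

```latex
% Fig. 1.1a (assumed): smoothing f with vanishing Gaussian noise recovers its gradient
\lim_{\sigma \to 0} \nabla_{x} \, \mathbb{E}_{\epsilon \sim \mathcal{N}(0, \sigma^{2} I)} f(x + \epsilon) = \nabla_{x} f(x)

% Fig. 1.1c (assumed): expectation of f(x) when x is drawn from p
\mathbb{E}_{x \sim p}[f(x)] = \int p(x) f(x) \, dx

% Fig. 1.1d (assumed): Gaussian (normal) density
\mathcal{N}(x; \mu, \sigma^{2}) = \sqrt{\frac{1}{2 \pi \sigma^{2}}} \exp\left( -\frac{1}{2 \sigma^{2}} (x - \mu)^{2} \right)
```

Read together: the left-hand side of the first line is an expectation (second line) taken over Gaussian noise (third line), so the three definitions are all that the formula needs.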

-----

Fig. 1.1a. Gradient, p. 2, [1].

Fig. 1.1b. Random variable, p. 57, [2].

Fig. 1.1c. Expectation function, p. 60, [2].

Fig. 1.1d. Gaussian distribution, p. 63, [2].

-----

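Next comes the highlight of the paper: the training formula in Fig. 1.2a. Its gist is max D, min G. With the definitions broken down above, the only suspicious character left in this formula is the log. How do we deal with it? Right at the start of that section of the paper, Fig. 1.2b tells us we can refer to [3]. For log likelihood, see Fig. 1.2c; Section 18.1 of [2] spends two pages explaining it.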
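
For reference, this is the minimax objective of [1], which I assume is what Fig. 1.2a shows. Here p_data is the distribution of the real data, p_z is the noise prior fed to G, and D(x) is the probability D assigns to x being real.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)} \left[ \log D(x) \right]
  + \mathbb{E}_{z \sim p_{z}(z)} \left[ \log \left( 1 - D(G(z)) \right) \right]
```

D wants both terms large (real images scored near 1, generated images scored near 0), while G can only influence the second term and wants it small.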

-----

Fig. 1.2a. GAN training formula, p. 2, [1].

Fig. 1.2b. Log likelihood reference 1, p. 2, [1].

Fig. 1.2c. Log likelihood reference 2, p. 606, [2].

-----

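So once log likelihood is sorted out, GAN should be more or less sorted out too!

Is it really that simple? In my view, GAN is Goodfellow's debut, and he wrote [2] so that everyone could understand [1]. GAN is tucked away in [2], in Chapter 20, Deep Generative Models, Section 20.10, Directed Generative Nets, Subsection 20.10.4, Generative Adversarial Networks. Yet the whole book, from the mathematics at the front, through the basic chapters in the middle, to the advanced models at the end, can be said to set the stage for GAN's entrance!

So GAN can be called not hard, and it can also be called very hard.

That concludes this warm-up!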
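
One way to see why the log is there: D is a binary classifier (real vs. generated), and the training formula is the average log likelihood of the correct labels under D. Below is a minimal sketch in plain Python, assuming we already have D's output probabilities for a minibatch; the function name and the numbers are made up for illustration and are not from [1].

```python
import math

def value_function(d_real, d_fake):
    """Minibatch estimate of V(D, G).

    d_real: D(x) for real samples x (probability of "real").
    d_fake: D(G(z)) for generated samples (probability of "real").
    """
    real_term = sum(math.log(d) for d in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - d) for d in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical outputs: a D that still tells real from fake fairly well ...
confident = value_function([0.9, 0.95, 0.8], [0.1, 0.2, 0.05])
# ... and a D that has been completely fooled and answers 0.5 everywhere.
fooled = value_function([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])

print("V with a confident D:", confident)
print("V with a fooled D:   ", fooled)
print("-log 4:              ", -math.log(4))
```

When D outputs 0.5 for everything, V equals -log 4, which is the value [1] gives for the point where the generator's distribution matches the data.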

-----

References

[1] 2014_Generative adversarial nets

[2] 2016_Deep Learning
http://www.deeplearningbook.org/

[3] 2009_Deep Boltzmann machines

-----

Marcel Wang： http://www.cs.toronto.edu/....../Deep_generative_models.pdf

李盈: https://nips.cc/Conferences/2016/Schedule?showEvent=6202 The NIPS 2016 tutorial also gives a detailed introduction, including what generative models are used for, how they work, and some of the research. Paper version: https://arxiv.org/abs/1406.2661

Marcel Wang： http://aliensunmin.github.io/project/accv16tutorial/media/generative.pdf

Friday, May 19, 2017

AI from Scratch (27): ZFNet

2017/05/19

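Preface:

My road to learning DL has been full of good luck:

The first piece of luck was the suggestion to implement LeNet, which is how the LeNet implementation group came about.

The second was that Jason organized a study group to read the Goodfellow and Sutton textbooks together. Time constraints kept me from attending, but after skimming the two books, the architecture of DRL became much clearer to me.

The third was that Winston recommended two papers and even had his student help with guidance and discussion. Now a DRL implementation group has been born too!

The best way to give thanks is to spread the luck...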

-----

Summary：

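[2] explains in detail how to display the feature maps of AlexNet [1]. [3] also shows the feature maps of its last layer. It is worth noting that the architecture of [3] was absorbed into GoogLeNet [4].

This post only briefly describes my experience studying kernel visualizing.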

-----

Fig. 1.1. An illustration of the architecture of AlexNet [1].

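Fig. 1.1 shows the architecture of AlexNet, which can broadly be thought of as a giant LeNet. One reason it is split into two paths is a hardware limitation: the network had to be divided to run on two GPUs.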

Fig. 1.2. 96 convolutional kernels learned by the first convolutional layer. The top 48 kernels were learned on GPU 1 while the bottom 48 kernels were learned on GPU 2 [1].

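Fig. 1.2 shows the kernels learned by conv layer 1. The top half consists of edge filters, while the bottom half relates to color. The two halves together make up one figure.

What I am curious about in this figure: how did lines and color end up separated across the two GPUs? Was it random?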

Fig. 2.1a. Architecture of revised AlexNet model [2].

Fig. 2.1a shows the revised AlexNet architecture. The suitable parameters were found mainly by exhaustive search.

Fig. 2.1b. A deconvnet layer (left) attached to a convnet layer (right) [2].

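Fig. 2.1b shows how feature maps are "restored" by deconvolution into images that can be "seen". Apart from the lowest-level edge filters, which are fairly intuitive, most high-level feature maps show nothing recognizable. I say "most" because some of the highest-level feature maps do reveal the shape of an object; see [3].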
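
In the deconvnet of [2], max pooling is undone approximately by remembering where each maximum came from (the paper calls these "switches") and putting the value back at that position on the way down; rectification and transposed filtering then follow. The sketch below covers only the pooling and unpooling step, in plain Python on a made-up 4x4 feature map. The function names are mine, and it assumes the map size is a multiple of the pooling size.

```python
def max_pool_with_switches(fmap, size=2):
    """Non-overlapping max pooling that also records the position of each maximum."""
    pooled, switches = [], []
    for r in range(0, len(fmap), size):
        pooled_row, switch_row = [], []
        for c in range(0, len(fmap[0]), size):
            window = [(fmap[i][j], (i, j))
                      for i in range(r, r + size)
                      for j in range(c, c + size)]
            # max over (value, position) pairs; ties go to the later position.
            value, position = max(window)
            pooled_row.append(value)
            switch_row.append(position)
        pooled.append(pooled_row)
        switches.append(switch_row)
    return pooled, switches

def unpool(pooled, switches, shape):
    """Place each pooled value back at its recorded position; everything else stays 0."""
    rows, cols = shape
    restored = [[0.0] * cols for _ in range(rows)]
    for pooled_row, switch_row in zip(pooled, switches):
        for value, (i, j) in zip(pooled_row, switch_row):
            restored[i][j] = value
    return restored

# Hypothetical 4x4 feature map.
fmap = [[1.0, 3.0, 2.0, 0.0],
        [4.0, 2.0, 1.0, 5.0],
        [0.0, 1.0, 7.0, 2.0],
        [6.0, 2.0, 3.0, 1.0]]

pooled, switches = max_pool_with_switches(fmap)
restored = unpool(pooled, switches, (4, 4))
for row in restored:
    print(row)
```

The restored map keeps only the strongest activation of each window, at its original location, which is why the reconstructions keep the spatial structure of the input.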

Fig. 2.2a. Visualization of features in a fully trained model [2].

Fig. 2.2a shows the visualized features of Layer 1 and Layer 2. After restoration, Layer 2 already shows structures somewhat more complex than straight lines.

Fig. 2.2b. (a): 1st layer features [1]. (b): 1st layer features [2]. (c): 2nd layer features [1]. (d): 2nd layer features [2].

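Fig. 2.2b compares AlexNet with the improved version. The improved version's images are "prettier". Prettier here means that both color saturation and resolution are higher, which implies that more information is preserved.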

Fig. 2.2c. Visualization of features in a fully trained model [2].

Fig. 2.2c shows the highest-level feature maps after being "restored"; the shapes are very close to those in the original images.

Fig. 2.2d. Three test examples [2].

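In Fig. 2.2d, (a) is the input image. (b) is a heat map [5]: high-intensity values are shown in red and low-intensity values in blue. (c) uses deconvolution to restore the feature maps to something resembling the original image. (d) shows how high the probability of the "correct" class is. (e) shows which class the image most likely belongs to. For the probability maps, see [6].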

Fig. 3.1a. Comparison of linear convolution layer and mlpconv layer [3].

In Fig. 3.1a, the left side is a conventional convolutional layer; on the right, two fully connected layers have been inserted.

Fig. 3.1b. The overall structure of Network In Network [3].

Fig. 3.1b shows the NiN architecture.

Fig. 3.2a. Visualization of the feature maps from the last mlpconv layer [3].

Fig. 3.2a shows the feature maps close to the output layer.

Fig. 3.2b. Visualization of the automobile feature map from the last mlpconv layer [3].

Fig. 3.2c. Visualization of the automobile feature map from the last mlpconv layer [3].

The automobile is the most obvious one!

-----

Conclusion:

To look at the intermediate feature maps, you can use deconvolution [2].

-----

References

[1] 2012_Imagenet classification with deep convolutional neural networks

[2] 2014_Visualizing and understanding convolutional networks

[3] 2013_Network in network

[4] 2015_Going deeper with convolutions

[5] Heat map - Wikipedia
https://en.wikipedia.org/wiki/Heat_map

[6] Handwritten Digit Recognition Using A Naive Bayes Classifier Trained With MNIST _ Me, Myself and AI
https://gettingintoai.wordpress.com/2014/09/01/firstmlexperience/

Wednesday, May 17, 2017

AI from Scratch (24): CNN - Kernel Visualizing

2017/05/17

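Last time I actually asked two questions in a row [5], [6]. The second was: extending this further, LeNet-5 uses six filters; will the kernel values everyone obtains be very close, all resembling standard edge filters [6]?

I still don't know.

However, regarding what filters look like, Professor 徐宏民 of National Taiwan University recommended two papers [2], [3]. [2] builds further on the AlexNet [7] architecture, while GoogLeNet [8] is built on top of [3]. AlexNet and GoogLeNet are currently the two mainstream giant CNNs.

I have already skimmed [2] and [3]; I think it is better to write about them separately rather than add them to this post.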

-----

2017/03/30

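Regarding filter training, I have another question. Based on the previous discussion and the principles of low-level filters, if eight filters / kernels are used, should everyone's training results come out the same? That is, roughly like the eight filters in the "Filter bank (to be learned)" in Fig. 1 of this post [1].

Extending this further, LeNet-5 uses six filters; will the kernel values everyone obtains be very close, all resembling standard edge filters?

-----

Thanks also to Winston Hsu and 李盈 for their guidance [2]-[4]!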

-----

Fig. 1. Edge filters, [1].

-----

5 comments:

(1)

Winston Hsu： The more filters (in the same layer) are the better. In generally, the filters will collaboratively learn the masks which respond for different aspects for the images.

Personally, I will strongly recommend the following paper to understand CNN -- even more readable than AlexNet.

Matthew D. Zeiler, Rob Fergus: Visualizing and Understanding Convolutional Networks. ECCV (1) 2014: 818-833

There are also couple tools which help you visualize filters (also referring the prior paper) of different layers. With that, you can understand how CNN works.

Another work worthy to investigate is NiN, Network In Network. You can try to visualize their last layer, the feature map of the last layer, which will roughly correspond to the object location.

You can start with Caffe and easily find their models (defined in prototext in Caffe Model Zoo). It is very easy to modify and finetune with the dataset.

Marcel Wang： What a wonderful comment!

Winston Hsu： Li Ying, you can try to help them in the community. She is my MS student.

Steve Yeh：Thanks for Winston's valuable comment, will start with these material.

顏志翰: 李盈 (Li Ying), welcome, study group classmates!

-----

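(2)

Marcel Wang: Let me answer my own question first. Based on my current understanding of GAN, if the sample is large enough, the trained results should all come out about the same. Moreover, according to earlier research on the visual cortex, the most basic filters are the ones dedicated to distinguishing lines at different rotation angles.

李盈: GAN? Do you mean Generative Adversarial Networks?

Marcel Wang: Yes!

李盈: Based on my understanding of GAN, since its purpose is really to "generate", there shouldn't be a phenomenon where a large enough sample makes the results about the same. But I think I need to read more before I can be sure whether my view is correct.

Marcel Wang: Generation and decomposition are really two sides of the same coin, so bringing up GAN in a CNN discussion is not off topic. You first need a sufficiently large database of real data before the trained generator can produce realistic images (or music, etc.). Images are made of lines to begin with. To generate an image, filters produce edges; to analyze an image, a CNN likewise first breaks it down into edges.

李盈: It seems there really is a phenomenon where enough data makes the results converge. The NIPS tutorial video mentions that GAN is asymptotically consistent: "if you are able to find the equilibrium point of the game defining a generative adversarial network, you've actually recovered the true distribution that generates the data, modulo sample complexity issues." This page (https://github.com/....../blob/master/magenta/reviews/GAN.md) also says: "....Under certain conditions, this process reaches a fixed point where the generator has learned the true data distribution, and hence the discriminator cannot classify real examples from generated ones. "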

-----

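(3)

Marcel Wang: So with six filters, the edge detectors should be 60 degrees apart; with eight, 45 degrees. Different people's training results may differ by a rotational phase offset. But if horizontal and vertical lines are more common, the offset may get locked, and everyone's results would converge. This is pure speculation, though; to be sure, one really has to run it on a dataset.

Marcel Wang: "All it takes" is to run the first layer of LeNet-5 with 5, 6, 7, and 8 filters in turn and look at what they look like, and then we would know the answer!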

-----

References

[1] [DL] Convolutional Neural Network ? 不要停止思考
http://solring-blog.logdown.com/posts/302641-dl-convolutional-neural-network

[2] 2014_Visualizing and understanding convolutional networks

[3] 2013_Network in network

[4] magenta_GAN.md at master · tensorflow_magenta · GitHub
https://github.com/tensorflow/magenta/blob/master/magenta/reviews/GAN.md

[5] 我對實作 CNN，filter 的選擇很有興趣，有人願意討論一下嗎？
https://www.facebook.com/groups/Taiwan.AI.Group/permalink/1803896623266104/

[6] 繼續推廣下去，LeNet-5 用六個，大家跑出來的 kernel 值會非常接近，都很像標準的 edge filters 嗎？
https://www.facebook.com/groups/Taiwan.AI.Group/permalink/1804518016537298/

[7] 2012_11554_Imagenet classification with deep convolutional neural networks

[8] 2015_3089_Going deeper with convolutions

AI from Scratch (23): CNN - Kernel Training

2017/05/17

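I studied LeNet [2] for several months without being able to implement it. One reason was that I did not know how to set the crucial filter / kernel; I was held back by my earlier image-processing concepts. In any case, after I happened to ask, I learned that the initial values of the filter / kernel are assigned randomly (uniform distribution) and then adjusted by BP.

On 3/30 I got the answer. Then on 4/12 the LeNet implementation group got started [5]. A few days ago some members finished their final assignment, so the roughly month-long LeNet implementation group has wrapped up for now! My original hope was to compare whether the filters / kernels trained by different people turn out about the same. In theory they should, but I did not get to see it, which is a pity!

Looking back at the paper [6], the author, who will be visiting Taiwan very soon, does in fact mention it: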

C1 contains 156 trainable parameters.

(5 x 5 + 1) x 6 = 156.
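
The same arithmetic as a small helper, in case you want to check other layers. This is a sketch with a made-up function name; it assumes every filter sees all input channels and has one bias, which holds for C1 (one input channel) but not for layers with partial connectivity.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Trainable parameters of a conv layer: kernel weights plus one bias per filter."""
    return (kernel_h * kernel_w * in_channels + 1) * out_channels

# C1 of LeNet-5: six 5x5 kernels on a single-channel input.
c1 = conv_params(5, 5, 1, 6)
print("C1 trainable parameters:", c1)
print("of which kernel weights:", 5 * 5 * 1 * 6)
```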

-----

2017/03/30

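I am very interested in the choice of filters when implementing a CNN. Would anyone like to discuss it?

Jason Tsai: The values (settings) inside a filter (kernel, i.e. weight) are learned through training; don't confuse it with the ordinary concept of a filter!

(The size and number can be chosen.)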

-----

Fig. 1. Filter / Kernel Training [1].

-----

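It all started when I read a friend's tutorial on implementing a convolutional neural network with Tensorflow [1]. When I studied LeNet earlier, none of the references I found mentioned what the filters look like [2], so I asked in the AI group on FB [3] and, to my surprise, got an answer right away. An AI group is a kind of AI network too!

I went back and looked at how convolution is computed [4]. Although a filter / kernel needs initial values, the elements of the convolution kernel, 5x5 for example, are simply 25 weights. These weights are trained by BP in the neural network, which is what Jason meant: the filter / kernel / weight is learned through training.

Thanks to all the group members who joined the discussion!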
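
To make the "25 weights" concrete, here is a minimal sketch in plain Python: six 5x5 kernels filled with uniform random values, slid over a random 32x32 "image" to produce six 28x28 feature maps, as in C1 of LeNet-5. The range [-0.5, 0.5], the function names, and the random input are assumptions for illustration; no training (BP) is done here, this is only the starting point that BP would then adjust.

```python
import random

def init_kernel(size, low=-0.5, high=0.5, rng=random):
    """A size x size kernel with weights drawn from a uniform distribution."""
    return [[rng.uniform(low, high) for _ in range(size)] for _ in range(size)]

def correlate_valid(image, kernel):
    """Slide the kernel over the image without padding.

    CNN code usually calls this convolution, although the kernel is not flipped.
    """
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(k) for b in range(k))
             for j in range(out_w)]
            for i in range(out_h)]

rng = random.Random(0)                      # fixed seed so runs are repeatable
kernels = [init_kernel(5, rng=rng) for _ in range(6)]
image = [[rng.random() for _ in range(32)] for _ in range(32)]
feature_maps = [correlate_valid(image, k) for k in kernels]

print("kernels:", len(kernels), "each with", 5 * 5, "weights")
print("feature map size:", len(feature_maps[0]), "x", len(feature_maps[0][0]))
```

Two people running this with different seeds start from different kernels, which is exactly why it is an open question whether they end up with similar ones after training.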

-----

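5 comments:

(1)

Bi-Ruei Chiu: I once took digital image processing, so the first thing that filter brought to mind was a two-dimensional filter. From the parameters in the figure alone, though, I can't quite tell whether it is a high-pass filter, a low-pass filter, or a Sobel filter for edge detection ......

Plotting its two-dimensional frequency response in matlab might give some clues.

In fact, whenever I see a convolution layer, what comes to mind is traditional image processing, e.g. 2D filtering, edge detection, histogram equalization, filter banks... all "quietly" put back into neural networks.

More than twenty years ago P.P.V. and Professor 林源倍 wrote a 2D filter bank review that might offer some inspiration.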

http://web.it.nctu.edu.tw/~ypl/Publication/A/5.pdf

-----

(2)

林暐翔: Do you mean the choice of filter size?

Marcel Wang: The choice of type.

Jason Tsai: The weight (filter/kernel) is learned through training.

-----

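(3)

Eric Yang: I have a question: why is this thing called Tensor Flow? It doesn't seem to have anything to do with the Tensor or the Flow that I know...

Bi-Ruei Chiu: IMHO, Tensor -> underneath there are lots of matrix and vector inner-product and outer-product blah...blah... operations......

Eric Yang: All I see is conv

Eric Yang: That's weaker than differential geometry..

Jason Tsai: "Tensor" here means a data array with three or more axes / dimensions. TensorFlow borrows the directed computational graph to organize numerical computation, which represents the "Flow" of the data's computation.

Eric Yang: I'm starting to get it. Thanks for the pointers.

Eric Yang: Thanks, Bi-Ruei Chiu.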

-----

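(4)

：

Marcel Wang: What interests me most right now is which filters CNNs mainly use, which of them are better or worse, and what the underlying mechanism is.

：

Marcel Wang: Can't it be proven? Maybe it's just that no one has done it yet.

魏澤人: On one aspect of the correctness of DNNs, there is a post you can refer to: https://www.facebook.com/0DHARMA0/posts/725203530984322

Jason Tsai: The values (settings) inside a filter (kernel, i.e. weight) are learned through training; don't confuse it with the ordinary concept of a filter!

Marcel Wang: Jason Tsai, so there is only size, and no type?

：

Jason Tsai： The important thing about kernel itself is its size and the number in each convolution layer.

：

Marcel Wang: 7x7 and 5x5 can both be decomposed into 3x3.

：

顏志翰: hello, for Rot(W_{i,j}) = W_{-i,-j}, my understanding is that the center of the kernel is taken as (0,0) and the other entries are shifts from it, so rotating the upper-left entry (-1,-1) by 180 degrees lands on position (1,1). That should be it!

：

顏志翰: It's like in image processing, where derivations of convolution first assume that m and n of the m x n filter are odd, which makes the summation easier to derive; so in an implementation, even sizes have to be handled separately! I'm not as good as you all, actually plugging in numbers and working through so many cases by hand! It sounds so complicated!

Jason Tsai: A smaller kernel size is not necessarily better; if it is too small, it captures too few features.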

-----

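(5)

李昶輝: For reference, here is the result of visualizing the last CNN feature map of YOLOv2 Object Detection: https://youtu.be/vw1R5JXvXi0

Marcel Wang: Impressive!

Jason Tsai：https://aiexperiments.withgoogle.com/visualizing-high-dimensional-space

汪楚剛: Someone asks this question every six months. Last time it was me, and it was also Jason who answered, I think in the Tensorflow Taiwan group.

汪楚剛: Also, you can refer to Section 9.9, Random or Unsupervised Features, of the DeepLearningBook; you will find some unexpected gains.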

-----

References

[1] 阿布拉機的3D列印與機器人深度學習(2)--使用Tensorflow實作卷積神經網路(Convolutional neural network，CNN)
http://arbu00.blogspot.tw/2017/03/2-tensorflowconvolutional-neural.html

[2] AI從頭學（一二）：LeNet
http://hemingwang.blogspot.tw/2017/03/ailenet.html

[3] 我對實作 CNN，filter 的選擇很有興趣，有人願意討論一下嗎？
https://www.facebook.com/groups/Taiwan.AI.Group/permalink/1803896623266104/

[4] 2002_Tutorial on Convolutions – Torch

[5] LeNet實作團（目錄）
http://hemingwang.blogspot.tw/2017/04/lenet.html

[6] 1998_Gradient-Based Learning Applied to Document Recognition

Tuesday, April 25, 2017

FB Group: AI

Monday, April 24, 2017

AI from Scratch (22): Azure Machine Learning - Clustering

2017/04/24

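Preface:

This post relates to ML (a branch of AI), Azure, and Python, and will be posted simultaneously in the relevant FB communities.

-----

Summary:

Machine Learning (ML) is basically a branch of AI [1]. This post mainly attempts to use Azure's ML service through Python [2], [3], and introduces the k-means algorithm for clustering [4], [5].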

-----

Outline：

1: Introduction to Data Science
2: Introducing Microsoft Azure Machine Learning
3: Data Preparation
5: Integration with Python
6: Introduction to Statistical and Machine Learning Algorithms
- Clustering Algorithms

-----

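Before getting to the main topic, let's look at the structure of the book [2]; see Fig. 1.1.

The first two chapters are introductions. Chapter 3 covers data preparation; it has no figures here, but it is important. Chapter 4 covers integration with R; it is important too, but will be introduced later when it is needed. Chapter 5 covers using Azure's ML service through Python, which is one focus of this post. Chapter 6 contains various ML algorithms, all of them important; this post introduces the k-means algorithm for clustering.

Chapter 12 covers recommendation systems. Since I also need this at work, I will devote the next post to it.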

Fig. 1.1. Azure Machine Learning [2].

-----

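Before getting to the main topic, let's also briefly go over the ML workflow; see Fig. 1.2. The data is first preprocessed and then split into two groups. One group is used to train the model; once training is done, the other group is used to test it. Finally, we evaluate whether the model is fit for use. Fig. 1.3 uses R; please refer to it on your own.

Data preprocessing can also be redone after a run. See Fig. 2.1: once we remove the dayWeek variable, the overall resolution (how clearly the correlations show) improves; see Fig. 2.2.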

Fig. 1.2. A generalized model training workflow for Azure ML models, p. 5 [3].

Fig. 1.3. Workflow for an R model in Azure ML, p. 7 [3].

Fig. 2.1. Plot of correlation matrix, p. 16 [3].

Fig. 2.2. Plot of correlation matrix without dayWeek variable, p. 16 [3].

-----

1: Introduction to Data Science

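First, a definition of data science; see Fig. 3.1. Data science in the broad sense can include mathematics, signal processing, machine learning, scientific computing, statistics, operations research, programming, databases, and linguistics.

Data analysis can be divided into four levels; see Fig. 3.2: first descriptive, second diagnostic, third predictive, and fourth prescriptive.

For descriptive analysis, take a clustering algorithm as an example: it automatically divides the data into several groups.

We skip the diagnostic part for now.

Prediction is the focus. Several algorithms are used for prediction here: linear regression, logistic regression, neural networks, and support vector machines.

Last is prescriptive analysis. The example given here is Monte Carlo, a method that AlphaGo also uses [6].

Fig. 3.3 gives an example of a business application. A Churn Model, a Segmentation Model, and a Propensity Model are used to help a company identify problem customers and high-value customers. These are covered in Chapters 9, 10, and 7 of the book [2] respectively, and I will introduce them one by one when I get the chance.

Fig. 3.4 is the data science cycle, which is essentially Fig. 1.2 repeated over and over.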

Fig. 3.1. Highlighting the main academic disciplines that constitute data science, p. 4 [2].

Fig. 3.2. Spectrum of all data analysis, p. 5 [2].

Fig. 3.3. A smart telco using prescriptive analytics, p. 6 [2].

Fig. 3.4. Overview of the data science process, p. 13 [2].

-----

2: Introducing Microsoft Azure Machine Learning

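Before getting into Azure, let's look at ML, using clustering as an example. After clustering, combined with features (here age and gaming intensity), we can identify potential consumers or customers; see Fig. 4.1.

For ML, Azure offers a graphical service that implements the workflow of Fig. 1.2 on screen; see Fig. 4.2. A trained model can be saved (see Fig. 4.3) and can also be published (see Fig. 4.4).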

Fig. 4.1. Simple hypothetical customer segments from a clustering algorithm, p. 15 [2].

Fig. 4.2. Regression Model experiment, p. 37 [2].

Fig. 4.3. The experiment that uses the saved training model, p. 39 [2].

Fig. 4.4. A dialog box that promotes the machine learning model from the staging server to a live production web service, p. 40 [2].

-----

3: Data Preparation

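Data preparation is important, and you will certainly run into it. Here are a few simple examples:

1. Data is missing.
2. Data values are empty.
3. There is duplicate data.
4. Outliers (values that differ too much from most of the data).
5. Normalization. The book lists several methods: Zscore, MinMax, Logistic, LogNormal, and Tanh. The idea may not be easy to grasp at first, but a teacher curving exam scores is a handy everyday use of it!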
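
To make item 5 concrete with the exam-score example, here is a minimal sketch in plain Python of four of the listed methods in their common textbook forms. These are not necessarily the exact formulas Azure's normalization module applies (for instance, the z-score below uses the population standard deviation), LogNormal is left out, and the scores are made up.

```python
import math

def zscore(values):
    """Shift to mean 0 and scale to standard deviation 1 (population std)."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def minmax(values):
    """Rescale linearly so the smallest value maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def logistic(values):
    """Squash each value into (0, 1) with the logistic function."""
    return [1.0 / (1.0 + math.exp(-v)) for v in values]

def tanh_norm(values):
    """Squash each value into (-1, 1) with the hyperbolic tangent."""
    return [math.tanh(v) for v in values]

# Hypothetical raw exam scores.
scores = [35, 50, 62, 70, 88]

z = zscore(scores)
print("zscore: ", [round(v, 2) for v in z])
print("minmax: ", [round(v, 2) for v in minmax(scores)])
# The squashing functions are usually applied after centering and scaling.
print("logistic:", [round(v, 2) for v in logistic(z)])
print("tanh:    ", [round(v, 2) for v in tanh_norm(z)])
```

All four keep the ranking of the students; they differ in how they stretch or compress the gaps between scores.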

-----

5: Integration with Python

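To meet custom requirements, it is still best to write the code yourself.

Fig. 5.1a uses Python to access a data file on Azure and display its values on screen, and Fig. 5.1b visualizes the data with numpy and matplotlib.

Fig. 5.2 is a complete Python program that accesses two data files and performs operations on them; see Fig. 5.3.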

Fig. 5.1a. Viewing the content of the dataframe, p. 114 [2].

Fig. 5.1b. Using the pandas dataframe plot method, p. 115 [2].

Fig. 5.2a. Complete Python Code for the Experiment, p. 125 [2].

Fig. 5.2b. Complete Python Code for the Experiment, p. 126 [2].

Fig. 5.3. Using the Execute Python Script, p. 128 [2].

-----

6: Introduction to Statistical and Machine Learning Algorithms
- Clustering Algorithms

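This section explains the k-means algorithm for clustering.

Take Fig. 6.1a as an example: there are three clusters, but how do we write a program to separate them?

1. To split the data into three clusters, first specify three cluster centers; they can be chosen at random.

2. Second, compute the distance from each point to the three centers; each point belongs to the cluster of its nearest center.

3. With the clusters assigned, recompute the center of each of the three clusters.

4. Repeat steps 2 and 3 until the centers stop moving or the error is very small.

See the algorithms in Figs. 6.1b, 6.3a, and 6.3b.

Fig. 6.3c is the Python implementation; in principle you only need to set the relevant parameters.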
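
The four steps above can be sketched in plain Python as follows. This is a minimal illustration under my own assumptions, not the code from [2] or [5]: the function names, the optional initial_centers argument, and the sample points are made up, and an empty cluster simply keeps its old center. The error reported at the end is the within-cluster sum of squared errors (SSE).

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, initial_centers=None, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k centers, at random unless the caller supplies them.
    centers = list(initial_centers) if initial_centers else rng.sample(points, k)
    clusters = []
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: squared_distance(p, centers[c]))
            clusters[nearest].append(p)
        # Step 3: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else centers[c]
            for c, members in enumerate(clusters)
        ]
        # Step 4: stop once the centers no longer move, otherwise repeat.
        if new_centers == centers:
            break
        centers = new_centers
    # Within-cluster SSE: squared distance of each point to its nearest center.
    sse = sum(min(squared_distance(p, c) for c in centers) for p in points)
    return centers, clusters, sse

# Hypothetical 2-D data with three visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
          (1.0, 9.0), (1.5, 8.5), (2.0, 9.5)]

centers, clusters, sse = k_means(points, k=3)
print("centers:", centers)
print("cluster sizes:", [len(c) for c in clusters])
print("SSE:", sse)
```

Because step 1 is random, different seeds can end in different groupings, some with a larger SSE than others; running the algorithm several times and keeping the result with the smallest SSE is a common remedy.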

Fig. 6.1a. Dataset for k-means clustering, p. 147 [2].

Fig. 6.1b. Iterations of the k-means clustering algorithm with k=3 in which the cluster centroids are moving to minimize error, p. 147 [2].

Fig. 6.2. The k-means clustering algorithm, p. 335 [4].

Fig. 6.3a. The k-means clustering algorithm, p. 313 [5].

Fig. 6.3b. within-cluster sum of squared errors (SSE), p. 314 [5].

Fig. 6.3c. KMeans class from scikit-learn's cluster module, p. 314 [5].

-----

Conclusion:

So running ML on Azure isn't that hard, is it! :)

-----

References

[1] AI從頭學（二一）：A Glance at Deep Reinforcement Learning
http://hemingwang.blogspot.tw/2017/04/aia-glance-at-deep-reinforcement.html

[2] 2015_Predictive analytics with Microsoft azure machine learning, build and deploy actionable solutions in minutes

[3] 2015_Data Science in the Cloud with Microsoft Azure Machine Learning and R

[4] 2016_Practical Machine Learning

[5] 2015_Python Machine Learning

[6] AI從頭學（一一）：A Glance at Deep Learning
http://hemingwang.blogspot.tw/2017/02/aia-glance-at-deep-learning.html

Tuesday, January 03, 2017

AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306)

AWS re:Invent 2016: Using MXNet for Recommendation Modeling at Scale (MAC306) For many companies, recommendation systems solve important machine learning problems.…
