2015-09-27

Pythonで実装 PRML 第4章パーセプトロンアルゴリズムによる分類

PRML Python 機械学習

前回に続いてパターン認識と機械学習、第4章「線形識別モデル」の図4.7の再現です。

この図はパーセプトロンアルゴリズムでの分類が収束する様子を表したもので、右下の図で収束完了となってます。本当は特徴空間 $(\phi_1,\phi_2)$ 上での話なのですが、作図の都合上、 $\phi({\bf x}) = {\bf x}$ となる恒等関数ということにしてしまってます。

実装の大まかな流れ

①求めたい識別境界は(4.52)

$y({\bf x}) = f ({\bf x}^{\bf T} \phi({\bf x}))(4.52)$

②(4.54)の誤差を最小にするパラメータ $w$ を確率的最急降下法で(4.55)を更新しながら求める。

$E_p({\bf w}) = -\sum_{n=1}^N {\bf w}^{\bf T} \phi({\bf x}_n){\bf t}_n(4.54)$

${\bf w}^{\rm \tau+1} = {\bf w}^{\rm \tau} - \mu \nabla E_p({\bf{w}}) = {\bf w}^{\rm \tau}- \mu \phi({\bf x}_n) {\bf t}_n (4.55)$

コード

import matplotlib.pyplot as plt
from pylab import *
import numpy as np
import random
%matplotlib inline

def func(train_data, k):
    def perceptron(train_data, k):
        x = 0
        w = array([1.0,1.0, 1.0])
        eta = 0.1
        converged = False
    
        while not converged:
            x += 1
            if x == k:
                converged = True
                
            for i in range(N):
                w_t_phi_x = np.dot(w, [1,train_data[i,0], train_data[i,1]])
                
                if w_t_phi_x > 0:
                    d = 1
                else:
                    d = 2
             
                if d == train_data[i,2]:
                    continue
                else:
                    if d == 1:
                        t = 1
                        w += -eta * array([1, train_data[i,0], train_data[i,1]]) * t
                    else:
                        t = -1
                        w += -eta * array([1, train_data[i,0], train_data[i,1]]) * t
        return w
    
    w = perceptron(train_data, k)

    #Plot
    plt.scatter(x_red,y_red,color='r',marker='x')
    plt.scatter(x_blue,y_blue,color='b',marker='x')
    x_line = linspace(-1.0, 1.0, 1000)
    y_line = -w[1]/w[2] * x_line - w[0]/w[2]
    plt.plot(x_line, y_line, 'k')
    xlim(-1.0, 1.0)
    ylim(-1.0, 1.0)
    title("Figure 7.2")
    show()

if __name__ == "__main__":
    #Generate sumple data
    #Mean
    mu_blue = [-0.5,0.3]
    mu_red = [0.5,-0.3]
    #Covariance
    cov = [[0.08,0.06], [0.06,0.08]]

    #Data Blue
    x_blue = [-0.85, -0.8, -0.5, -0.4, -0.1]
    y_blue = [0.6,0.95, 0.62, 0.7, -0.1]
    blue_label = np.ones(5)

    #Data Red
    x_red = [0.2, 0.4, 0.45, 0.6, 0.95]
    y_red = [0.7, -0.5, 0.4, -0.9, 0.97]
    red_label = np.ones(5)*2

    Data_blue = vstack((x_blue, y_blue, blue_label)).T
    Data_red = vstack((x_red, y_red, red_label)).T
    Data_marge = vstack((Data_blue,Data_red))
    
    N = len(Data_marge)
    for k in (N-9, N-6, N-3, N):
        func(Data_marge, k)

結果

f:id:tkzs:20150925120512p:plain

f:id:tkzs:20150925120523p:plain

f:id:tkzs:20150925120536p:plain

f:id:tkzs:20150925120548p:plain

2015-09-27

Pythonで実装PRML 第4章最小二乗法とロジスティック回帰による識別

PRML Python 機械学習

パターン認識と機械学習、第4章「線形識別モデル」に入って図4.4、図4.5の再現を行います。

まずは4.1.3の最小二乗法による識別ですが、ノイズが混ざると上手く分類できない点、多分類になると上手く機能しない点が図の4.4、4.5で説明されてます。
また、4.3.2で説明されるロジスティック回帰であればその両方の面で機能することが同様に図の4.4、4.5で表されてます。

境界面の求め方について詳しい解説は、PRMLやはじパタにお任せするとして、実装にあたっての流れだけ簡単に示します。

まずは図4.4から

実装の大まかな流れ

①求めたい識別境界はクラスkごとにそれぞれ(4.13)

$y_k({\bf x}) = {\bf x}_k^{\bf T} {\bf x}+ w_{k0}(4.13)$

②(4.15)の誤差を最小にするパラメータ $w$ を(4.16)を解くことによって求める。
$E_D({\bf \tilde{W}}) = \frac{1}{2}{\bf T}_r\{({\bf \tilde{X}}{\bf \tilde{W}}-{\bf T})^T ({\bf \tilde{X}}{\bf \tilde{W}}-{\bf T})\}(4.15)$

${\bf \tilde{W}} = \{({\bf \tilde{X}}^T {\bf \tilde{X}})^{-1} {\bf \tilde{X}}^T {\bf T}) = {\bf \tilde{X}}^\ddagger {\bf T}(4.16)$

コード

import matplotlib.pyplot as plt
from pylab import *
import numpy as np
import random
%matplotlib inline

#Function to seek hyperplane
def func(x, y,x_line):
    #(4.16）Apply least square
    W = np.linalg.inv(x.T.dot(x)).dot(x.T).dot(y)
    
    #Add bias term
    w_0 = [1.0,1.0]
    W_add = np.c_[w_0, W]
    
    #(4.18),(4.19)
    a = - ((W_add[0,1]-W_add[1,1])/(W_add[0,2]-W_add[1,2]))
    b = - (W_add[0,0]-W_add[1,0])/(W_add[0,2]-W_add[1,2])
    #(4.13)
    return a * x_line + b
if __name__ == "__main__":
    #Generate sumple data
    #Mean
    mu_red = [-1,1]
    mu_blue = [1,-1]
    #Covariance
    cov = [[1.2,1], [1,1.2]]

    x_red,y_red = np.random.multivariate_normal(mu_red,cov,50).T
    x_blue,y_blue = np.random.multivariate_normal(mu_blue,cov,50).T

    x = vstack((x_red, x_blue)).T
    y = vstack((y_red, y_blue)).T

    #Add outlier
    mu_b_out= [7,-6]
    cov2 = [[1,0], [0,1]]
    x_blue2,y_blue2 = np.random.multivariate_normal(mu_blue,cov,45).T
    x_b_out,y_b_out = np.random.multivariate_normal(mu_b_out,cov2,5).T
    x_blue3, y_blue3 = np.r_[x_blue2,x_b_out], np.r_[y_blue2,y_b_out]

    x2 = vstack((x_red, x_blue3)).T
    y2 = vstack((y_red, y_blue3)).T


    #Plot right
    plt.subplot(1, 2, 1)
    x_line = np.linspace(-4, 8, 1000)
    plot(x_line, func(x,y,x_line),'m-')
    plt.scatter(x_red,y_red,color='r',marker='x')
    plt.scatter(x_blue,y_blue,color='b',marker='x')
    xlim(-4.0, 9.0)
    ylim(-9.0, 4.0)
    title("Figure4.4 left")


    #Plot left
    plt.subplot(1, 2, 2)
    x_line = np.linspace(-4, 8, 1000)
    plot(x_line, func(x2,y2,x_line), 'm-')
    plt.scatter(x_red,y_red,color='r',marker='x')
    plt.scatter(x_blue3,y_blue3,color='b',marker='x')
    xlim(-4.0, 9.0)
    ylim(-9.0, 4.0)
    title("Figure4.4 right")

結果

境界面から遠い部分に加えられたデータの影響で識別しきれてないのがわかりますね。
f:id:tkzs:20150925053347p:plain

次に図4.5

多クラス問題でも最小二乗法による識別が機能しないことが表され、その対応策としてロジスティック回帰による分類が挙げられてます。

実装の大まかな流れ

①クラスk毎の求めたい識別境界はそれぞれ同様に(4.13)。

②最小二乗法による多クラス分類においてパラメータ $w$ の求め方も上記の流れとほぼ同様です。

③ロジスティック回帰による分類では、尤度関数(4.107)を最大化するために、誤差関数(4.108)を最大化する{\bf {W}}を求めたいです。しかし、式の中にソフトマックス関数が入っているため解析的に求められないので、IRLS(反復重み付き最小二乗法）で求めることになります。

$p({\bf T}|{\bf w}_1,...{\bf w}_k) = \prod_{n=1}^N \prod_{k=1}^K y_{nk}^{t_{nk}}(4.107)$

$E({\bf w}_1,...{\bf w}_k) = -ln p({\bf T}|{\bf w}_1,...{\bf w}_k) = -\sum_{n=1}^N \sum_{k=1}^K t_{nk} ln y_{nk}(4.108)$

④IRLSを実行するにあたり、各クラス間ごとのヘッセ行列を(4.110)で求めます。

$\nabla _{{\bf w}_k} \nabla _{{\bf w}_j} E({\bf w}_1,...{\bf w}_k) = \sum_{n=1}^N y_{nk} ({\bf I}_{kj} - y_{nk} ) \phi_n \phi_n^{\bf T}(4.16)$

⑤求めたヘッセ行列と(4.99)と(4.100)を使って ${\bf w}$ を更新します。
${\bf w}^{\rm (new)} = {\bf w}^{\rm (old)} -({\bf \Phi}^{\bf T}{\bf R}{\bf \Phi})^{-1} {\bf \Phi}^{\bf T} ({\bf y} - {\bf t}) = ({\bf \Phi}^{\bf T}{\bf R}{\bf \Phi})^{-1}{\bf \Phi}^{\bf T}{\bf R}{\bf z} (4.99)$

${\bf z} = {\bf \Phi}{\bf w}^{\rm (old)} - {\bf R}^{-1}({\bf y} - {\bf t}) (4.100)$

コード最小二乗法による分類

#Function to seek hyperplane between class1 and 2
def func12(x, y,x_line):
    #Apply least square
    W = np.linalg.inv(x.T.dot(x)).dot(x.T).dot(y)
    
    #Add bias term
    w_0 = [1.0,1.0,1.0]
    W_add = np.c_[w_0, W]
    
    #(4.18),(4.19)
    a = - ((W_add[0,1]-W_add[1,1])/(W_add[0,2]-W_add[1,2]))
    b = - (W_add[0,0]-W_add[1,0])/(W_add[0,2]-W_add[1,2])
    return a * x_line + b

#Function to seek hyperplane between class 2 and 3
def func23(x, y,x_line):
    #Apply least square
    W = np.linalg.inv(x.T.dot(x)).dot(x.T).dot(y)
    
    #Add bias term
    w_0 = [1.0,1.0,1.0]
    W_add = np.c_[w_0, W]
    
    #(4.18),(4.19)
    a = - ((W_add[1,1]-W_add[2,1])/(W_add[1,2]-W_add[2,2]))
    b = - (W_add[1,0]-W_add[2,0])/(W_add[1,2]-W_add[2,2])
    return a * x_line + b

if __name__ == "__main__":
    #Generate sumple data
    N = 45
    #Mean
    mu_red_multi = [-2,2]
    mu_green_multi = [0,0]
    mu_blue_multi = [2,-2]
    #Covariance
    cov = [[1.2,1], [1,1.2]]

    x_red_multi,y_red_multi = np.random.multivariate_normal(mu_red_multi,cov,N).T
    x_green_multi,y_green_multi = np.random.multivariate_normal(mu_green_multi,cov,N).T
    x_blue_multi,y_blue_multi = np.random.multivariate_normal(mu_blue_multi,cov,N).T

    x_multi = vstack((x_red_multi, x_green_multi, x_blue_multi)).T
    y_multi = vstack((y_red_multi, x_green_multi, y_blue_multi)).T
    
    #Label
    T = []
    for i in range(N):
        T.append(array([1, 0, 0]))
    for i in range(N):
        T.append(array([0, 1, 0]))
    for i in range(N):
        T.append(array([0, 0, 1]))
    T = array(T)

    #Plot
    x_line2 = np.linspace(-6, 6, 1000)
    plot(x_line2, func12(x_multi,y_multi,x_line2), 'm-')
    plot(x_line2, func23(x_multi,y_multi,x_line2),'m-')
    plt.scatter(x_red_multi,y_red_multi,color='r',marker='x')
    plt.scatter(x_green_multi,y_green_multi,color='g',marker="x")
    plt.scatter(x_blue_multi,y_blue_multi,color='b',marker="o")
    xlim(-6.0, 6.0)
    ylim(-6.0, 6.0)
    title("Figure 4.5 left")

結果

f:id:tkzs:20150925103002p:plain

コードロジスティック回帰による分類

#Function to seek hyperplane between class1 and 2
def func12_L(x, y, x_line):
    W = IRLS(x)
    
    #Add bias term
    w_0 = [1.0,1.0,1.0]
    W_add = np.c_[w_0, W]
    
    #(4.18),(4.19)
    a = - ((W_add[0,1]-W_add[1,1])/(W_add[0,2]-W_add[1,2]))
    b = - (W_add[0,0]-W_add[1,0])/(W_add[0,2]-W_add[1,2])
    return a * x_line + b

#Function to seek hyperplane between class 2 and 3
def func23_L(x, y, x_line):
    W = IRLS(x)
    
    #Add bias term
    w_0 = [1.0,1.0,1.0]
    W_add = np.c_[w_0, W]
    
    #(4.18),(4.19)
    a = - ((W_add[1,1]-W_add[2,1])/(W_add[1,2]-W_add[2,2]))
    b = - (W_add[1,0]-W_add[2,0])/(W_add[1,2]-W_add[2,2])
    return a * x_line + b


def IRLS(x):
    #Initial parameters of Ws
    M = 3    
    K = 3    
    W = np.zeros((M, K))
    while True:
        PHI = x
        #(4.104)
        Y = np.zeros((N, K))
        for n in range(N):
            sum = 0.0
            for k in range(K):
                sum += np.exp(np.dot(W[:,k], PHI[n,:]))
            for k in range(K):
                Y[n, k] = np.exp(np.dot(W[:,k], PHI[n,:])) / sum
    
        #(4.110) Hessian Matrix
        I = np.identity(K)
        H = np.zeros((K*K, M, M))
        for j in range(K):
            for k in range(K):
                for n in range(N):
                    sum = Y[n, k] * (I[k, j] - Y[n,j])
                    H[k+j*K] += sum * matrix(PHI)[n].reshape(M,1) * matrix(PHI)[n].reshape(1,M)
        
        #(4.92)`
        W_new = np.zeros((M, K))
        for i in range(K):
            temp = np.dot(PHI.T, Y[:,i] - T[:,i])
            W_new[:,i] = W[:,i] - np.dot(np.linalg.inv(H[i+i*K]), temp)
    
　　　　 #The condition to stop update "w"
        diff = np.linalg.norm(W_new - W) / np.linalg.norm(W)
        if diff < 0.1: break
        
    W = W_new
    return W
if __name__ == "__main__":  
    #Generate sumple data
    N = 45
    #Mean
    mu_red_multi = [-2,2]
    mu_green_multi = [0,0]
    mu_blue_multi = [2,-2]
    #Covariance
    cov = [[1.2,1], [1,1.2]]

    x_red_multi,y_red_multi = np.random.multivariate_normal(mu_red_multi,cov,N).T
    x_green_multi,y_green_multi = np.random.multivariate_normal(mu_green_multi,cov,N).T
    x_blue_multi,y_blue_multi = np.random.multivariate_normal(mu_blue_multi,cov,N).T

    x_multi = hstack((x_red_multi, x_green_multi, x_blue_multi)).T
    y_multi = hstack((y_red_multi, x_green_multi, y_blue_multi)).T
    
    print x_multi
    
    # Label
    T = []
    for i in range(N):
        T.append(array([1, 0, 0]))
    for i in range(N):
        T.append(array([0, 1, 0]))
    for i in range(N):
        T.append(array([0, 0, 1]))
    T = array(T)

    #Plot
    x_line2 = np.linspace(-6, 6, 1000)
    plot(x_line2, func12_L(x_multi,y_multi,x_line2), 'm-')
    plot(x_line2, func23_L(x_multi,y_multi,x_line2),'m-')
    plt.scatter(x_red_multi,y_red_multi,color='r',marker='x')
    plt.scatter(x_green_multi,y_green_multi,color='g',marker="x")
    plt.scatter(x_blue_multi,y_blue_multi,color='b',marker="o")
    xlim(-6.0, 6.0)
    ylim(-6.0, 6.0)
    title("Figure 4.5 right")

結果

きちんと分類できました。
f:id:tkzs:20150925103303p:plain

2015-09-26

Pythonで実装 PRML 第3章ベイズ線形回帰

PRML Python 機械学習

パターン認識と機械学習、第3章からベイズ線形回帰が予測分布を収束させていく様子を再現します。
やっていることは、1章でやったベイズ推定とほぼ一緒で、基底関数がガウス関数(3.4)になっているだけです。
ですので、コードもほぼほぼ流用してます。

1章と比較すると、3章では数式の表現がベクトルを扱うように書きなおされてますが、この辺何か意図あるのでしょうか。
(3.16)の特徴ベクトルの概念をここで紹介したからこの書き方なのかと思ってますが、どうなんでしょう。

実装の大まかな流れ

①求めたいのは予測分布(3.57)。
$p(t|{\bf x}, {\bf t}, \alpha, \beta) = N (t| {\bf m}^T_N \phi({\bf x}), \sigma^2_N ({\bf x})) (3.58)$

②基底関数は(3.4)の $\phi_i(x) = \exp\{- \frac{(x -\mu_j)^2}{2s^2}\}$ 。
基底関数は9個と指定されているがで、配置の仕方に指示がないので均等に分布するように配置。

③予測分布の平均、分散を求める式は下記の3つなので、これをそれぞれ実装して予測分布を求める。
${\bf m}_N = \beta{\bf S}_N\Phi(x_n)^{\bf T}{\bf t}(3.53)$

$\sigma^2_N(x) = \beta^{-1} + \phi({\bf x})^{\bf T} {\bf S}_N \phi({\bf x}). (3.59)$

${\bf S}^{-1}_N = \alpha {\bf I} + \beta\Phi^{\bf T}\Phi(3.54)$

コード

図を描くにあたって、サンプルデータが増えるに従って収束する様子を@naoya_tさんの記事にあるもの以上に美しく描くアイデアを出す自信が無かったので、ほぼ写経な感じで参考にさせて頂きました。ありがとうございました。

qiita.com

import numpy as np
from numpy.linalg import inv
import pandas as pd
from pylab import *
import matplotlib.pyplot as plt
%matplotlib inline

def func(x_train, y_train):
    # (3.4) Gaussian basis function
    def phi(s, mu_i, M, x):
        return np.array([exp(-(x - mu)**2 / (2 * s**2)) for mu in mu_i]).reshape((M, 1))

    #(3.53)' ((1.70)) Mean of predictive distribution
    def m(x, x_train, y_train, S):
        sum = np.array(zeros((M, 1)))
        for n in xrange(len(x_train)):
            sum += np.dot(phi(s, mu_i, M, x_train[n]), y_train[n])
        return Beta * phi(s, mu_i, M, x).T.dot(S).dot(sum)
    
    #(3.59)'((1.71)) Variance of predictive distribution   
    def s2(x, S):
        return 1.0/Beta + phi(s, mu_i, M, x).T.dot(S).dot(phi(s, mu_i, M, x))

    #(3.53)' ((1.72))
    def S(x_train, y_train):
        I = np.identity(M)
        Sigma = np.zeros((M, M))
        for n in range(len(x_train)):
            Sigma += np.dot(phi(s, mu_i, M, x_train[n]), phi(s, mu_i, M, x_train[n]).T)
        S_inv = alpha*I + Beta*Sigma
        S = inv(S_inv)
        return S
    
    #params for prior probability
    alpha = 0.1
    Beta = 9
    s = 0.1

    #use 9 gaussian basis functions
    M = 9

    # Assign basis functions
    mu_i = np.linspace(0, 1, M)
    
    S = S(x_train, y_train)

    #Sine curve
    x_real = np.arange(0, 1, 0.01)
    y_real = np.sin(2*np.pi*x_real)
    
    #Seek predictive distribution corespponding to entire x
    mean = [m(x, x_train, y_train, S)[0,0] for x in x_real]
    variance = [s2(x, S)[0,0] for x in x_real]
    SD = np.sqrt(variance)
    upper = mean + SD
    lower = mean - SD
    
    plot(x_train, y_train, 'bo')
    plot(x_real, y_real, 'g-')
    plot(x_real, mean, 'r-')
    fill_between(x_real, upper, lower, color='pink')
    xlim(0.0, 1.0)
    ylim(-2, 2)
    title("Figure 3.8")
    show()

if __name__ == "__main__":
    # Maximum observed data points (underlining function plus some noise)
    x_train = np.linspace(0, 1, 26)

    #Set "small level of random noise having a Gaussian distribution"
    loc = 0
    scale = 0.3
    y_train =  np.sin(2*np.pi*x_train) + np.random.normal(loc,scale,26)


    #Sample data pick up
    def randidx(n, k):
        r = range(n)
        shuffle(r)
        return sort(r[0:k])

    for k in (1, 2, 5, 26):
        indices = randidx(size(x_train), k)
        func(x_train[indices], y_train[indices])

結果

f:id:tkzs:20150921082338p:plain

f:id:tkzs:20150921082350p:plain

f:id:tkzs:20150921082402p:plain

f:id:tkzs:20150921082405p:plain

参考

Python - ベイズ線形回帰（PRML§3.3）の図版再現 - Qiita

2015-09-26

Pythonで実装 PRML 第2章確率分布ノンパラメトリック手法

PRML Python 機械学習

パターン認識と機械学習、第2章からは「2.5.2 ノンパラメトリック手法」を実装します。

取り出してきたデータの数を数え上げて、確率分布を把握する手法が紹介されてます。
この章はあんまり時間かけてもしょうが無いなと感じたので、さくっと実装。
カーネル密度法もスクラッチせず、scipyのパッケージ使ってズルしました。

ヒストグラム・コード

import numpy as np
import matplotlib.pyplot as plt
from pylab import *
from scipy import stats
from scipy.stats.kde import gaussian_kde
import random

def mix_G(x):
    return (0.4 * G1 + 0.6 * G2)

def mix_G_distribution(n):
  ratio = 0.3
  if random.random() <ratio:
    return random.gauss(M1, S1)
  else:
    return random.gauss(M2, S2)

if __name__ == "__main__":
    x = np.linspace(0, 1, 100)

    # Set normal distribution1 
    M1 = 0.3
    S1 = 0.15
    G1 = stats.norm.pdf(x, M1, S1)

    # Set normal distribution1 
    M2 = 0.75
    S2 = 0.1
    G2 = stats.norm.pdf(x, M2, S2)

    N = 50
    Data = [mix_G_distribution(n) for n in range(N)]


    plt.subplot(3, 1, 1)
    plt.hist(Data, bins=1/0.04, normed=True)
    plt.plot(x, mix_G(x), "g-")
    plt.xlim(0, 1)
    plt.ylim(0, 3)
    title("Figure 2.24")

    plt.subplot(3, 1, 2)
    plt.hist(Data, bins=1/0.08, normed=True)
    plt.plot(x, mix_G(x), "g-")
    plt.xlim(0, 1)
    plt.ylim(0, 3)

    plt.subplot(3, 1, 3)
    plt.hist(Data, bins=1/0.25, normed=True)
    plt.plot(x, mix_G(x), "g-")
    plt.xlim(0, 1)
    plt.ylim(0, 3)

結果

f:id:tkzs:20150918181214p:plain

カーネル密度法・コード

if __name__ == "__main__":
    #Karnel density estimation
    from scipy.stats.kde import gaussian_kde

    plt.subplot(3, 1, 1)
    plt.plot(x, gaussian_kde(Data, 0.005)(x))
    plt.plot(x, mix_G(x), "g-")
    plt.xlim(0, 1)
    plt.ylim(0, 3)
    title("Figure 2.25")

    plt.subplot(3, 1, 2)
    plt.plot(x, gaussian_kde(Data, 0.07)(x))
    plt.plot(x, mix_G(x), "g-")
    plt.xlim(0, 1)
    plt.ylim(0, 3)

    plt.subplot(3, 1, 3)
    plt.plot(x, gaussian_kde(Data, 0.2)(x))
    plt.plot(x, mix_G(x), "g-")
    plt.xlim(0, 1)
    plt.ylim(0, 3)

結果

f:id:tkzs:20150918181240p:plain

K近傍法・コード

#k_Neighbourhood
def k_NN(test, train, k):
    train = np.array(train)
    train.sort()
    r = []
    for i in test:
        distance = abs(train - i)
        distance.sort()
        r.append(distance[(k-1)])
    r = np.array(r)
    return k / (2* r * N)



if __name__ == "__main__":
    title("Figure 2.26")
    plt.subplot(3, 1, 1)
    plt.plot(x, k_NN(x, Data, 1))
    plt.plot(x, mix_G(x), "g-")
    plt.xlim(0, 1)
    plt.ylim(0, 3)

    plt.subplot(3, 1, 2)
    plt.plot(x, k_NN(x, Data, 10))
    plt.plot(x, mix_G(x), "g-")
    plt.xlim(0, 1)
    plt.ylim(0, 3)

    plt.subplot(3, 1, 3)
    plt.plot(x, k_NN(x, Data, 30))
    plt.plot(x, mix_G(x), "g-")
    plt.xlim(0, 1)
    plt.ylim(0, 3)

結果

f:id:tkzs:20150918181505p:plain

2015-09-25

Pythonで実装 PRML 第1章ベイズ推定

Python 機械学習 PRML

前回に続いて、PRML1章から「1.2.6 ベイズ推定」の実装です。
図1.17の再現ですが、M=9とした上で、予測分布の平均と±1σの範囲を示しています。

数式追ってるだけだと抽象的で、いまいちピンと来なかったんですが、
実装することで、(1.70)、(1.71)が分布を示すことが具体的に可視化されて腹に落ちました。
この予測精度が、前回の多項式曲線フィッティングとどう異なるのかは今回の例からは判断できませんが、
それぞれどのようなものかを理解する上では大変役立ちました。

実装の大まかな流れ

①求めたいのは予測分布(1.69)。
$p(t| x, {\bf x}, {\bf t}) = N(t| m(x), s^2(x)) (1.69)$

②今回、基底関数は $\phi_i(x) = x^i$ for $i = 0,...M$

③予測分布の平均、分散を求める式は下記の3つなので、これをそれぞれ実装して、予測範囲 [tex:0.0

コード

import numpy as np
from numpy.linalg import inv
import pandas as pd
from pylab import *
import matplotlib.pyplot as plt

#From p31 the auther define phi as following
def phi(x):
    return np.array([x**i for i in xrange(M+1)]).reshape((M+1, 1))

#(1.70) Mean of predictive distribution
def m(x, x_train, y_train, S):
    sum = np.array(zeros((M+1, 1)))
    for n in xrange(len(x_train)):
        sum += np.dot(phi(x_train[n]), y_train[n])
    return Beta * phi(x).T.dot(S).dot(sum)

    
#(1.71) Variance of predictive distribution   
def s2(x, S):
    return 1.0/Beta + phi(x).T.dot(S).dot(phi(x))

#(1.72)
def S(x_train, y_train):
    I = np.identity(M+1)
    Sigma = np.zeros((M+1, M+1))
    for n in range(len(x_train)):
        Sigma += np.dot(phi(x_train[n]), phi(x_train[n]).T)
    S_inv = alpha*I + Beta*Sigma
    S = inv(S_inv)
    return S

if __name__ == "__main__":
    alpha = 0.005
    Beta = 11.1
    M = 9
    
    #Sine curve
    x_real = np.arange(0, 1, 0.01)
    y_real = np.sin(2*np.pi*x_real)
    
    ##Training Data
    N=10
    x_train = np.linspace(0, 1, 10)
    
    #Set "small level of random noise having a Gaussian distribution"
    loc = 0
    scale = 0.3
    y_train =  np.sin(2*np.pi*x_train) + np.random.normal(loc,scale,N)
    
    
    S = S(x_train, y_train)
    
    #Seek predictive distribution corespponding to entire x
    mean = [m(x, x_train, y_train, S)[0,0] for x in x_real]
    variance = [s2(x, S)[0,0] for x in x_real]
    SD = np.sqrt(variance)
    upper = mean + SD
    lower = mean - SD
    
    
    plot(x_train, y_train, 'bo')
    plot(x_real, y_real, 'g-')
    plot(x_real, mean, 'r-')
    fill_between(x_real, upper, lower, color='pink')
    xlim(0.0, 1.0)
    ylim(-2, 2)
    title("Figure 1.17")

結果

f:id:tkzs:20150918151632p:plain

2015-09-18

Pythonで実装 PRML 第1章多項式曲線フィッティング

PRML Python 機械学習

パターン認識と機械学習、第1章から「多項式多項式曲線フィッティング」の図の再現に取り組んでみます。
早速図1.4の再現から行ってみたいと思います。また、データの数を増やすことで、予測の正確性が増すことの確認を図1.6で行っています。

数学的にも実装上もまだまだそんなに難しくないので、淡々と実装してみました。
とりあえず、最初の第一歩。

実装の大まかな流れ

①(1.1)を実装。
$y(x, {\bf w}) = w_0 + w_1 x + w_1 x^2 + ...+ w_M x^M = \sum_{j=1}^M w_j x^M (1.1)$

②(1.122)と(1.123)を満たす値を求めると二乗誤差関数(1.2)を最小化するパラメータ ${\bf w}$ の値が求まる。

$E({\bf w}) = \frac{1}{2} \sum_{n=1}^N \{{y({x_n, \bf x_n}) -t_n}^2 \}(1.2)$

$\sum_{j=0}^M {\bf A}_{ij} w_j = {\bf T}_i(1.122)$

${\bf A}_i = \sum_{n=1}^N (x_n)^{i+j} (1.123)$

${\bf T}_i = \sum_{n=1}^N (x_n)^i t_n (1.123)$

コード

import numpy as np
import pandas as pd
from pylab import *
import matplotlib.pyplot as plt
#(1.1)
def y(x, W, M):
    Y = np.array([W[i] * (x ** i) for i in xrange(M+1)])
    return Y.sum()

#(1.2),(1.122),(1.123)
def E(x, t, M):
    A =np.zeros((M+1, M+1))
    for i in range(M+1):
        for j in range(M+1):
            A[i,j] = (x**(i+j)).sum()
        
    T = np.array([((x**i)*t).sum() for i in xrange(M+1)])
    return  np.linalg.solve(A, T)

if __name__ == "__main__":
    #Sine curve
    x_real = np.arange(0, 1, 0.01)
    y_real = np.sin(2*np.pi*x_real)
    
    ##Training Data
    N=10
    x_train = np.arange(0, 1, 0.1)
    
    #Set "small level of random noise having a Gaussian distribution"
    loc = 0
    scale = 0.3
    y_train =  np.sin(2*np.pi*x_train) + np.random.normal(loc,scale,N)
    
    for M in [0,1,3,9]:
        W = E(x_train, y_train, M)
        print W
    
        y_estimate = [y(x, W, M) for x in x_real]
    

        plt.plot(x_real, y_estimate, 'r-')
        plt.plot(x_train, y_train, 'bo') 
        plt.plot(x_real, y_real, 'g-')
        xlim(0.0, 1.0)
        ylim(-2, 2)
        title("Figure 1.4")

結果

f:id:tkzs:20150918132529p:plain

コード

if __name__ == "__main__":
    M2 = 9

    N2 = 20
    x_train2 = np.arange(0, 1, 0.05)
    y_train2 =  np.sin(2*np.pi*x_train2) + np.random.normal(loc,scale,N2)

    N3 = 100
    x_train3 = np.arange(0,1, 0.01)
    y_train3 =  np.sin(2*np.pi*x_train3) + np.random.normal(loc,scale,N3)


    W2 = E(x_train2, y_train2, M2)
    W3 = E(x_train3, y_train3, M2)
    
    y_estimate2 = [y(x, W2, M2) for x in x_real]
    y_estimate3 = [y(x, W3, M2) for x in x_real]

    plt.subplot(1, 2, 1)
    plt.plot(x_real, y_estimate2, 'r-')
    plt.plot(x_train2, y_train2, 'bo') 
    plt.plot(x_real, y_real, 'g-')
    xlim(0.0, 1.0)
    ylim(-2, 2)
    title("Figure 1.6 left")

    plt.subplot(1, 2, 2)
    plt.plot(x_real, y_estimate3, 'r-')
    plt.plot(x_train3, y_train3, 'bo') 
    plt.plot(x_real, y_real, 'g-')
    xlim(0.0, 1.0)
    ylim(-2, 2)
    title("Figure 1.6 right")

結果

f:id:tkzs:20150918132703p:plain

2015-09-14

PuLPを使った線形計画法(Linear Programming)

Python 機械学習統計

カーネルやら何やらを理解するために、非線形計画法を勉強中。
で、その前に線形計画法で実際に手を動かしたくなったので無償で使えるPuLPを使ってみた。

今回例題として解いたのは「これならわかる最適化数学」の第6章、線形計画法。
似たような例題がいくつか並ぶので、ピックアップしてPuLPでモデル化してみた。

【例題6.2】(P161)

2種類の容器A,Bを作るのに, 機械M1,M2を使う. 容器Aを1個作るのにM1を2分, 機械M2を4分使う必要がある.一方,容器Bを1個作るのに、機械M1を8分,機械M2を4分使う必要がある. 容器A,Bを作る利益は一つあたりそれぞれ29円,45円である. 利益を最大にするにはどのように計画すれば良いか.

利益を目的関数f、容器の個数をそれぞれx,yを問いてモデル化すると式は下記のようになる。
制約条件:
2x + 8y <= 60
4x + 4y <= 60
x >= 0, y >= 0

目的関数：
f = 29x + 45y → max

で、これらの式をそのまんまPuLPに突っ込んで解かせるコードが下記。
今回は容器の個数を求めるので、解は整数であり、変数はIntegerで指定する。

#Creat variables
x = pulp.LpVariable("x", cat = "Integer")
y = pulp.LpVariable("y", cat = "Integer")

#Create object function
problem = pulp.LpProblem("Container", pulp.LpMaximize)
problem += 29*x + 45*y  #(6.6)

#constrained condition
problem += 2*x + 8*y <= 60 #(6.4)
problem += 4*x + 4*y <= 60 #(6.4)
problem += x >= 0 #(6.5)
problem += y >= 0 #(6.5)

status = problem.solve()
print "Status", pulp.LpStatus[status]

print problem

print "Result"
print "x", x.value()
print "y", y.value()

結果は、

Status Optimal
Container:
MAXIMIZE
29*x + 45*y + 0
SUBJECT TO
_C1: 2 x + 8 y <= 60
_C2: 4 x + 4 y <= 60
_C3: x >= 0
_C4: y >= 0

VARIABLES
x free Integer
y free Integer

Result
x 10.0
y 5.0

【例題6.3】(P161)

2種類の金属M1,M2を用いて2種類の合金A,Bを作る. 合金A,Bはそれぞれ1トン当たり30,000円、25,000円の利益がある。合金Aは金属M1,M2を1:1の割合で混合し、合金Bは金属M1,M2を1:3の割合で混合する。金属M1,M2はそれぞれ1日当たり10トン,15トンの供給が可能である. 利益を最大にするのに合金A,Bを作るにはどのように計画すればよいか.

モデル化すると、
制約条件:
0.5x + 0.25y <= 10
0.5x + 0.75y <= 15
x >= 0, y >= 0

目的関数：
f = 30x + 25y → max

今回はコード化にあたり、解は連続値であるので、変数はContinuousで指定する。

#Creat variables
x = pulp.LpVariable("x", cat = "Continuous")
y = pulp.LpVariable("y", cat = "Continuous")

#Create object function
problem = pulp.LpProblem("Alloy", pulp.LpMaximize)
problem += 30*x + 25*y  #(6.9)

#constrained condition
problem += 0.5*x + 0.25*y <= 10 #(6.7)
problem += 0.5*x + 0.75*y <= 15 #(6.7)
problem += x >= 0 #(6.8)
problem += y >= 0 #(6.8)

status = problem.solve()
print "Status", pulp.LpStatus[status]

print problem

print "Result"
print "x", x.value()
print "y", y.value()

結果は、

Status Optimal
Alloy:
MAXIMIZE
30*x + 25*y + 0
SUBJECT TO
_C1: 0.5 x + 0.25 y <= 10
_C2: 0.5 x + 0.75 y <= 15
_C3: x >= 0
_C4: y >= 0

VARIABLES
x free Continuous
y free Continuous

Result
x 15.0
y 10.0

【例題6.7】(P164)

今回の例のように目的関数が発散して、最適解が存在しない場合は下記のように、"Status Unbounded"と表示される。

制約条件:

x - y <= -1
2x + y <= 1

x - 2y <= 1
x >= 0, y >= 0

目的関数：
f = x + y → max

#Creat variables
a = pulp.LpVariable("a", cat = "Continuous")
b = pulp.LpVariable("b", cat = "Continuous")

#Create object function
problem = pulp.LpProblem("Test", pulp.LpMaximize)
problem += a + b  #(6.16)

#constrained condition
problem += -a - b <= -2 #(6.14)
problem += -2*a + b <= 15 #(6.14)
problem += a - 2*b <= 15 #(6.14)
problem += a >= 0 #(6.15)
problem += b >= 0 #(6.15)

status = problem.solve()
print "Status", pulp.LpStatus[status]

print problem

Status Unbounded

実装の大まかな流れ

コード

結果

まずは図4.4から

実装の大まかな流れ

コード

結果

次に図4.5

実装の大まかな流れ

コード 最小二乗法による分類

結果

コード ロジスティック回帰による分類

結果

実装の大まかな流れ

コード

結果

参考

ヒストグラム・コード

結果

カーネル密度法・コード

結果

K近傍法・コード

結果

実装の大まかな流れ

コード

結果

実装の大まかな流れ

コード

結果

コード

結果

【例題6.2】(P161)

【例題6.3】(P161)

【例題6.7】(P164)

コード最小二乗法による分類

コードロジスティック回帰による分類