Pythonで実装PRML 第4章最小二乗法とロジスティック回帰による識別

パターン認識と機械学習、第4章「線形識別モデル」に入って図4.4、図4.5の再現を行います。

まずは4.1.3の最小二乗法による識別ですが、ノイズが混ざると上手く分類できない点、多分類になると上手く機能しない点が図の4.4、4.5で説明されてます。
また、4.3.2で説明されるロジスティック回帰であればその両方の面で機能することが同様に図の4.4、4.5で表されてます。

境界面の求め方について詳しい解説は、PRMLやはじパタにお任せするとして、実装にあたっての流れだけ簡単に示します。

まずは図4.4から

実装の大まかな流れ

①求めたい識別境界はクラスkごとにそれぞれ(4.13)

$y_k({\bf x}) = {\bf x}_k^{\bf T} {\bf x}+ w_{k0}(4.13)$

②(4.15)の誤差を最小にするパラメータ $w$ を(4.16)を解くことによって求める。
$E_D({\bf \tilde{W}}) = \frac{1}{2}{\bf T}_r\{({\bf \tilde{X}}{\bf \tilde{W}}-{\bf T})^T ({\bf \tilde{X}}{\bf \tilde{W}}-{\bf T})\}(4.15)$

${\bf \tilde{W}} = \{({\bf \tilde{X}}^T {\bf \tilde{X}})^{-1} {\bf \tilde{X}}^T {\bf T}) = {\bf \tilde{X}}^\ddagger {\bf T}(4.16)$

コード

import matplotlib.pyplot as plt
from pylab import *
import numpy as np
import random
%matplotlib inline

#Function to seek hyperplane
def func(x, y,x_line):
    #(4.16）Apply least square
    W = np.linalg.inv(x.T.dot(x)).dot(x.T).dot(y)
    
    #Add bias term
    w_0 = [1.0,1.0]
    W_add = np.c_[w_0, W]
    
    #(4.18),(4.19)
    a = - ((W_add[0,1]-W_add[1,1])/(W_add[0,2]-W_add[1,2]))
    b = - (W_add[0,0]-W_add[1,0])/(W_add[0,2]-W_add[1,2])
    #(4.13)
    return a * x_line + b
if __name__ == "__main__":
    #Generate sumple data
    #Mean
    mu_red = [-1,1]
    mu_blue = [1,-1]
    #Covariance
    cov = [[1.2,1], [1,1.2]]

    x_red,y_red = np.random.multivariate_normal(mu_red,cov,50).T
    x_blue,y_blue = np.random.multivariate_normal(mu_blue,cov,50).T

    x = vstack((x_red, x_blue)).T
    y = vstack((y_red, y_blue)).T

    #Add outlier
    mu_b_out= [7,-6]
    cov2 = [[1,0], [0,1]]
    x_blue2,y_blue2 = np.random.multivariate_normal(mu_blue,cov,45).T
    x_b_out,y_b_out = np.random.multivariate_normal(mu_b_out,cov2,5).T
    x_blue3, y_blue3 = np.r_[x_blue2,x_b_out], np.r_[y_blue2,y_b_out]

    x2 = vstack((x_red, x_blue3)).T
    y2 = vstack((y_red, y_blue3)).T


    #Plot right
    plt.subplot(1, 2, 1)
    x_line = np.linspace(-4, 8, 1000)
    plot(x_line, func(x,y,x_line),'m-')
    plt.scatter(x_red,y_red,color='r',marker='x')
    plt.scatter(x_blue,y_blue,color='b',marker='x')
    xlim(-4.0, 9.0)
    ylim(-9.0, 4.0)
    title("Figure4.4 left")


    #Plot left
    plt.subplot(1, 2, 2)
    x_line = np.linspace(-4, 8, 1000)
    plot(x_line, func(x2,y2,x_line), 'm-')
    plt.scatter(x_red,y_red,color='r',marker='x')
    plt.scatter(x_blue3,y_blue3,color='b',marker='x')
    xlim(-4.0, 9.0)
    ylim(-9.0, 4.0)
    title("Figure4.4 right")

結果

境界面から遠い部分に加えられたデータの影響で識別しきれてないのがわかりますね。
f:id:tkzs:20150925053347p:plain

次に図4.5

多クラス問題でも最小二乗法による識別が機能しないことが表され、その対応策としてロジスティック回帰による分類が挙げられてます。

実装の大まかな流れ

①クラスk毎の求めたい識別境界はそれぞれ同様に(4.13)。

②最小二乗法による多クラス分類においてパラメータ $w$ の求め方も上記の流れとほぼ同様です。

③ロジスティック回帰による分類では、尤度関数(4.107)を最大化するために、誤差関数(4.108)を最大化する{\bf {W}}を求めたいです。しかし、式の中にソフトマックス関数が入っているため解析的に求められないので、IRLS(反復重み付き最小二乗法）で求めることになります。

$p({\bf T}|{\bf w}_1,...{\bf w}_k) = \prod_{n=1}^N \prod_{k=1}^K y_{nk}^{t_{nk}}(4.107)$

$E({\bf w}_1,...{\bf w}_k) = -ln p({\bf T}|{\bf w}_1,...{\bf w}_k) = -\sum_{n=1}^N \sum_{k=1}^K t_{nk} ln y_{nk}(4.108)$

④IRLSを実行するにあたり、各クラス間ごとのヘッセ行列を(4.110)で求めます。

$\nabla _{{\bf w}_k} \nabla _{{\bf w}_j} E({\bf w}_1,...{\bf w}_k) = \sum_{n=1}^N y_{nk} ({\bf I}_{kj} - y_{nk} ) \phi_n \phi_n^{\bf T}(4.16)$

⑤求めたヘッセ行列と(4.99)と(4.100)を使って ${\bf w}$ を更新します。
${\bf w}^{\rm (new)} = {\bf w}^{\rm (old)} -({\bf \Phi}^{\bf T}{\bf R}{\bf \Phi})^{-1} {\bf \Phi}^{\bf T} ({\bf y} - {\bf t}) = ({\bf \Phi}^{\bf T}{\bf R}{\bf \Phi})^{-1}{\bf \Phi}^{\bf T}{\bf R}{\bf z} (4.99)$

${\bf z} = {\bf \Phi}{\bf w}^{\rm (old)} - {\bf R}^{-1}({\bf y} - {\bf t}) (4.100)$

コード最小二乗法による分類

#Function to seek hyperplane between class1 and 2
def func12(x, y,x_line):
    #Apply least square
    W = np.linalg.inv(x.T.dot(x)).dot(x.T).dot(y)
    
    #Add bias term
    w_0 = [1.0,1.0,1.0]
    W_add = np.c_[w_0, W]
    
    #(4.18),(4.19)
    a = - ((W_add[0,1]-W_add[1,1])/(W_add[0,2]-W_add[1,2]))
    b = - (W_add[0,0]-W_add[1,0])/(W_add[0,2]-W_add[1,2])
    return a * x_line + b

#Function to seek hyperplane between class 2 and 3
def func23(x, y,x_line):
    #Apply least square
    W = np.linalg.inv(x.T.dot(x)).dot(x.T).dot(y)
    
    #Add bias term
    w_0 = [1.0,1.0,1.0]
    W_add = np.c_[w_0, W]
    
    #(4.18),(4.19)
    a = - ((W_add[1,1]-W_add[2,1])/(W_add[1,2]-W_add[2,2]))
    b = - (W_add[1,0]-W_add[2,0])/(W_add[1,2]-W_add[2,2])
    return a * x_line + b

if __name__ == "__main__":
    #Generate sumple data
    N = 45
    #Mean
    mu_red_multi = [-2,2]
    mu_green_multi = [0,0]
    mu_blue_multi = [2,-2]
    #Covariance
    cov = [[1.2,1], [1,1.2]]

    x_red_multi,y_red_multi = np.random.multivariate_normal(mu_red_multi,cov,N).T
    x_green_multi,y_green_multi = np.random.multivariate_normal(mu_green_multi,cov,N).T
    x_blue_multi,y_blue_multi = np.random.multivariate_normal(mu_blue_multi,cov,N).T

    x_multi = vstack((x_red_multi, x_green_multi, x_blue_multi)).T
    y_multi = vstack((y_red_multi, x_green_multi, y_blue_multi)).T
    
    #Label
    T = []
    for i in range(N):
        T.append(array([1, 0, 0]))
    for i in range(N):
        T.append(array([0, 1, 0]))
    for i in range(N):
        T.append(array([0, 0, 1]))
    T = array(T)

    #Plot
    x_line2 = np.linspace(-6, 6, 1000)
    plot(x_line2, func12(x_multi,y_multi,x_line2), 'm-')
    plot(x_line2, func23(x_multi,y_multi,x_line2),'m-')
    plt.scatter(x_red_multi,y_red_multi,color='r',marker='x')
    plt.scatter(x_green_multi,y_green_multi,color='g',marker="x")
    plt.scatter(x_blue_multi,y_blue_multi,color='b',marker="o")
    xlim(-6.0, 6.0)
    ylim(-6.0, 6.0)
    title("Figure 4.5 left")

結果

f:id:tkzs:20150925103002p:plain

コードロジスティック回帰による分類

#Function to seek hyperplane between class1 and 2
def func12_L(x, y, x_line):
    W = IRLS(x)
    
    #Add bias term
    w_0 = [1.0,1.0,1.0]
    W_add = np.c_[w_0, W]
    
    #(4.18),(4.19)
    a = - ((W_add[0,1]-W_add[1,1])/(W_add[0,2]-W_add[1,2]))
    b = - (W_add[0,0]-W_add[1,0])/(W_add[0,2]-W_add[1,2])
    return a * x_line + b

#Function to seek hyperplane between class 2 and 3
def func23_L(x, y, x_line):
    W = IRLS(x)
    
    #Add bias term
    w_0 = [1.0,1.0,1.0]
    W_add = np.c_[w_0, W]
    
    #(4.18),(4.19)
    a = - ((W_add[1,1]-W_add[2,1])/(W_add[1,2]-W_add[2,2]))
    b = - (W_add[1,0]-W_add[2,0])/(W_add[1,2]-W_add[2,2])
    return a * x_line + b


def IRLS(x):
    #Initial parameters of Ws
    M = 3    
    K = 3    
    W = np.zeros((M, K))
    while True:
        PHI = x
        #(4.104)
        Y = np.zeros((N, K))
        for n in range(N):
            sum = 0.0
            for k in range(K):
                sum += np.exp(np.dot(W[:,k], PHI[n,:]))
            for k in range(K):
                Y[n, k] = np.exp(np.dot(W[:,k], PHI[n,:])) / sum
    
        #(4.110) Hessian Matrix
        I = np.identity(K)
        H = np.zeros((K*K, M, M))
        for j in range(K):
            for k in range(K):
                for n in range(N):
                    sum = Y[n, k] * (I[k, j] - Y[n,j])
                    H[k+j*K] += sum * matrix(PHI)[n].reshape(M,1) * matrix(PHI)[n].reshape(1,M)
        
        #(4.92)`
        W_new = np.zeros((M, K))
        for i in range(K):
            temp = np.dot(PHI.T, Y[:,i] - T[:,i])
            W_new[:,i] = W[:,i] - np.dot(np.linalg.inv(H[i+i*K]), temp)
    
　　　　 #The condition to stop update "w"
        diff = np.linalg.norm(W_new - W) / np.linalg.norm(W)
        if diff < 0.1: break
        
    W = W_new
    return W
if __name__ == "__main__":  
    #Generate sumple data
    N = 45
    #Mean
    mu_red_multi = [-2,2]
    mu_green_multi = [0,0]
    mu_blue_multi = [2,-2]
    #Covariance
    cov = [[1.2,1], [1,1.2]]

    x_red_multi,y_red_multi = np.random.multivariate_normal(mu_red_multi,cov,N).T
    x_green_multi,y_green_multi = np.random.multivariate_normal(mu_green_multi,cov,N).T
    x_blue_multi,y_blue_multi = np.random.multivariate_normal(mu_blue_multi,cov,N).T

    x_multi = hstack((x_red_multi, x_green_multi, x_blue_multi)).T
    y_multi = hstack((y_red_multi, x_green_multi, y_blue_multi)).T
    
    print x_multi
    
    # Label
    T = []
    for i in range(N):
        T.append(array([1, 0, 0]))
    for i in range(N):
        T.append(array([0, 1, 0]))
    for i in range(N):
        T.append(array([0, 0, 1]))
    T = array(T)

    #Plot
    x_line2 = np.linspace(-6, 6, 1000)
    plot(x_line2, func12_L(x_multi,y_multi,x_line2), 'm-')
    plot(x_line2, func23_L(x_multi,y_multi,x_line2),'m-')
    plt.scatter(x_red_multi,y_red_multi,color='r',marker='x')
    plt.scatter(x_green_multi,y_green_multi,color='g',marker="x")
    plt.scatter(x_blue_multi,y_blue_multi,color='b',marker="o")
    xlim(-6.0, 6.0)
    ylim(-6.0, 6.0)
    title("Figure 4.5 right")