Sunday, October 6, 2019

Gaussian Process for classification

A Gaussian process is a stochastic process such that every finite collection of its random variables has a multivariate normal distribution. In this article, I will utilize a Gaussian process for classification, and I will show an implementation from scratch with the numpy module.

0. Model

In classification, our goal is to obtain the posterior probability of the target variable for new input vectors, given a set of training data. These probabilities should lie in the interval (0, 1), whereas a Gaussian process model's prediction lies on $\mathbb{R}$. Therefore it is necessary to transform the outputs of the Gaussian process with a nonlinear function. In this article, we consider binary classification, where we can transform the outputs of the Gaussian process with the logistic sigmoid function. Consequently, we apply a Bernoulli distribution to the labels.

Letting the output of the Gaussian process be $a$ and the target variable be $t$, the likelihood function is

$$p(t|a) = \sigma(a)^t (1-\sigma(a))^{(1-t)}$$
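As a quick numerical illustration of this likelihood (a throwaway numpy snippet; `sigmoid` here is the same helper defined in the code further below): $a=0$ leaves the model undecided, while a large $|a|$ makes it confident.

import numpy as np

def sigmoid(a):
    return 1 / (1 + np.exp(-a))

def bernoulli_likelihood(t, a):
    # p(t | a) = sigmoid(a)^t * (1 - sigmoid(a))^(1 - t)
    s = sigmoid(a)
    return s**t * (1 - s)**(1 - t)

print(bernoulli_likelihood(1, 0.0))  # 0.5   : undecided
print(bernoulli_likelihood(1, 3.0))  # ~0.953: t=1 likely
print(bernoulli_likelihood(0, 3.0))  # ~0.047: t=0 unlikely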

1. Sampling from the model

The next thing you might want to do is sample from the model in a Bayesian setting. The code below first defines the required helper functions (the logistic sigmoid and a Gaussian kernel), then draws a sample from the Gaussian prior and transforms it through the sigmoid.

In [1]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import multivariate_normal
from gp_classifier import gp_classifier
%matplotlib inline
plt.style.use('dark_background')
In [2]:
# Required functions for the gaussian process
def sigmoid(x):
    """
    Return the value mapped by the logistic sigmoid function
    """
    return 1 / (1 + np.exp(-x))

def gauss_kernel(X, theta1=1, theta2=1, theta3=1, delta=1e-3):
    """
    Return the Gram matrix yielded by the squared-exponential kernel,
    with jitter theta3*delta added on the diagonal
    """
    return np.array([
        theta1 * np.exp(-((X[i]-X[j])**2).sum() / theta2)
        + (theta3*delta if i == j else 0)
        for i in range(X.shape[0])
        for j in range(X.shape[0])
    ]).reshape(X.shape[0], X.shape[0])
In [3]:
x = np.linspace(-4,4,101)
cov = gauss_kernel(x, theta1=20, theta2=1, theta3=1)
# sample from the gaussian stochastic process
y = multivariate_normal.rvs(
    mean=np.zeros(cov.shape[0]), cov=cov
)
# transform to a non-gaussian stochastic process
non_y = sigmoid(y)

plt.figure(figsize=(14,5))
# Sample from the gaussian prior
plt.subplot(1,2,1)
plt.plot(x, y)
plt.title('Sample from gaussian prior', fontsize=14)
plt.ylim(-10,10)
# Sample from the non-gaussian process
plt.subplot(1,2,2)
plt.plot(x, non_y)
plt.title('Sample from non-gaussian stochastic process',
          fontsize=14)
plt.ylim(0,1)

plt.show()
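As a side note, the list-comprehension kernel above loops at the Python level, which gets slow as $N$ grows. A vectorized equivalent (a sketch assuming the same squared-exponential form, using scipy's `cdist` for pairwise squared distances) could look like this; `np.allclose(gauss_kernel(x), gauss_kernel_vec(x))` should hold.

from scipy.spatial.distance import cdist

def gauss_kernel_vec(X, theta1=1, theta2=1, theta3=1, delta=1e-3):
    # pairwise squared Euclidean distances between all rows (or scalars) of X
    X2d = np.asarray(X).reshape(len(X), -1)
    sqdist = cdist(X2d, X2d, metric='sqeuclidean')
    # squared-exponential Gram matrix plus diagonal jitter
    return theta1 * np.exp(-sqdist / theta2) + theta3 * delta * np.eye(len(X))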

2. Predictive model

As I have said before, in classification what we want is the predictive distribution of the target variable given a set of training data:
$$ p(t_{N+1}|t_{N}) = \int p(t_{N+1},a_{N+1} | t_N) \, d a_{N+1}$$
For binary classification, it is sufficient to predict the probability $p(t_{N+1} =1)$. Therefore,
$$ p(t_{N+1} =1|t_{N}) = \int \sigma(a_{N+1})\, p(a_{N+1}| t_N) \, d a_{N+1}$$
Unfortunately, this integral is intractable due to the presence of the logistic sigmoid function. In this article, the Laplace approximation will be applied: the mode of $p(a_N|t_N)$ is found (for instance by Newton-Raphson iteration) and a Gaussian is fitted there. Consequently, we can obtain

$$E[a_{N+1}|t_N] = k^T(t_N - \sigma_N)$$
$$Var[a_{N+1}|t_N] = c - k^T(W_N^{-1} + C_N)^{-1}k$$

where $W_N$ is a diagonal matrix with elements $\sigma(a_n)(1 - \sigma(a_n))$, $C_N$ is the covariance matrix created by the kernel function on the training inputs, $k$ is the vector of kernel values between the training inputs and the new input, $c$ is the kernel value of the new input with itself, and $\sigma_N$ is the vector of $\sigma(a_n)$ evaluated at the mode.
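As an illustration, here is a minimal numpy sketch of this predictive step, reusing the `sigmoid` and `gauss_kernel` helpers from section 1. The names `rbf`, `laplace_mode`, and `predict_proba_one`, the fixed Newton iteration count, and the final probit approximation $\sigma(\mu/\sqrt{1+\pi\sigma^2/8})$ for pushing the Gaussian over $a_{N+1}$ through the sigmoid are my assumptions here, not necessarily what gp_classifier.py does.

def rbf(x1, x2, theta1=1.0, theta2=1.0):
    # squared-exponential kernel between two single inputs
    return theta1 * np.exp(-np.sum((x1 - x2)**2) / theta2)

def laplace_mode(C, t, n_iter=20):
    # Newton-Raphson iteration for the mode a* of p(a_N | t_N)
    a = np.zeros(len(t))
    I = np.eye(len(t))
    for _ in range(n_iter):
        s = sigmoid(a)
        W = np.diag(s * (1 - s))
        a = C @ np.linalg.solve(I + W @ C, t - s + W @ a)
    return a

def predict_proba_one(X, t, x_new, theta1=1.0, theta2=1.0):
    C = gauss_kernel(X, theta1=theta1, theta2=theta2)
    a_star = laplace_mode(C, t)
    s = sigmoid(a_star)
    # note: W can be near-singular when the model is very confident;
    # a production implementation should guard this inverse
    W = np.diag(s * (1 - s))
    k = np.array([rbf(x_n, x_new, theta1, theta2) for x_n in X])
    c = rbf(x_new, x_new, theta1, theta2)
    mean = k @ (t - s)                                      # E[a_{N+1} | t_N]
    var = c - k @ np.linalg.solve(np.linalg.inv(W) + C, k)  # Var[a_{N+1} | t_N]
    # probit approximation to the sigmoid-Gaussian integral
    return sigmoid(mean / np.sqrt(1 + np.pi * var / 8))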

3. Classification for toy data

In this article, I am going to apply Gaussian process classification to an XOR data set, which looks like the following. Data coloured red have label 1; data coloured blue have label 0.

In [4]:
# Prepare XOR data
X = np.random.randn(200,2)
y_xor = np.logical_xor(X[:,0]>0, X[:,1]>0)
labels = np.where(y_xor,1,0)

plt.figure(figsize=(13,8))
for label, col in zip([0,1],['blue','red']):
    mask = label == labels
    plt.scatter(X[mask,0], X[mask,1],color=col)

plt.title('XOR data',fontsize=16)
plt.xlim(-3,3)
plt.ylim(-3,3)
plt.show()

Now it is time to apply the Gaussian process to the XOR data. The implementation of the Gaussian process classifier can be accessed from my GitHub:

gp_classifier.py
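To give a rough idea of what such a classifier can look like, here is a minimal sketch with the same fit/predict_proba interface as used below, built from the pieces in section 2 (`rbf`, `laplace_mode`) and the helpers from section 1. It is a hypothetical stand-in; the actual gp_classifier.py may differ in hyperparameters and numerical safeguards.

class gp_classifier_sketch:
    # hypothetical stand-in for gp_classifier; not the file linked above
    def __init__(self, theta1=1.0, theta2=1.0):
        self.theta1, self.theta2 = theta1, theta2

    def fit(self, X, t):
        self.X = X
        self.t = np.asarray(t, dtype=float)
        self.C = gauss_kernel(X, theta1=self.theta1, theta2=self.theta2)
        self.a_star = laplace_mode(self.C, self.t)
        return self

    def predict_proba(self, X_new):
        s = sigmoid(self.a_star)
        # W can be near-singular for very confident fits; guard in practice
        W_inv_plus_C = np.linalg.inv(np.diag(s * (1 - s))) + self.C
        p1 = np.empty(len(X_new))
        for i, x in enumerate(X_new):
            k = np.array([rbf(x_n, x, self.theta1, self.theta2) for x_n in self.X])
            c = rbf(x, x, self.theta1, self.theta2)
            mean = k @ (self.t - s)
            var = c - k @ np.linalg.solve(W_inv_plus_C, k)
            p1[i] = sigmoid(mean / np.sqrt(1 + np.pi * var / 8))
        # two columns like scikit-learn: p(t=0), p(t=1)
        return np.vstack([1 - p1, p1]).T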

In [5]:
plt.figure(figsize=(13,8))
# instantiate the gaussian process classifier and fit the XOR data
gclf = gp_classifier()
gclf.fit(X,labels)

x = np.linspace(-3,3,101)
y = np.linspace(-3,3,101)

xx, yy = np.meshgrid(x,y)

# predictive probabilities over the grid; column 1 is p(t=1)
zz = gclf.predict_proba(np.vstack([xx.ravel(),yy.ravel()]).T)

ctf = plt.contourf(xx,yy,zz[:,1].reshape(xx.shape),cmap='coolwarm')
plt.colorbar(ctf)

# Plot toy data
for label, col in zip(np.unique(labels),['blue','red']):
    mask = np.where(labels == label)[0]
    plt.scatter(X[mask,0],X[mask,1],color=col)

plt.xlim(-3,3)
plt.ylim(-3,3)

plt.title('Classification with gaussian process',fontsize=16)
plt.show()

The contour coloured from blue to red shows the probability $p(t_{N+1}=1)$ predicted by the Gaussian process. You can see that the Gaussian process captured the density of the data well. In the next article, I will try other kernel functions and data sets.

4. Reference

Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006, section 6.4 (Gaussian processes).
