Gaussian Process for classification
When you do model selection or set hyperparameters to specific values by type-2 maximum likelihood, computing the marginal likelihood is essential. However, the marginal likelihood of Gaussian process classification is intractable due to the presence of a non-linear function. Therefore, in this article I will show an example of computing the marginal likelihood with the Laplace approximation.
Marginal likelihood of Gaussian process classification
1. Why is it intractable?
The marginal likelihood of Gaussian process classification can be expressed as follows,
$$\begin{eqnarray} p(t_N) &=& \int p(t_N|a_N)p(a_N)da_N \\ &=& \int \prod_{n=1}^N \sigma(a_n)^{t_n}(1-\sigma(a_n))^{1-t_n} \mathcal{N}(a_N|0,C_N)da_N \end{eqnarray} \tag{1}$$
where $a_N$ is the vector of latent function values at the training inputs and $C_N$ is the Gram matrix defined by the kernel function.
Because of the sigmoid functions inside the product, this integral has no closed form. Hence an approximation is required.
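To see what we are up against, here is a small sketch that estimates the integral in equation (1) by naive Monte Carlo, drawing $a_N \sim \mathcal{N}(0, C_N)$ and averaging the Bernoulli likelihood. The 3-point Gram matrix and labels are made up for illustration; sampling gives only a noisy estimate, which is why a deterministic approximation is attractive.
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)

# made-up 3-point example: Gram matrix C_N and binary labels t_N
C_N = np.array([[1.0, 0.5, 0.2],
                [0.5, 1.0, 0.3],
                [0.2, 0.3, 1.0]])
t_N = np.array([1, 0, 1])

# naive Monte Carlo estimate of eq. (1):
# draw a_N ~ N(0, C_N) and average the Bernoulli likelihood
a_samples = rng.multivariate_normal(np.zeros(3), C_N, size=200_000)
sig = expit(a_samples)
lik = np.prod(sig**t_N * (1 - sig)**(1 - t_N), axis=1)
print('Monte Carlo estimate of p(t_N):', lik.mean())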
2. Laplace approximation
I will introduce the Laplace approximation in a nutshell.
Let's say $p(x)$ is an unknown probability distribution whose normalization is intractable. Then consider $f(x)$ such that
$$p(x) = \frac{1}{Z}f(x)$$
where $Z = \int f(x)dx$ is the normalization term. We assume $f(x)$ itself is easy to evaluate, while $Z$ is not.
By applying a second-order Taylor expansion to the logarithm of $f(x)$ around a mode $x_0$, $f(x)$ can be approximated as follows,
$$ f(x) \approx f(x_0)\exp\left\{-\frac{1}{2}(x-x_0)^T A(x-x_0)\right\}$$
where $x_0$ satisfies $\frac{d}{dx}\ln f(x)\big|_{x=x_0} = 0$ and $A = -\nabla\nabla \ln f(x)\big|_{x=x_0}$ is the negative Hessian at the mode. Integrating this Gaussian form gives
$$\int f(x)dx \approx f(x_0)\frac{(2\pi)^{\frac{d}{2}}}{|A|^{\frac{1}{2}}}$$
where $d$ is the dimensionality of $x$.
We can apply this approximation to equation (1), with the whole integrand playing the role of $f$ and the latent vector $a_N$ in place of $x$.
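Before doing so, here is a minimal one-dimensional sketch of the recipe: the integrand $f(a) = \sigma(a)\mathcal{N}(a|0,1)$ stands in for the integrand of equation (1), the mode is found numerically, and the Laplace estimate is compared against numerical quadrature.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar
from scipy.special import expit
from scipy.stats import norm

# 1D stand-in for the integrand of eq. (1): f(a) = sigmoid(a) * N(a|0,1)
f = lambda a: expit(a) * norm.pdf(a)

# mode a0 of ln f(a)
res = minimize_scalar(lambda a: -(np.log(expit(a)) + norm.logpdf(a)),
                      bounds=(-5, 5), method='bounded')
a0 = res.x

# A = -d^2/da^2 ln f(a) at a0; analytically sigma(a0)(1 - sigma(a0)) + 1 here
A = expit(a0) * (1 - expit(a0)) + 1.0

# Laplace approximation vs. numerical quadrature
laplace = f(a0) * np.sqrt(2 * np.pi / A)
exact, _ = quad(f, -10, 10)
print('Laplace: {:.5f}  quadrature: {:.5f}'.format(laplace, exact))
# prints roughly: Laplace: 0.4963  quadrature: 0.50000
Even in this simple case the Laplace estimate lands close to the true value of $0.5$ (which follows from the symmetry $\sigma(a) + \sigma(-a) = 1$), while being far cheaper than quadrature in high dimensions.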
3. Application to toy data
Now, Gaussian process classification will be applied to the following toy data.
from gp_classifier import gp_classifier
import kernel
import numpy as np
from scipy.stats import multivariate_normal,uniform
import math
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('dark_background')
mean_list = np.array([[-2,1],[1,-1.5],[1.5,0],
                      [0,1.5],[0,3],[3,3],[3,-3.5],
                      [-3,4],[-1,-4],[-4,-4]])
for i,mean in enumerate(mean_list):
    # sample number
    samp_num = math.floor(uniform.rvs(loc=10,scale=15))
    # off-diagonal covariance term, bounded so that every
    # covariance matrix below stays positive definite
    cov_val = uniform.rvs(loc=-0.18,scale=0.36)
    if i == 0:
        # label
        labels = np.full(samp_num,0)
        # data
        X = multivariate_normal.rvs(
            mean=mean,cov=[[1.5,cov_val],[cov_val,1.5]],
            size=samp_num)
    else:
        # label
        labels = np.concatenate([labels,np.full(samp_num,i%2)])
        # data
        X = np.concatenate([X,multivariate_normal.rvs(
            mean=mean,cov=[[0.2,cov_val],[cov_val,0.2]],
            size=samp_num)])
# plot toy data
for label, col in zip([0,1],['blue','red']):
    mask = label == labels
    plt.scatter(X[mask,0], X[mask,1],color=col)
In order to compare marginal likelihood values, four Gaussian process classifiers with different kernel functions are trained here:
- linear kernel
- Gaussian kernel
- periodic kernel
- linear + Gaussian kernel
You can check the code used in this article on GitHub:
gp_classifier.py
kernel.py
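For reference, here is a hedged sketch of what the kernel functions in kernel.py might look like; the actual definitions and hyperparameter values in the repository may differ.
import numpy as np

# Hypothetical kernel definitions for illustration only --
# the real implementations live in kernel.py.
def linear_kernel(x1, x2):
    # inner product of the inputs
    return np.dot(x1, x2)

def gauss_kernel(x1, x2, beta=1.0):
    # squared-exponential (RBF) kernel
    return np.exp(-beta * np.sum((np.asarray(x1) - np.asarray(x2))**2))

def periodic_kernel(x1, x2, length=1.0, period=1.0):
    # exponential-sine-squared form
    d = np.linalg.norm(np.asarray(x1) - np.asarray(x2))
    return np.exp(-2.0 * np.sin(np.pi * d / period)**2 / length**2)

def linear_gauss_kernel(x1, x2, beta=1.0):
    # sum of the linear and gaussian kernels
    return linear_kernel(x1, x2) + gauss_kernel(x1, x2, beta)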
plt.figure(figsize=(14,14))
kernel_dict = {'linear kernel':kernel.linear_kernel,
               'periodic kernel':kernel.periodic_kernel,
               'gauss kernel':kernel.gauss_kernel,
               'linear+gauss kernel': kernel.linear_gauss_kernel}
for i,(kernel_name, kernel_func) in enumerate(kernel_dict.items()):
    plt.subplot(2,2,i+1)
    # create a gaussian process classifier with the given kernel
    gclf = gp_classifier(kernel_func=kernel_func)
    gclf.fit(X,labels)
    # evaluate the predicted class-1 probability on a grid
    x = np.linspace(-4,4,101)
    y = np.linspace(-4,5,101)
    xx, yy = np.meshgrid(x,y)
    zz = gclf.predict_proba(np.vstack([xx.ravel(),yy.ravel()]).T)
    ctf = plt.contourf(xx,yy,zz[:,1].reshape(xx.shape),cmap='coolwarm')
    plt.colorbar(ctf)
    # plot toy data
    for label, col in zip(np.unique(labels),['blue','red']):
        mask = np.where(labels == label)[0]
        plt.scatter(X[mask,0],X[mask,1],color=col)
    plt.xlim(-4,4)
    plt.ylim(-4,5)
    plt.title('{} log mlh = {:.2f}'.format(kernel_name,gclf.get_marginal_likelihood()),
              fontsize=14)
plt.show()
The Gaussian kernel attains the highest log marginal likelihood of the four. The periodic kernel seems to have difficulty classifying this data set.
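For completeness, gclf.get_marginal_likelihood() belongs to the gp_classifier module linked above; a typical Laplace-approximated log marginal likelihood computation looks like the sketch below (following the standard derivation, e.g. Rasmussen & Williams, GPML, eq. 3.32). The function name and the assumption that the posterior mode a_hat has already been found by Newton's method are illustrative, not necessarily the module's actual API.
import numpy as np
from scipy.special import expit

def laplace_log_marginal_likelihood(C_N, t_N, a_hat):
    # C_N   : (N, N) Gram matrix of the kernel
    # t_N   : (N,) binary labels
    # a_hat : (N,) mode of the posterior over latent values
    sig = expit(a_hat)
    # Bernoulli log likelihood at the mode: log p(t_N | a_hat)
    log_lik = np.sum(t_N * np.log(sig) + (1 - t_N) * np.log(1 - sig))
    # quadratic term from the Gaussian prior N(a_N | 0, C_N)
    prior_term = -0.5 * a_hat @ np.linalg.solve(C_N, a_hat)
    # determinant term from the Gaussian integral;
    # W = diag(sigma * (1 - sigma)) is the negative Hessian of the log likelihood
    sw = np.sqrt(sig * (1.0 - sig))
    B = np.eye(len(t_N)) + sw[:, None] * C_N * sw[None, :]  # I + W^{1/2} C_N W^{1/2}
    det_term = -0.5 * np.linalg.slogdet(B)[1]
    return log_lik + prior_term + det_term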