0. What is probabilistic PCA?¶
The model we are going to discuss is basically a linear-Gaussian model in which all of the marginal and conditional distributions are Gaussian. First of all, we introduce an explicit latent variable $z$ corresponding to the principal-component subspace:
$$p(z) = N(z|0,I)$$
The observed variable $x$, conditioned on the value of the latent variable $z$, is also Gaussian, and the mean of $x$ is a linear function of $z$ governed by $W$ and $\mu$:
$$p(x|z) = N(x|Wz+\mu, \sigma^2I)$$
From these, the marginal distribution can be computed as
$$p(x) = N(x|\mu, \sigma^2I + WW^T)$$
For the estimation of the parameters, I will apply the EM algorithm in this article. In the E step, to make the gap between the marginal log likelihood and its lower bound vanish, $q(z)$ is updated with the posterior moments
$$\langle z_n \rangle = M^{-1}W^T(x_n-\mu)$$
$$\langle z_n z_n^T \rangle = \sigma^2M^{-1} + \langle z_n \rangle\langle z_n \rangle^T$$
$$\text{where}\quad M = W^TW + \sigma^{2}I$$
(note that $M$ is an $m \times m$ matrix, not the $d \times d$ marginal covariance $\sigma^2I + WW^T$). In the M step, the lower bound is maximized with respect to $W$ and $\sigma^2$, which gives the updates
$$W_{\mathrm{new}} = \left[\sum_{n=1}^{N}(x_n-\mu)\langle z_n \rangle^T\right]\left[\sum_{n=1}^{N}\langle z_n z_n^T \rangle\right]^{-1}$$
$$\sigma^2_{\mathrm{new}} = \frac{1}{ND}\sum_{n=1}^{N}\left\{\|x_n-\mu\|^2 - 2\langle z_n \rangle^T W_{\mathrm{new}}^T(x_n-\mu) + \mathrm{Tr}\left(\langle z_n z_n^T \rangle W_{\mathrm{new}}^T W_{\mathrm{new}}\right)\right\}$$
where $N$ is the number of data points and $D$ the dimension of $x$.
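To make the generative story above concrete, here is a minimal sampling sketch (all sizes, the seed, and the parameter values are arbitrary choices for illustration, not part of the method itself): it draws $z$ and $x$ by ancestral sampling and checks that the empirical covariance of $x$ approaches the marginal covariance $\sigma^2I + WW^T$.
import numpy as np

# Ancestral sampling: z ~ N(0, I), then x|z ~ N(Wz + mu, sigma2 * I)
rng = np.random.default_rng(0)            # arbitrary seed
d, m, n = 5, 2, 100000                    # illustrative sizes
W_true = rng.normal(size=(d, m))
mu_true = rng.normal(size=d)
sigma2_true = 0.1
z = rng.normal(size=(n, m))
x = z @ W_true.T + mu_true + np.sqrt(sigma2_true) * rng.normal(size=(n, d))
# The sample covariance of x should approach sigma2*I + W W^T
C = sigma2_true * np.eye(d) + W_true @ W_true.T
print(np.abs(np.cov(x, rowvar=False) - C).max())  # small for large n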
1. Implementation¶
The following is a basic implementation of probabilistic PCA with the EM algorithm, using the numpy module.
import numpy as np
import numpy.linalg as LA

def EM(X, dim_m, max_iter=500):
    """Perform the EM algorithm for probabilistic PCA.

    Parameters
    ----------
    X : ndarray of shape (N, dim_d)
        Design matrix of observed data.
    dim_m : int
        Dimension of the latent space.
    max_iter : int
        Number of EM iterations.
    """
    N, dim_d = X.shape
    # Initialize parameters. W needs a non-degenerate initialization:
    # if all columns start identical (e.g. all ones), every EM update
    # preserves that symmetry and W stays rank one forever.
    W = np.random.randn(dim_d, dim_m)
    sigma2 = 1.0
    # In the maximum likelihood setting, mu is set to the mean of x
    mu = X.mean(axis=0)
    for _ in range(max_iter):
        # E step: posterior moments of z under the current parameters
        M = W.T @ W + sigma2 * np.eye(dim_m)
        M_inv = LA.inv(M)
        exp_z = np.array([
            M_inv @ W.T @ (X[i] - mu)
            for i in range(N)
        ])
        exp_z_zt = np.array([
            sigma2 * M_inv + np.outer(exp_z[i], exp_z[i])
            for i in range(N)
        ])
        # M step: maximize the lower bound with respect to W and sigma2
        sum1 = np.array([
            np.outer(X[i] - mu, exp_z[i])
            for i in range(N)
        ]).sum(axis=0)
        W = sum1 @ LA.inv(exp_z_zt.sum(axis=0))
        sigma2 = (1 / (N * dim_d)) * sum(
            (X[i] - mu) @ (X[i] - mu)
            - 2 * exp_z[i] @ W.T @ (X[i] - mu)
            + np.trace(exp_z_zt[i] @ W.T @ W)
            for i in range(N)
        )
    return exp_z, W, mu, sigma2
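Before applying this to real images, a minimal smoke test on synthetic data can help (the sizes, seed, and parameter values here are arbitrary illustrative choices): sample from a known PPCA model and check that EM roughly recovers the noise variance and the latent subspace.
# Smoke test on synthetic data drawn from a known PPCA model
rng = np.random.default_rng(1)
N_syn, d_syn, m_syn = 500, 10, 3
W_true = rng.normal(size=(d_syn, m_syn))
mu_true = rng.normal(size=d_syn)
sigma2_true = 0.05
Z = rng.normal(size=(N_syn, m_syn))
X_syn = Z @ W_true.T + mu_true + np.sqrt(sigma2_true) * rng.normal(size=(N_syn, d_syn))
exp_z_syn, W_hat, mu_hat, sigma2_hat = EM(X_syn, dim_m=m_syn, max_iter=200)
print(sigma2_hat)  # should come out roughly close to sigma2_true
# W is identified only up to a rotation of the latent space,
# so compare the recovered subspace through W W^T rather than W itself
print(np.abs(W_hat @ W_hat.T - W_true @ W_true.T).max())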
2. Apply to the Olivetti face data set¶
In this section, we apply probabilistic PCA to the Olivetti face data set in order to compress image data. The data used here can be obtained from the following link.
https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
from sklearn import datasets
import matplotlib.pyplot as plt
%matplotlib inline

# Data preparation: one image each from the first 16 subjects
# (the data set holds 10 pictures per subject)
data = datasets.fetch_olivetti_faces()
oliv_np = data.data
X = np.array([oliv_np[i*10, :] for i in range(16)])

print('Original pictures 64x64')
plt.gray()
plt.figure(figsize=(12, 8))
for i, x in enumerate(X):
    plt.subplot(4, 4, i + 1)
    plt.imshow(x.reshape(64, 64))
    plt.tick_params(labelbottom=False,
                    labelleft=False,
                    labelright=False,
                    labeltop=False)
plt.show()
exp_z, W, mu, sigma2 = EM(X=X, dim_m=4*4)

print('Compressed pictures with 4x4 latent variable')
plt.figure(figsize=(12, 8))
for i, qz in enumerate(exp_z):
    plt.subplot(4, 4, i + 1)
    # Reconstruct each image from its expected latent code
    plt.imshow((W @ qz + mu).reshape(64, 64))
    plt.tick_params(labelbottom=False,
                    labelleft=False,
                    labelright=False,
                    labeltop=False)
plt.show()
The upper figure shows the original data set, whose dimension is 4096. The lower pictures are the compressed versions, reconstructed from only 16 latent dimensions, about 0.4% of the original data! You can see that some fine features and noise were smoothed out :) You can also try other latent dimensions to check how the reconstruction changes, as in the sketch below.
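For example, here is a minimal sketch of such a sweep (the particular dimensions and iteration count are arbitrary choices), reporting the mean squared reconstruction error for each latent size:
# Sweep a few latent dimensions and compare reconstruction error
for m in [2, 4, 16]:
    exp_z_m, W_m, mu_m, _ = EM(X=X, dim_m=m, max_iter=100)
    X_rec = exp_z_m @ W_m.T + mu_m          # reconstruct all images at once
    mse = ((X - X_rec) ** 2).mean()
    print(f'dim_m={m}: mean squared reconstruction error = {mse:.5f}')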
3. References¶
- ベイズ推論による機械学習 (Machine Learning by Bayesian Inference), Atsushi Suyama
- Pattern Recognition and Machine Learning, Christopher M. Bishop