Sunday, July 14, 2019

Probabilistic PCA (principal component analysis)

In this article, I will share about probabilistic PCA. At a glance it looks like a linear regression model, but it contains a latent variable, which makes it quite intriguing :) For instance, it can be used for compression or for interpolating missing values. For parameter estimation, I will use maximum likelihood, specifically the EM (Expectation-Maximization) algorithm. At last, I will apply probabilistic PCA to the Olivetti face data to compress it, keeping only about 0.4% of the original dimensions! And we will see how it works. I will be glad if you have fun with this article :)

0. What is probabilistic PCA?

The model we are going to discuss is basically a linear-Gaussian model, in which all of the marginal and conditional distributions are Gaussian. First of all, we introduce an explicit latent variable $z$ corresponding to the principal-component subspace, with a zero-mean, unit-covariance Gaussian prior:

$$p(z) = N(z|0,I)$$

The observed variable $x$, conditioned on the value of the latent variable $z$, is also Gaussian, with a mean that is a linear function of $z$ governed by $W$ and $\mu$:

$$p(x|z) = N(x|Wz+\mu, \sigma^2I)$$

From these two distributions, the marginal distribution of $x$ can be computed as

$$ p(x) = N(x|\mu, \sigma^2I + WW^T)$$
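
Since $z$ is Gaussian and $x$ given $z$ is Gaussian with a mean linear in $z$, this marginal follows from $E[x]=\mu$ and $\mathrm{cov}[x]=W\,\mathrm{cov}[z]\,W^T+\sigma^2I = WW^T+\sigma^2I$. The following tiny numpy sketch just checks this numerically; the sizes, seed, and noise level are arbitrary choices for illustration.

import numpy as np

np.random.seed(0)
d, m, n = 5, 2, 100000            # arbitrary sizes for this check
W = np.random.randn(d, m)
mu = np.random.randn(d)
sigma2 = 0.1

# generative model: z ~ N(0, I),  x = W z + mu + eps,  eps ~ N(0, sigma2 * I)
Z = np.random.randn(n, m)
eps = np.sqrt(sigma2) * np.random.randn(n, d)
X = Z @ W.T + mu + eps

# the empirical covariance of x should be close to W W^T + sigma2 * I
emp_cov = np.cov(X, rowvar=False)
print(np.abs(emp_cov - (W @ W.T + sigma2*np.eye(d))).max())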

For the estimation of the parameters, I will apply the EM algorithm in this article. In the E step, in order to close the gap between the marginal log likelihood and its lower bound, $q(z)$ is set to the posterior $p(z|x)$, and its sufficient statistics are computed as follows:

$$\langle z_n \rangle = M^{-1}W^T(x_n-\mu)$$
$$\langle z_n z_n^T \rangle = \sigma^2M^{-1} + \langle z_n\rangle \langle z_n\rangle^T$$
$$\text{where}\quad M = W^TW + \sigma^{2}I$$
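
These two expectations are simply the mean and second moment of the posterior over $z_n$, which for this linear-Gaussian model has the closed form

$$p(z_n|x_n) = N\left(z_n\,\middle|\,M^{-1}W^T(x_n-\mu),\ \sigma^2M^{-1}\right)$$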

In the M step, the lower bound is maximized with respect to $W$ and $\sigma^2$.
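
Concretely, writing $N$ for the number of data points and $D$ for the dimension of $x$, this maximization yields the standard closed-form updates, which are the ones implemented below:

$$W_{\mathrm{new}} = \left[\sum_{n=1}^{N}(x_n-\mu)\langle z_n\rangle^T\right]\left[\sum_{n=1}^{N}\langle z_n z_n^T\rangle\right]^{-1}$$
$$\sigma^2_{\mathrm{new}} = \frac{1}{ND}\sum_{n=1}^{N}\left\{\|x_n-\mu\|^2 - 2\langle z_n\rangle^T W_{\mathrm{new}}^T(x_n-\mu) + \mathrm{Tr}\!\left(\langle z_n z_n^T\rangle W_{\mathrm{new}}^T W_{\mathrm{new}}\right)\right\}$$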

1. Implementation

The following is a basic implementation of probabilistic PCA with the EM algorithm, using the numpy module.

In [1]:
import numpy as np
import numpy.linalg as LA
In [2]:
def EM(X, dim_m, max_iter=500):
    """ Perform EM algorithm for probabilistic PCA
    parameters
    ---------------
    X : design matrix of shape (N, D)
    dim_m : dimension of the latent variable z
    max_iter : number of EM iterations
    """
    dim_n, dim_d = X.shape

    # Initialize parameters
    # (a random W breaks the symmetry between latent dimensions;
    #  an all-ones initialization would keep W effectively rank-1)
    W = np.random.randn(dim_d, dim_m)
    sigma2 = 1.0

    # In the maximum likelihood setting
    # mu is simply set to the mean of x
    mu = X.mean(axis=0)

    for _ in range(max_iter):
        # E step: posterior moments of z_n
        M = W.T @ W + sigma2*np.eye(dim_m)
        M_inv = LA.inv(M)
        exp_z = np.array([
            (M_inv@W.T@(X[i] - mu).reshape(-1,1)).ravel()
            for i in range(dim_n)
        ])
        exp_z_zt = np.array([
            sigma2 * M_inv + exp_z[i].reshape(-1,1)@exp_z[i].reshape(1,-1)
            for i in range(dim_n)
        ])

        # M step: update W and sigma2
        sum1 = np.array([
            (X[i] - mu).reshape(-1,1)@exp_z[i].reshape(1,-1)
            for i in range(dim_n)
        ]).sum(axis=0)

        W = sum1@LA.inv(exp_z_zt.sum(axis=0))
        sigma2 = (1/(dim_n * dim_d)) * np.array([
            (X[i] - mu)@(X[i] - mu) -
            2*exp_z[i]@W.T@(X[i] - mu) +
            np.trace(exp_z_zt[i]@W.T@W)
            for i in range(dim_n)
        ]).sum()

    return exp_z, W, mu, sigma2
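
Before moving on to real images, here is a quick sanity check of the implementation on small synthetic data drawn from the model itself; the sizes, seed, and noise level below are arbitrary choices for illustration.

# quick sanity check on synthetic data generated from the PPCA model itself
np.random.seed(1)
N, D, m = 200, 10, 3                      # arbitrary small sizes
W_true = np.random.randn(D, m)
mu_true = np.random.randn(D)
X_toy = np.random.randn(N, m) @ W_true.T + mu_true + 0.1*np.random.randn(N, D)

exp_z, W_hat, mu_hat, sigma2_hat = EM(X_toy, dim_m=m, max_iter=200)
print(sigma2_hat)                          # should be roughly the true noise variance (0.01 here)
print(np.abs(X_toy - (exp_z @ W_hat.T + mu_hat)).mean())   # average reconstruction error, around the noise level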

2. Apply to Olivetti face data set

In this section, we apply probabilistic PCA to the Olivetti face data set in order to compress image data. The data used here can be obtained from the following link.
https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html

In [3]:
from sklearn import datasets
import matplotlib.pyplot as plt
%matplotlib inline
In [4]:
# data preparation
data = datasets.fetch_olivetti_faces()
oliv_np = data.data
# pick one image per subject for the first 16 subjects (10 images per subject)
X = np.array([oliv_np[i*10,:] for i in range(16)])
In [5]:
print('Original picture 64x64')
plt.figure(figsize=(12,8))
plt.gray()    # use a gray colormap for the images

for i, x in enumerate(X):
    plt.subplot(4,4,i+1)
    plt.imshow(x.reshape(64,64))
    plt.tick_params(labelbottom=False,
                labelleft=False,
                labelright=False,
                labeltop=False)
    
plt.show()

exp_z, W, mu, sigma2 = EM(X=X, dim_m=4*4)
print('Compressed picture with 4x4 latent variable')
plt.figure(figsize=(12,8))
for i, qz in enumerate(exp_z):
    plt.subplot(4,4,i+1)
    plt.imshow((W@qz + mu).reshape(64,64))
    plt.tick_params(labelbottom=False,
                labelleft=False,
                labelright=False,
                labeltop=False)
    
plt.show()
Original picture 64x64
Compressed picture with 4x4 latent variable

The upper pictures are the original data, whose dimension is 4096 (64x64 pixels). The lower pictures are the compressed versions, each represented by a latent vector with only 16 dimensions, which is about 0.4% of the original dimensionality! You can tell some features and noise were smoothed out :) You can also try other latent dimensions to check how it works, as in the sketch below.
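
For example, a small loop like the following (the latent dimensions listed are just arbitrary picks) makes it easy to compare reconstructions for several latent dimensions side by side:

# compare reconstructions for a few latent dimensions (arbitrary choices)
for dim_m in [4, 8, 16]:
    exp_z, W, mu, sigma2 = EM(X=X, dim_m=dim_m)
    plt.figure(figsize=(12,8))
    plt.suptitle('latent dimension = {}'.format(dim_m))
    for i, qz in enumerate(exp_z):
        plt.subplot(4,4,i+1)
        plt.imshow((W@qz + mu).reshape(64,64))
        plt.axis('off')
    plt.show()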
