01. The model¶
Let's say the number of clusters is $K$. First, let me introduce a $K$-dimensional binary random variable $z$ with a 1-of-$K$ representation, so each element satisfies $z_k \in \{0,1\}$ and $\sum_k z_k = 1$. This $z$ is drawn from a categorical distribution $Cat(z|\pi)$, so the probability of $z$ can be written as follows,
$$p(z) = \prod_{k=1}^{K}\pi_k^{z_k}$$
Which cluster an observation comes from is determined by this $z$, hence the conditional probability $p(x|z)$ can be written as below,
$$p(x|z) = \prod_{k=1}^{K}N(x|\mu_k,\Sigma_k)^{z_k}$$
The joint distribution of $x$ and $z$ is,
$$\begin{eqnarray}p(x,z) &=& p(x|z)p(z)\\ &=&\prod_{k=1}^{K} \{\pi_k N(x|\mu_k,\Sigma_k)\}^{z_k}\end{eqnarray}$$
Thus, the marginal distribution of $x$ is,
$$\begin{eqnarray}p(x) &=& \sum_z p(x,z)\\ &=&\sum_{k=1}^{K} \pi_k N(x|\mu_k,\Sigma_k)\end{eqnarray}$$
You can also derive the posterior distribution of $z$,
$$\begin{eqnarray}p(z|x) &=& \frac{p(x,z)}{p(x)}\\ &=&\frac{\prod_{k=1}^{K}\{\pi_k N(x|\mu_k,\Sigma_k)\}^{z_k}}{\sum_{k=1}^{K} \pi_k N(x|\mu_k,\Sigma_k)}\end{eqnarray}$$
As a graphical model, this can be expressed as follows.
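To make the generative process concrete, here is a minimal sketch that first draws $z$ from $Cat(z|\pi)$ and then draws $x$ from the Gaussian component selected by $z$. The values of pi, means and covs below are arbitrary assumptions chosen only for illustration.
import numpy as np
rng = np.random.default_rng(0)
# assumed example parameters for K = 2 (arbitrary, for illustration only)
pi = np.array([0.3, 0.7])
means = [np.array([1.0, 1.0]), np.array([-2.0, 3.0])]
covs = [0.4 * np.eye(2), 0.8 * np.eye(2)]
def sample_one():
    # draw the 1-of-K indicator z ~ Cat(z|pi); k is the index where z_k = 1
    k = rng.choice(len(pi), p=pi)
    # draw x ~ N(x|mu_k, Sigma_k), the component selected by z
    x = rng.multivariate_normal(means[k], covs[k])
    return k, x
samples = [sample_one() for _ in range(5)]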
02. EM (Expectation-Maximization) algorithm¶
In order to obtain the optimized parameters, we will apply the EM (expectation-maximization) algorithm in this article. The steps of EM are as follows,
- Initialize the means $\mu$, covariances $\Sigma$, and mixing coefficients $\pi$, and compute the log likelihood of $p(X)$.
- E step. Evaluate the following quantities (which are called "responsibilities"),
$$\gamma(z_{nk}) = \frac{\pi_k N(x_n|\mu_k,\Sigma_k)}{\sum_{j=1}^{K} \pi_j N(x_n|\mu_j,\Sigma_j)}$$
- M step. Update $\mu$, $\Sigma$ and $\pi$ with the responsibilities we got in the previous step (the update formulas are given right after this list).
- Evaluate the log likelihood of $p(X)$ and check its convergence. If it has not converged, return to the E step.
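For reference, the M-step updates used in the implementation below are the standard maximum-likelihood ones, with $N_k = \sum_{n=1}^{N}\gamma(z_{nk})$,
$$\begin{eqnarray}\mu_k^{new} &=& \frac{1}{N_k}\sum_{n=1}^{N}\gamma(z_{nk})x_n\\ \Sigma_k^{new} &=& \frac{1}{N_k}\sum_{n=1}^{N}\gamma(z_{nk})(x_n-\mu_k^{new})(x_n-\mu_k^{new})^T\\ \pi_k^{new} &=& \frac{N_k}{N}\end{eqnarray}$$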
03. Implementation¶
Now it's time to implement the Gaussian mixture model with the EM (expectation-maximization) algorithm. First, we draw 300 observations from 2 different clusters. One has $\mu = (1,1)^T$ and $\Sigma = \begin{pmatrix} 0.4 & -0.3 \\ -0.3 & 0.4 \end{pmatrix}$, and the other has $\mu = (-2,3)^T$ and $\Sigma = \begin{pmatrix} 0.8 & 0.2 \\ 0.2 & 0.8 \end{pmatrix}$.
from scipy.stats import multivariate_normal
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn
obs_mean = [[1, 1],
            [-2, 3]]
obs_cov = [
    [[0.4, -0.3],
     [-0.3, 0.4]],
    [[0.8, 0.2],
     [0.2, 0.8]]
]
obs_num = [100, 200]
obs_data = []
for mean, cov, num in zip(obs_mean, obs_cov, obs_num):
    obs_data.append(multivariate_normal.rvs(mean=mean,
                                            cov=cov,
                                            size=num,
                                            random_state=42))
obs_data = np.concatenate([obs_data[0], obs_data[1]])
obs_data.shape
The observations look like this :)
plt.scatter(x=obs_data[:,0],y=obs_data[:,1])
plt.title('Observations',fontsize=14)
plt.show()
The following is the implementation of the EM algorithm.
# number of clusters
k_num = 2
# initial values of the parameters
mean = np.array([[1.0, 0.0],
                 [-1.0, 0.0]])
cov = np.array([[[0.2, 0.0],
                 [0.0, 0.2]],
                [[0.2, 0.0],
                 [0.0, 0.2]]])
pi = np.array([0.5, 0.5])
# threshold for convergence of the optimization
eps = 1e-8
# max number of iterations
max_iter = 100
# EM algorithm
for it in range(max_iter):
    # compute the log likelihood with the current parameters
    ln_p_X = np.array([
        np.log(
            np.array(
                [pi[k] * multivariate_normal.pdf(x=obs_data[n],
                                                 mean=mean[k],
                                                 cov=cov[k])
                 for k in range(k_num)]
            ).sum()
        )
        for n in range(obs_data.shape[0])
    ]).sum()
    # E step: compute the posterior distribution of z (the responsibilities)
    z_pos = np.empty((len(obs_data), k_num))
    for n in range(len(obs_data)):
        # compute the denominator of the responsibilities
        denom = np.array(
            [pi[k] * multivariate_normal.pdf(x=obs_data[n],
                                             mean=mean[k],
                                             cov=cov[k])
             for k in range(k_num)]
        ).sum()
        # posterior probability of each cluster for observation n
        z_pos[n] = np.array(
            [pi[k] * multivariate_normal.pdf(x=obs_data[n],
                                             mean=mean[k],
                                             cov=cov[k]) / denom
             for k in range(k_num)]
        )
    # M step: update pi, mean and covariance with the responsibilities
    pi = z_pos.mean(axis=0)
    mean = np.array(
        [(obs_data * z_pos[:, k][:, np.newaxis]).sum(axis=0) / z_pos.sum(axis=0)[k]
         for k in range(k_num)]
    )
    cov = np.array(
        [np.array(
            [z_pos[n, k] * ((obs_data[n] - mean[k])[:, np.newaxis] @ (obs_data[n] - mean[k])[np.newaxis, :])
             for n in range(len(obs_data))]
         ).sum(axis=0) / z_pos.sum(axis=0)[k]
         for k in range(k_num)]
    )
    # compute the log likelihood with the updated parameters
    ln_p_X_after = np.array([
        np.log(
            np.array(
                [pi[k] * multivariate_normal.pdf(x=obs_data[n],
                                                 mean=mean[k],
                                                 cov=cov[k])
                 for k in range(k_num)]
            ).sum()
        )
        for n in range(obs_data.shape[0])
    ]).sum()
    # if the change in the log likelihood is small enough, stop the EM algorithm
    if abs(ln_p_X - ln_p_X_after) < eps:
        break
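As a quick visual check, we can also assign each observation to the cluster with the largest responsibility and color the scatter plot accordingly. The following is a minimal sketch that reuses obs_data, z_pos and mean from the loop above.
# hard assignment: pick the cluster with the largest responsibility for each point
labels = z_pos.argmax(axis=1)
plt.scatter(x=obs_data[:, 0], y=obs_data[:, 1], c=labels, cmap='coolwarm')
# mark the estimated means
plt.scatter(x=mean[:, 0], y=mean[:, 1], c='black', marker='x', s=100)
plt.title('Cluster assignments by responsibility', fontsize=14)
plt.show()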
The following is the result of the EM algorithm. We can say it's fairly close to the population parameters we set :)
print('mean :\n',mean)
print('covariance matrix :\n',cov)
print('pi :\n',pi)
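If scikit-learn is available, the estimates can also be cross-checked against its GaussianMixture implementation. Here is a minimal sketch (note that the two clusters may come out in the opposite order).
from sklearn.mixture import GaussianMixture
# fit a 2-component Gaussian mixture with scikit-learn's EM implementation
gmm = GaussianMixture(n_components=2, random_state=42).fit(obs_data)
print('sklearn means :\n', gmm.means_)
print('sklearn covariance matrices :\n', gmm.covariances_)
print('sklearn weights :\n', gmm.weights_)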