Monday, July 16, 2018

Poisson Mixture Model ②

This article is a continuation of Poisson mixture model ①. In this article, we apply Gibbs sampling to the posterior distribution of the Poisson mixture model.

0. Approximation with Gibbs sampling

We've already built the Poisson mixture model and sampled from it.

$$x \sim p(x | s, \lambda, \pi)$$


Now what we want to know is the posterior distribution $p(s,\lambda,\pi|X)$ when data $X = (x_1,x_2, \cdots, x_N)$ is observed. However, the posterior distribution is intractable, so approximation is required. In this article we're going to use Gibbs sampling. If you want to wrap your head around Gibbs sampling, you can check this link: Gentle introduction to Gibbs Sampling.
In mixture models, it's well known that sampling the latent variables and the parameters separately gives you distributions simple enough to sample from. Therefore Gibbs sampling is executed according to the following steps.
$$s^{(i)} \sim p(s\ |\ X,\lambda^{(i-1)},\pi^{(i-1)})$$ $$\lambda^{(i)}, \pi^{(i)} \sim p(\lambda,\pi\ |\ X,s^{(i)}) \tag{1}$$
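To make the alternation in $(1)$ concrete before diving into the derivations, here is a minimal numpy sketch of the overall loop. The helper names `sample_s`, `sample_pi`, and `sample_lambda` are my own hypothetical names for the conditional samplers derived in the sections below, and `alpha`, `a`, `b` stand for the prior hyperparameters from the previous article.

```python
import numpy as np

def gibbs_sampling(X, K, alpha, a, b, n_iter=1000, seed=0):
    """Sketch of the alternating scheme (1). sample_s, sample_pi and
    sample_lambda are hypothetical helpers for the conditionals below."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X)
    lam = rng.gamma(a, 1.0 / b, size=K)       # initialize lambda from its prior
    pi = rng.dirichlet(alpha)                 # initialize pi from its prior
    history = []
    for i in range(n_iter):
        s = sample_s(X, lam, pi, rng)         # s^(i) ~ p(s | X, lambda^(i-1), pi^(i-1))
        pi = sample_pi(s, alpha, rng)         # pi^(i) ~ p(pi | s^(i))
        lam = sample_lambda(X, s, a, b, rng)  # lambda^(i) ~ p(lambda | X, s^(i))
        history.append((s.copy(), lam.copy(), pi.copy()))
    return history
```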

1. Compute posterior distribution to sample $s$

From $(1)$ above, what we need in order to draw samples are the posterior distributions $p(s\ |\ X, \lambda, \pi)$ and $p(\pi,\lambda\ |\ s, X)$. Let's start off with $p(s\ |\ X, \lambda, \pi)$!

$$\begin{eqnarray}p(s\ |\ X, \lambda, \pi) &=& \frac{p(s, X, \lambda, \pi)}{p(X, \lambda, \pi)}\\ \\ &=& \frac{p(X|s,\lambda)p(s|\pi)p(\lambda)p(\pi)}{p(X, \lambda, \pi)}\\ \\ &=& \frac{p(X|s,\lambda)p(s|\pi)p(\lambda)p(\pi)}{p(X\ |\ \lambda, \pi)p(\lambda)p(\pi)}\\ \\ &\propto& p(X|s,\lambda)p(s|\pi)\\ \\ &=& \prod^{N}_{n=1}p(x_n|s_n,\lambda)p(s_n|\pi)\end{eqnarray}$$
Taking the log of the Poisson likelihood term,
$$\begin{eqnarray}\log p(x_n|s_n,\lambda) &=& \sum^{K}_{k} s_{nk}(x_n\log \lambda_k - \log x_n ! - \lambda_k) \end{eqnarray}$$

Since $x_n$ is observed and $s_n$ is a one-hot vector, the term $\sum_k s_{nk}\log x_n! = \log x_n!$ does not depend on $s_n$ or $\lambda$, so we can treat it as a constant. Therefore
$$\begin{eqnarray}\log p(x_n|s_n,\lambda) &=& \sum^{K}_{k} s_{nk}(x_n\log \lambda_k - \lambda_k) +\ const\end{eqnarray}$$ From the model we defined, $p(s_n|\pi)$ is $$\log p(s_n|\pi) = \sum_k^Ks_{nk}\log \pi_k$$

Therefore, $$\begin{eqnarray}\log \{p(x_n|s_n,\lambda)p(s_n|\pi)\} = \sum^{K}_{k} s_{nk}(x_n\log \lambda_k - \lambda_k + \log \pi_k) +\ const\end{eqnarray}$$

Since the exponent is linear in $s_{nk}$ and $\sum^{K}_{k} s_{nk} = 1$, we can tell that $p(x_n|s_n,\lambda)p(s_n|\pi)$ is a categorical distribution: $$s_n \sim Cat(s_n|\eta_n)$$ where $$\eta_{nk} \propto \exp\{x_n\log \lambda_k - \lambda_k + \log \pi_k \}\ \ s.t.\ \sum_k^K \eta_{nk} =1$$
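As a quick illustration, here is a minimal numpy sketch (variable names are my own) of sampling $s_n$ from $Cat(s_n|\eta_n)$ by exponentiating and normalizing directly; the next section explains why this naive normalization can overflow and how to avoid it.

```python
import numpy as np

def sample_s_naive(X, lam, pi, rng):
    """Naive version: exponentiate and normalize directly. The exp() call
    can overflow for large x_n, which motivates the logsumexp trick below."""
    N, K = len(X), len(lam)
    s = np.zeros((N, K), dtype=int)
    for n in range(N):
        log_eta = X[n] * np.log(lam) - lam + np.log(pi)  # x_n log(lam_k) - lam_k + log(pi_k)
        eta = np.exp(log_eta)
        eta /= eta.sum()                 # normalize so that sum_k eta_nk = 1
        k = rng.choice(K, p=eta)         # draw the cluster assignment
        s[n, k] = 1                      # one-hot encode s_n
    return s
```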

2. logsumexp

In this section, I will share a slick technique for implementing the categorical distribution above.
First of all, since $\eta_{nk} \propto \exp\{x_n\log \lambda_k - \lambda_k + \log \pi_k \}$, it seems that we have to compute it as follows,

$$\eta_{nk} = \frac{\exp\{x_n\log \lambda_k - \lambda_k + \log \pi_k \}}{\sum_{k'}^K \exp\{x_n\log \lambda_{k'} - \lambda_{k'} + \log \pi_{k'} \}}\tag{2}$$

However, since we are exponentiating, the values of $\exp$ can become enormous, and there is a possibility of overflow in the computation. Thus we might want an alternative method to compute $\eta$. Let's say there is a (log-scale) normalization constant $const$. Then we can denote $\eta_{nk}$ as follows,

$$\begin{eqnarray}\eta_{nk} &=& \exp\{x_n\log \lambda_k - \lambda_k + \log \pi_k -\ const\} \\ &=& \exp\{-const\}\exp\{x_n\log \lambda_k - \lambda_k + \log \pi_k\}\end{eqnarray}\tag{3}$$

Then, comparing with equation $(2)$, $const$ can be identified as below, $$\begin{eqnarray}\exp\{const\} &=& \sum_k^K \exp\{x_n\log \lambda_k - \lambda_k + \log \pi_k \}\\ const &=& \log \sum_k^K \exp\{x_n\log \lambda_k - \lambda_k + \log \pi_k \}\end{eqnarray}\tag{4}$$ So now we can apply the "logsumexp" technique, which is the following.
$$\begin{eqnarray}\log\sum^K_k \exp(x_k) &=& \log\sum^K_k \exp(x_k - x_{max})\exp(x_{max}) \\ &=& \log\left\{\sum^K_k \exp(x_k - x_{max})\right\}+ x_{max}\end{eqnarray}$$

That's how we avoid overflow: we calculate the constant denoted in $(4)$ with the logsumexp technique and subtract it inside the exponent, instead of computing $(2)$ directly.
As a side note, for better understanding, the following is exactly what we're doing. Let's say there is a vector $(e^1, e^2, e^3, e^4)$. To make its sum equal to 1, we compute

$$(e^{1-\log(e^1 + e^2 + e^3+ e^4)},e^{2-\log(e^1 + e^2 + e^3+ e^4)},e^{3-\log(e^1 + e^2 + e^3+ e^4)},e^{4-\log(e^1 + e^2 + e^3+ e^4)})$$


instead of $$\left(\frac{e^{1}}{e^1 + e^2 + e^3+ e^4},\frac{e^{2}}{e^1 + e^2 + e^3+ e^4},\frac{e^3}{e^1 + e^2 + e^3+ e^4},\frac{e^4}{e^1 + e^2 + e^3+ e^4}\right)$$
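Putting this together, here is a minimal numpy sketch of the overflow-safe version used as `sample_s` in the loop above: the logsumexp constant of $(4)$ is subtracted inside the exponent, exactly as in the worked example.

```python
import numpy as np

def log_sum_exp(a):
    """logsumexp trick: log(sum(exp(a))) computed without overflow."""
    a_max = np.max(a)
    return np.log(np.sum(np.exp(a - a_max))) + a_max

def sample_s(X, lam, pi, rng):
    """Sample each s_n ~ Cat(eta_n), computing eta_n in the log domain."""
    N, K = len(X), len(lam)
    s = np.zeros((N, K), dtype=int)
    for n in range(N):
        log_eta = X[n] * np.log(lam) - lam + np.log(pi)
        eta = np.exp(log_eta - log_sum_exp(log_eta))  # subtract const of (4); sums to 1
        k = rng.choice(K, p=eta)
        s[n, k] = 1
    return s
```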

3. Compute posterior distribution to sample $\lambda$ and $\pi$

Let's think about the Gibbs sampling of $\lambda$ and $\pi$,

$$\begin{eqnarray}p(\pi, \lambda| X,s) &\propto& p(\pi)p(\lambda)p(X|s,\lambda)p(s|\pi)\end{eqnarray}\tag{5}$$

From equation $(5)$ above, the terms involving $\lambda$ and those involving $\pi$ factorize, so we can sample them separately. We are off to a good start. Let's look at $\pi$ first.

$$\begin{eqnarray}p(\pi)p(s|\pi) &=& p(\pi)\prod^N_np(s_n|\pi) \\ \log (p(\pi)p(s|\pi))&=& \sum^K_k\left\{\left(\alpha_{k} - 1 + \sum^N_ns_{nk}\right)\log \pi_k\right\} +\ const\end{eqnarray}$$

Therefore $\pi$ is drawn from a Dirichlet distribution. $$\pi \sim Dir(\pi|\hat{\alpha})$$ where $$\hat{\alpha}_k = \alpha_k + \sum^N_ns_{nk}$$
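In code, this update is a single Dirichlet draw with the per-cluster counts added to the prior. A minimal sketch (`alpha` is assumed to be the $K$-dimensional prior parameter):

```python
import numpy as np

def sample_pi(s, alpha, rng):
    """Draw pi ~ Dir(alpha_hat) with alpha_hat_k = alpha_k + sum_n s_nk."""
    alpha_hat = alpha + s.sum(axis=0)   # add the per-cluster counts to the prior
    return rng.dirichlet(alpha_hat)
```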

Next, let's get into the Gibbs sampling of $\lambda$. $$\begin{eqnarray}p(\lambda)p(X|s,\lambda) &=& \prod ^K_k \lambda^{a -1}_k e^{-b\lambda_k} \prod^N_n\prod^K_k \left\{\frac{\lambda^{x_n}_{k}}{x_n!}e^{-\lambda_k}\right\}^{s_{nk}}\\ \log(p(\lambda)p(X|s,\lambda) ) &=& \sum^K_k\left\{\left(a-1+\sum^N_ns_{nk}x_n\right)\log\lambda_k - \left(b + \sum^N_n s_{nk}\right)\lambda_k\right\} +\ const\end{eqnarray}$$ Therefore each $\lambda_k$ is drawn from a Gamma distribution. $$\lambda_k \sim Gam(\lambda_k|\hat{a}_k,\hat{b}_k)$$ where $$\hat{a}_k = a+\sum^N_n s_{nk}x_n,\ \ \hat{b}_k = b + \sum^N_ns_{nk}$$
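Likewise, a minimal sketch of the $\lambda$ update, assuming numpy's shape/scale parameterization of the Gamma distribution (so scale is $1/\hat{b}_k$):

```python
import numpy as np

def sample_lambda(X, s, a, b, rng):
    """Draw lambda_k ~ Gam(a_hat_k, b_hat_k) for every cluster k."""
    a_hat = a + s.T @ np.asarray(X)     # a + sum_n s_nk * x_n
    b_hat = b + s.sum(axis=0)           # b + sum_n s_nk
    return rng.gamma(shape=a_hat, scale=1.0 / b_hat)
```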

So far we've prepared all the required posterior distributions, so we're ready to implement the Poisson mixture model. However, this is already quite long enough for one article, so I'm going to write the implementation in another post :)
