0. Approximation with Gibbs sampling
We've already built a Poisson mixture model and sampled from it:
$$x \sim p(x \mid s, \lambda, \pi)$$
Now what we want to know is the posterior distribution $p(s, \lambda, \pi \mid X)$ when the data $X = (x_1, x_2, \cdots, x_N)$ is observed. However, this posterior is intractable, so approximation is required. In this article we're going to use Gibbs sampling. If you want to wrap your head around Gibbs sampling, check this link: Gentle introduction to Gibbs Sampling.
In mixture models, it's well known that alternately sampling the latent variables and the parameters gives you conditional distributions simple enough to sample from. Therefore Gibbs sampling is executed according to the following steps,
$$
\begin{aligned}
s^{(i)} &\sim p(s \mid X, \lambda^{(i-1)}, \pi^{(i-1)}) \\
\lambda^{(i)}, \pi^{(i)} &\sim p(\lambda, \pi \mid X, s^{(i)})
\end{aligned}
\tag{1}
$$
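To make the sweep in (1) concrete, here is a minimal sketch of how the loop could look in NumPy. The helpers `sample_s` and `sample_lambda_pi` are hypothetical stand-ins for the conditional samplers derived in the following sections, and the initialization is an arbitrary choice of mine.

```python
import numpy as np

def gibbs_sweeps(X, K, n_iter, sample_s, sample_lambda_pi, seed=0):
    """Rough Gibbs-sampling skeleton for the Poisson mixture (a sketch, not the final implementation)."""
    rng = np.random.default_rng(seed)
    # arbitrary initialization (assumption): random rates, uniform mixing weights
    lam = rng.gamma(1.0, 1.0, size=K)
    pi = np.full(K, 1.0 / K)
    draws = []
    for _ in range(n_iter):
        S = sample_s(X, lam, pi, rng)            # s^(i) ~ p(s | X, lambda^(i-1), pi^(i-1))
        lam, pi = sample_lambda_pi(X, S, rng)    # lambda^(i), pi^(i) ~ p(lambda, pi | X, s^(i))
        draws.append((S, lam.copy(), pi.copy()))
    return draws
```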
1. Compute the posterior distribution to sample s
From (1) above, what we need in order to draw samples are the conditional posteriors $p(s \mid X, \lambda, \pi)$ and $p(\pi, \lambda \mid s, X)$. Let's start off with $p(s \mid X, \lambda, \pi)$!
$$
\begin{aligned}
p(s \mid X, \lambda, \pi) &= \frac{p(s, X, \lambda, \pi)}{p(X, \lambda, \pi)}
= \frac{p(X \mid s, \lambda)\, p(s \mid \pi)\, p(\lambda)\, p(\pi)}{p(X \mid \lambda, \pi)\, p(\lambda)\, p(\pi)} \\
&\propto p(X \mid s, \lambda)\, p(s \mid \pi)
= \prod_{n=1}^{N} p(x_n \mid s_n, \lambda)\, p(s_n \mid \pi)
\end{aligned}
$$
Since $x_n$ is already observed, $\log x_n!$ is a constant, and because $s_{nk}$ is a one-hot vector, the term $\sum_k s_{nk} \log x_n!$ it multiplies can be treated as a constant too. Therefore,
$$\log p(x_n \mid s_n, \lambda) = \sum_{k=1}^{K} s_{nk}\left(x_n \log \lambda_k - \lambda_k\right) + \text{const}$$
Therefore,
$$\log\left\{p(x_n \mid s_n, \lambda)\, p(s_n \mid \pi)\right\} = \sum_{k=1}^{K} s_{nk}\left(x_n \log \lambda_k - \lambda_k + \log \pi_k\right) + \text{const}$$
Since $\sum_{k=1}^{K} s_{nk} = 1$, we can tell that $p(x_n \mid s_n, \lambda)\, p(s_n \mid \pi)$ is a categorical distribution:
$$s_n \sim \mathrm{Cat}(s_n \mid \eta_n)$$
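Before worrying about numerics, here is a naive sketch of this sampling step, assuming `X` is a length-$N$ array of counts and `lam`, `pi` are length-$K$ arrays (the function name is my own). The direct normalization here can overflow, which is exactly what the next section addresses.

```python
import numpy as np

def sample_s_naive(X, lam, pi, rng):
    # unnormalized log eta_{nk} = x_n log(lambda_k) - lambda_k + log(pi_k)
    log_eta = X[:, None] * np.log(lam)[None, :] - lam[None, :] + np.log(pi)[None, :]
    eta = np.exp(log_eta)
    eta /= eta.sum(axis=1, keepdims=True)              # normalize each row (can overflow!)
    # draw a one-hot s_n ~ Cat(eta_n) for every data point
    return np.stack([rng.multinomial(1, eta[n]) for n in range(len(X))])
```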
2. logsumexp
In this section, I will share a slick technique for implementing the categorical distribution above.
First of all, since $\eta_{nk} \propto \exp\{x_n \log \lambda_k - \lambda_k + \log \pi_k\}$, it seems we have to compute the normalization as follows:
$$\eta_{nk} = \frac{\exp\{x_n \log \lambda_k - \lambda_k + \log \pi_k\}}{\sum_{k'=1}^{K} \exp\{x_n \log \lambda_{k'} - \lambda_{k'} + \log \pi_{k'}\}} \tag{2}$$
However, since these are exponentials, the values can become enormous, and there is a possibility of overflow in the computation. Thus we might want an alternative way to compute $\eta$. Let's say there is a normalization constant "const". Then we can write $\eta_{nk}$ as follows:
$$
\begin{aligned}
\eta_{nk} &= \exp\{x_n \log \lambda_k - \lambda_k + \log \pi_k - \text{const}\} \\
&= \exp\{-\text{const}\}\,\exp\{x_n \log \lambda_k - \lambda_k + \log \pi_k\}
\end{aligned}
$$
Then, from equation (2), $\exp\{\text{const}\}$ can be captured as below:
$$
\begin{aligned}
\exp\{\text{const}\} &= \sum_{k=1}^{K} \exp\{x_n \log \lambda_k - \lambda_k + \log \pi_k\} \\
\text{const} &= \log \sum_{k=1}^{K} \exp\{x_n \log \lambda_k - \lambda_k + \log \pi_k\}
\end{aligned}
\tag{3}
$$
This log-sum-exp can be evaluated safely by factoring out the largest element $x_{\max}$:
$$\log \sum_{k=1}^{K} \exp(x_k) = \log \sum_{k=1}^{K} \exp(x_k - x_{\max})\exp(x_{\max}) = \log\left\{\sum_{k=1}^{K} \exp(x_k - x_{\max})\right\} + x_{\max} \tag{4}$$
That's how we can avoid overflow: we compute the constant with (4) instead of evaluating (2) directly.
As a side note, for better understanding, here is exactly what we're doing. Say we have a vector $(x_1, x_2, x_3, x_4)$ and we want the exponentials to sum to 1. Instead of computing
$$\left(\frac{e^{x_1}}{e^{x_1}+e^{x_2}+e^{x_3}+e^{x_4}},\; \frac{e^{x_2}}{e^{x_1}+e^{x_2}+e^{x_3}+e^{x_4}},\; \frac{e^{x_3}}{e^{x_1}+e^{x_2}+e^{x_3}+e^{x_4}},\; \frac{e^{x_4}}{e^{x_1}+e^{x_2}+e^{x_3}+e^{x_4}}\right)$$
directly, we subtract $x_{\max}$ inside every exponential first; the ratios stay the same, but none of the exponentials can overflow.
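As a sketch of the trick in (4), here is one way the stable normalization could be implemented in NumPy (the function names are my own; `scipy.special.logsumexp` provides the same computation):

```python
import numpy as np

def logsumexp(a, axis=-1):
    # log sum_k exp(a_k) = log sum_k exp(a_k - a_max) + a_max, as in (4)
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.log(np.sum(np.exp(a - a_max), axis=axis, keepdims=True)) + a_max

def sample_s_stable(X, lam, pi, rng):
    log_eta = X[:, None] * np.log(lam)[None, :] - lam[None, :] + np.log(pi)[None, :]
    eta = np.exp(log_eta - logsumexp(log_eta, axis=1))   # subtract the constant in log space
    return np.stack([rng.multinomial(1, eta[n]) for n in range(len(X))])
```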
3. Compute the posterior distributions to sample λ and π
Let's think about the Gibbs sampling of $\lambda$ and $\pi$:
$$p(\pi, \lambda \mid X, s) \propto p(\pi)\, p(\lambda)\, p(X \mid s, \lambda)\, p(s \mid \pi) \tag{5}$$
From equation (5) above, $\lambda$ and $\pi$ can be sampled separately, so we are off to a good start. Let's look at $\pi$ first.
$$
\begin{aligned}
p(\pi)\, p(S \mid \pi) &= p(\pi) \prod_{n=1}^{N} p(s_n \mid \pi) \\
\log\left(p(\pi)\, p(S \mid \pi)\right) &= \sum_{k=1}^{K}\left\{\left(\alpha_k - 1 + \sum_{n=1}^{N} s_{nk}\right) \log \pi_k\right\} + \text{const}
\end{aligned}
$$
Therefore $\pi$ is drawn from a Dirichlet distribution:
$$\pi^{(i)} \sim \mathrm{Dir}(\pi \mid \hat{\alpha}), \qquad \hat{\alpha}_k = \alpha_k + \sum_{n=1}^{N} s_{nk}$$
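A minimal sketch of this update, assuming `S` is the $N \times K$ one-hot matrix and `alpha` is the length-$K$ prior parameter (the names are my own):

```python
import numpy as np

def sample_pi(S, alpha, rng):
    alpha_hat = alpha + S.sum(axis=0)   # alpha_hat_k = alpha_k + sum_n s_nk
    return rng.dirichlet(alpha_hat)
```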
Next, let's get into the Gibbs sampling of $\lambda$.
$$p(\lambda)\, p(X \mid s, \lambda) \propto \prod_{k=1}^{K} \lambda_k^{a-1} e^{-b\lambda_k} \times \prod_{n=1}^{N} \prod_{k=1}^{K} \left\{\frac{\lambda_k^{x_n}}{x_n!} e^{-\lambda_k}\right\}^{s_{nk}}$$
$$\log\left(p(\lambda)\, p(X \mid s, \lambda)\right) = \sum_{k=1}^{K}\left\{\left((a - 1) + \sum_{n=1}^{N} s_{nk} x_n\right)\log \lambda_k - \left(b + \sum_{n=1}^{N} s_{nk}\right)\lambda_k\right\} + \text{const}$$
Therefore each $\lambda_k$ is drawn from a Gamma distribution:
$$\lambda_k^{(i)} \sim \mathrm{Gam}(\lambda_k \mid \hat{a}_k, \hat{b}_k), \qquad \hat{a}_k = a + \sum_{n=1}^{N} s_{nk} x_n, \quad \hat{b}_k = b + \sum_{n=1}^{N} s_{nk}$$
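And a matching sketch for the $\lambda$ update, under the same assumptions about `X` and `S`; note that NumPy's Gamma sampler takes a scale parameter, so the rate $\hat{b}_k$ goes in as its reciprocal.

```python
import numpy as np

def sample_lam(X, S, a, b, rng):
    a_hat = a + S.T @ X            # a_hat_k = a + sum_n s_nk x_n
    b_hat = b + S.sum(axis=0)      # b_hat_k = b + sum_n s_nk
    return rng.gamma(a_hat, 1.0 / b_hat)   # Gam(a_hat_k, rate=b_hat_k)
```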
So far we've prepared all the required posterior distributions, so we're ready to implement the Poisson mixture model. However, this is already quite long enough for one article, so I'm going to write up the implementation in another blog post :)