0. Introduction to variational approximation
"Variational approximation" method is one of technique to approximate intractable "posterior distribution" in Baysian inference and machine learning. I will show you basic concept of "variational approximation" here.
First of all, let $p(x_1,x_2)$ be an intractable probability density function. Because it is intractable, an approximation is required, so we introduce $q(x_1,x_2)$ as the approximating distribution. Then we have to compare $p$ and $q$. When it comes to comparing two different distributions, there is a useful measure called the "Kullback-Leibler divergence (KL divergence)". Should you be completely new to the KL divergence, you can check here (KL (Kullback-Leibler) divergence). So what we have to do is minimize the KL divergence. However, you might want to say that minimizing the KL divergence even though $p(x_1,x_2)$ is intractable defies logic! Calm down. There is a technique to get around this difficulty :)
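As a quick reminder (see the linked post for details), for two distributions $q$ and $p$ the KL divergence is defined as

$$
\mathrm{KL}(q \,\|\, p) = \int q(x)\,\log\frac{q(x)}{p(x)}\,dx = \Bigl\langle \log\frac{q(x)}{p(x)} \Bigr\rangle_{q(x)} \ge 0,
$$

where $\langle \cdot \rangle_{q(x)}$ denotes an expectation under $q(x)$; this angle-bracket notation is used throughout the derivation below. The divergence is zero exactly when $q = p$, which is why driving it toward zero makes $q$ a good stand-in for $p$.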
1. Assumption of independence between random variables
In "Variational Approximation", we assume random variables are independent each other. To be precise, we assume p(x1,x2)≈q(x1,x2)=q(x1),q(x2). Then because of independency of random variable, folloing tactic can be applied.
1. Initialize $q(x_2)$ randomly.
2. Treating $q(x_2)$ as a fixed function, compute the $q(x_1)$ that minimizes the KL divergence.
3. Treating $q(x_1)$ as the one obtained in step 2, compute the $q(x_2)$ that minimizes the KL divergence.
4. Iterate steps 2 and 3 until the factors stop changing (a minimal numerical sketch of this loop is shown below).
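To make the loop concrete, here is a minimal numerical sketch in Python. It assumes the target $p(x_1,x_2)$ is a correlated bivariate Gaussian with made-up parameters (`mu`, `Sigma`, and the iteration count are my illustrative choices, not part of the original post); in that case the per-step minimizer has a simple closed form, which follows from the formula derived in the next section.

```python
import numpy as np

# Hypothetical target: a correlated bivariate Gaussian p(x1, x2).
# mu and Sigma are illustrative values chosen only for this sketch.
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])
Lam = np.linalg.inv(Sigma)      # precision matrix

# Mean-field factors: q(x1) = N(m1, 1/Lam[0,0]), q(x2) = N(m2, 1/Lam[1,1]).
m2 = np.random.randn()          # step 1: initialize q(x2) randomly
for _ in range(20):
    # step 2: update q(x1) while holding q(x2) fixed
    m1 = mu[0] - (Lam[0, 1] / Lam[0, 0]) * (m2 - mu[1])
    # step 3: update q(x2) while holding the new q(x1) fixed
    m2 = mu[1] - (Lam[1, 0] / Lam[1, 1]) * (m1 - mu[0])

# The factor means converge to the true means, while the factor variances
# 1/Lam[i, i] underestimate the true marginal variances Sigma[i, i].
print("q(x1): mean", m1, "variance", 1.0 / Lam[0, 0])
print("q(x2): mean", m2, "variance", 1.0 / Lam[1, 1])
```

Each sweep of the loop can only decrease (or keep) the KL divergence, since every update is the exact minimizer for one factor with the other held fixed, which is why this coordinate-wise scheme works even though $p$ is never normalized explicitly.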
2. Deriving the variational approximation formula
Let me show you how to compute step 2. To recap, $q(x_2)$ is treated as some fixed function, and what we want to find is $q(x_1)$. Now it's time to compute the KL divergence between $p(x_1,x_2)$ and $q(x_1)q(x_2)$.
$$
\begin{aligned}
\mathrm{KL}\bigl(q(x_1)q(x_2)\,\|\,p(x_1,x_2)\bigr)
&= -\Bigl\langle \log\frac{p(x_1,x_2)}{q(x_1)\,q(x_2)} \Bigr\rangle_{q(x_1)q(x_2)} \\
&= -\bigl\langle \log p(x_1,x_2) - \log q(x_1) - \log q(x_2) \bigr\rangle_{q(x_1)q(x_2)} \\
&= -\Bigl\langle \bigl\langle \log p(x_1,x_2) \bigr\rangle_{q(x_2)} - \log q(x_1) \Bigr\rangle_{q(x_1)} + \text{const} \\
&= -\Bigl\langle \log\frac{\exp\bigl\{\langle \log p(x_1,x_2) \rangle_{q(x_2)}\bigr\}}{q(x_1)} \Bigr\rangle_{q(x_1)} + \text{const} \\
&= \mathrm{KL}\Bigl(q(x_1)\,\Big\|\,\exp\bigl\{\langle \log p(x_1,x_2) \rangle_{q(x_2)}\bigr\}\Bigr) + \text{const}
\end{aligned}
$$

(The expectations are taken under $q$, so this is the divergence of $q$ from $p$; the $\langle \log q(x_2)\rangle$ term does not depend on $q(x_1)$ and has been absorbed into the constant.) To minimize this KL divergence with respect to $q(x_1)$, $q(x_1)$ should be
$$
q(x_1) = \exp\bigl\{\langle \log p(x_1,x_2) \rangle_{q(x_2)}\bigr\}\cdot\exp(c),
$$

where $\exp(c)$ is just a normalizing constant.
Therefore,
$$
\log q(x_1) = \langle \log p(x_1,x_2) \rangle_{q(x_2)} + c
$$
Needless to say, the case of $q(x_2)$ follows likewise:
$$
\log q(x_2) = \langle \log p(x_1,x_2) \rangle_{q(x_1)} + c
$$
These equations are, in fact, the general formulas for deriving the approximating distributions.
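As a concrete illustration (this worked example is my addition, not part of the original post), apply the formula to a bivariate Gaussian $p(x_1,x_2) = \mathcal{N}(x \mid \mu, \Lambda^{-1})$ with mean $\mu = (\mu_1, \mu_2)$ and precision matrix $\Lambda$. Keeping only the terms that depend on $x_1$,

$$
\log q(x_1) = \langle \log p(x_1,x_2) \rangle_{q(x_2)} + c
= -\tfrac{1}{2}\Lambda_{11}x_1^2 + x_1\bigl(\Lambda_{11}\mu_1 - \Lambda_{12}(\langle x_2\rangle_{q(x_2)} - \mu_2)\bigr) + \text{const},
$$

so $q(x_1)$ is itself Gaussian, $q(x_1) = \mathcal{N}\bigl(x_1 \mid m_1, \Lambda_{11}^{-1}\bigr)$ with mean $m_1 = \mu_1 - \Lambda_{11}^{-1}\Lambda_{12}\bigl(\langle x_2\rangle_{q(x_2)} - \mu_2\bigr)$, which is exactly the update used in the Python sketch in section 1.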