0. Introduction to variational approximation
"Variational approximation" method is one of technique to approximate intractable "posterior distribution" in Baysian inference and machine learning. I will show you basic concept of "variational approximation" here.
First of all, let $p(x_1,x_2)$ be an intractable probability density function. Because of this intractability, an approximation is required, so we introduce $q(x_1,x_2)$ as an approximating distribution. Then we have to compare $p(x_1,x_2)$ and $q(x_1,x_2)$. When it comes to comparing two different distributions, there is a useful measure called the "Kullback-Leibler divergence (KL divergence)". Should you be completely new to the KL divergence, you can check here (KL(Kullback-Leibler) divergence). So what we have to do is minimize the KL divergence. However, you might object: even though $p(x_1,x_2)$ is intractable, we are supposed to minimize the KL divergence nonetheless, which seems to defy logic! Calm down. There is a technique to get around this difficulty :)
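As a quick reminder (this block is an added note, using angle brackets $<\cdot>_{q}$ to denote an expectation with respect to $q$, the same notation as in the derivation below), the divergence we will minimize is
$$KL(p(x_1,x_2)||q(x_1,x_2)) = -\left<\log\frac{p(x_1,x_2)}{q(x_1,x_2)}\right>_{q(x_1,x_2)}$$
Because the expectation is taken under the approximation $q$ rather than under $p$ (this is the quantity many textbooks write as $KL(q||p)$), we only ever need $\log p(x_1,x_2)$ inside expectations over $q$, which is what the trick in the next section exploits.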
1. Assumption of independence between random variables
In "Variational Approximation", we assume random variables are independent each other. To be precise, we assume $p(x_1,x_2) \approx q(x_1,x_2) = q(x_1),q(x_2)$. Then because of independency of random variable, folloing tactic can be applied.
1. Initialize $q(x_2)$ randomly.
2. Treating $q(x_2)$ as a fixed function, compute the $q(x_1)$ that minimizes the KL divergence.
3. Treating $q(x_1)$ as the one obtained in step 2, compute the $q(x_2)$ that minimizes the KL divergence.
4. Iterate steps 2 and 3 until the approximation converges (a numerical sketch follows below).
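Here is a minimal numerical sketch of these steps (my own toy example, not from the original post): the target $p(x_1,x_2)$ is a correlated two-dimensional Gaussian, the densities are discretized on a grid, and the updates use the formula $\log q(x_1) = <\log p(x_1,x_2)>_{q(x_2)} + c$ derived in the next section, so no closed-form solution is assumed.

```python
import numpy as np

# Minimal numerical sketch of the coordinate updates above (toy example).
# The target p(x1, x2) is a correlated 2D Gaussian, and all expectations
# are computed on a grid, so no closed-form update is needed.

grid = np.linspace(-6.0, 6.0, 400)          # shared grid for x1 and x2
dx = grid[1] - grid[0]
X1, X2 = np.meshgrid(grid, grid, indexing="ij")

mu = np.array([1.0, -1.0])                  # mean of the toy target
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.5]])              # covariance of the toy target
Lam = np.linalg.inv(Sigma)                  # precision matrix
d1, d2 = X1 - mu[0], X2 - mu[1]
log_p = -0.5 * (Lam[0, 0] * d1**2 + 2 * Lam[0, 1] * d1 * d2 + Lam[1, 1] * d2**2)

def normalize(log_q):
    """Turn unnormalized log-density values on the grid into a density."""
    q = np.exp(log_q - log_q.max())         # subtract the max for stability
    return q / (q.sum() * dx)

# Step 1: initialize q(x2) randomly (normalize() exponentiates and normalizes).
q2 = normalize(np.random.randn(grid.size))

# Steps 2-4: alternate log q(x1) = <log p(x1,x2)>_{q(x2)} + const and vice versa.
for _ in range(20):
    q1 = normalize((log_p * q2[None, :]).sum(axis=1) * dx)   # expectation over x2
    q2 = normalize((log_p * q1[:, None]).sum(axis=0) * dx)   # expectation over x1

print("mean of q(x1):", (grid * q1).sum() * dx)   # -> about  1.0
print("mean of q(x2):", (grid * q2).sum() * dx)   # -> about -1.0
```

For a Gaussian target the factorized $q(x_1)q(x_2)$ recovers the true means, but it underestimates the marginal variances whenever $x_1$ and $x_2$ are correlated, a well-known property of this kind of factorized approximation.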
2. Deriving the formula of variational approximation
Let me show you how to carry out step 2. To recap, $q(x_2)$ is treated as some fixed function, and what we want to find is $q(x_1)$. Now it is time to compute the KL divergence between $p(x_1,x_2)$ and $q(x_1)q(x_2)$.
$$\begin{eqnarray}KL(p(x_1,x_2)||q(x_1, x_2)) &=& -<\log\left(\frac{p(x_1,x_2)}{q(x_1)q(x_2)}\right)>_{q(x_1)q(x_2)}\\ &=&-<\log p(x_1,x_2) - \log q(x_1) - \log q(x_2)>_{q(x_1)q(x_2)}\\ &=&-<<\log p(x_1,x_2)>_{q(x_2)} - \log q(x_1)>_{q(x_1)} +\ const\\ &=&-<\log\left(\frac{\exp\left\{<\log p(x_1,x_2)>_{q(x_2)}\right\}}{q(x_1)}\right)>_{q(x_1)} +\ const\\ &=&KL(\exp\left\{<\log p(x_1,x_2)>_{q(x_2)}\right\}||q(x_1)) +\ const \end{eqnarray}$$To minimize this KL divergence, $q(x_1)$ should be
$$q(x_1) = \exp\left\{<\log p(x_1,x_2)>_{q(x_2)}\right\} \times \exp(c)$$
where $\exp(c)$ is a normalization constant.
Therefore,
$$\log q(x_1) = <\log p(x_1,x_2)>_{q(x_2)} + c$$
Needless to say, the case of $q(x_2)$ follows likewise:
$$\log q(x_2) = <\log p(x_1,x_2)>_{q(x_1)} + c$$
The equations above are the general formulas used to derive the approximate distributions.
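As a worked example (my own addition, not part of the original post), consider a Gaussian target such as the toy case in the code sketch above: $p(x_1,x_2)$ Gaussian with mean $(\mu_1,\mu_2)$ and precision matrix $\Lambda$. Plugging its log density into the formula gives
$$\begin{eqnarray}\log q(x_1) &=& <\log p(x_1,x_2)>_{q(x_2)} + c\\ &=& -\frac{1}{2}\Lambda_{11}(x_1-\mu_1)^2 - \Lambda_{12}(x_1-\mu_1)\left(<x_2>_{q(x_2)}-\mu_2\right) +\ const \end{eqnarray}$$
which is quadratic in $x_1$, so $q(x_1)$ is Gaussian with variance $\Lambda_{11}^{-1}$ and mean $\mu_1 - \Lambda_{11}^{-1}\Lambda_{12}\left(<x_2>_{q(x_2)} - \mu_2\right)$. Iterating this update together with the symmetric one for $q(x_2)$ is what the numerical sketch in section 1 converges to (up to grid discretization).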