0. Mathmatical analysis¶
As you may know, the following is 2-dimentional normal distribution.
$$\begin{eqnarray}p(x_1,x_2|\mu_1,\mu_2,\Lambda^{-1}) &=& \frac{1}{\sqrt{2\pi^2|\Lambda^{-1}|}}exp\left\{-\frac{1}{2}\left(\begin{array}{c}x_1 - \mu_1 \\x_2 - \mu_2\end{array}\right)^T\begin{pmatrix}\lambda_{11} & \lambda_{12}\\ \lambda_{21} & \lambda_{22
} \end{pmatrix}\left(\begin{array}{c}x_1 - \mu_1 \\x_2 - \mu_2\end{array}\right)\right\} \\
&\propto& exp\left\{-\frac{1}{2}\left(\begin{array}{c}x_1 - \mu_1 \\x_2 - \mu_2\end{array}\right)^T\begin{pmatrix}\lambda_{11} & \lambda_{12}\\ \lambda_{21} & \lambda_{22
} \end{pmatrix}\left(\begin{array}{c}x_1 - \mu_1 \\x_2 - \mu_2\end{array}\right)\right\}\end{eqnarray}$$
Since "2-dimentional gaussian distibution" has two variable, we can assume $p(x_1,x_2|\mu_1,\mu_2,\Lambda^{-1}) \approx q(x_1,x_2) = q(x_1)q(x_2)$.
Then, we can apply formula which is derived in Variational Approximaton①.
$$\begin{eqnarray}\log q(x_1) &=& <\log p(x_1,x_2)>_q(x_2) +c \\ &=& \left< -\frac{1}{2}\left(\begin{array}{c} x_1-\mu_1 \\ x_2-\mu_2 \end{array} \right)^T\begin{pmatrix} \lambda_{11}& \lambda_{12} \\ \lambda_{21}& \lambda_{22} \end{pmatrix}\left(\begin{array}{c} x_1-\mu_1 \\ x_2-\mu_2 \end{array} \right)\right>_{q(x_2)}\ +\ c \\ &=&\left< -\frac{1}{2}\{\lambda_{11}x_1^2 - 2x_1(\lambda_{11}\mu_1-\lambda_{12}(x_2- \mu_2)\}\right>_{q(x_2)} + c \\ &=& -\frac{1}{2}\{\lambda_{11}x_1^2 - 2x_1(\lambda_{11}\mu_1-\lambda_{12}(<x_2>_{x_2}- \mu_2)\} + c\end{eqnarray}$$
We can tell this is quadratic function with respect to $x_1$ which is concave downward. Therefore, $q(x_1)$ can be denoted as below.
$$q(x_1) = N(x_1|m_1,\Lambda^{-1}_{11})$$
$$such\ that$$
$$m_1 = \mu_1 - \Lambda^{-1}_{11}\Lambda_{12}(<x_2> - \mu_2) \tag{1}
$$
Similarly, $q(x_2)$ is captured as following.
$$q(x_2) = N(x_2|m_2,\Lambda^{-1}_{22})$$$$such\ that$$$$m_2 = \mu_2 - \Lambda^{-1}_{22}\Lambda_{21}(<x_1> - \mu_2) \tag{2}$$1. Implementation¶
From now on, I'll show you the implementation of variational_approximation of "2-dimentional gaussian distribution". I hope you'll enjoy it :)
from scipy.stats import norm
from scipy.stats import multivariate_normal
from scipy.stats import entropy
import numpy as np
import matplotlib.pyplot as plt
% matplotlib inline
At fist we prepare "2-dimentional gaussian distribution" which will be approximated with variational approximation. This time, mean vector is $\left(\begin{array}{c} 0\\ 0 \end{array} \right)$, co-variance matix is $\begin{pmatrix}1 & 0.8 \\0.8& 1\end{pmatrix}$. We can see how it looks like as below.
x = np.linspace(-4,4,101)
y = np.linspace(-4,4,101)
x_mesh,y_mesh = np.meshgrid(x,y)
org_mean=np.array([0,0])
org_cov_matrix = np.array([[1,0.8],
              [0.8,1]])
org_z = np.empty_like(x_mesh)
for i in range(x.shape[0]):
    for j in range(y.shape[0]):
        org_z[i,j] = multivariate_normal.pdf(x =(x[i],y[j]), mean=org_mean,
                                             cov=org_cov_matrix)
plt.contour(x_mesh,y_mesh,org_z,levels=[0.1],colors='blue')
As we discussed before, mean vector of $q(x_2)$ should be given at first. Tentatively we set '-3' for mean of $q(x_2)$. Before that, we should be on the same page on what $q(x_1)q(x_2)$ looks like. Let's think about two following normal distribution. $$N(x|\mu_1,\sigma^2_1) = \frac{1}{\sqrt{2\pi}\sigma^2_1}exp\left\{-\frac{(x-\mu_1)^2}{2\sigma^2_1}\right\}$$ $$N(x|\mu_2,\sigma^2_2) = \frac{1}{\sqrt{2\pi}\sigma^2_2}exp\left\{-\frac{(x-\mu_2)^2}{2\sigma^2_2}\right\}$$ For stepwise understanding, let's see approximate distibution at one time iteration :)
q2_mean = -4
# First iteration  :)
# Obtain q_1(x)
q1_mean = org_mean[0] - ((1/org_cov_matrix[0,0]) * 
                         org_cov_matrix[0,1]) * (q2_mean - org_mean[1])
q1_variance = org_cov_matrix[0,0]
# Obtain q_2(x)
q2_mean = org_mean[1] - ((1/org_cov_matrix[1,1]) * 
                         org_cov_matrix[1,0]) * (q1_mean - org_mean[0])
q2_variance = org_cov_matrix[1,1]
# Compute q(x_1)q(x_2)
app_mean = np.array([q1_mean,q2_mean])
app_cov_matrix = np.array([[q2_variance,0],
                           [0,q2_variance]])
app_z = np.empty_like(x_mesh)
# Plot approximate distribution with one iteration
for i in range(x.shape[0]):
    for j in range(y.shape[0]):
        app_z[i,j] = multivariate_normal.pdf(x =(x[i],y[j]), 
                                             mean=app_mean,cov=app_cov_matrix)
app_cont = plt.contour(x_mesh,y_mesh,app_z,levels=[0.1],colors='red')
#app_cont.clabel('app_cont')
plt.contour(x_mesh,y_mesh,org_z,levels=[0.1],colors='blue')
I plot original 2-dimentional gaussian distribution in blue, and approximate distribution is plotted in red.
# Class represents approximate distribution
class app_dist:
    def __init__(self):
            # Parameter of function to be approximated
            self.org_mean = np.array([0,0])
            self.org_cov_matrix = np.array([[1,0.8],
                                              [0.8,1]])
            self.q1_mean = 0
            self.q1_var = org_cov_matrix[0,0]
            # The only value you have to give first is mean of q_2(x).
            # Actually, q(x_2) doesn't have to be gaussian distribution,
            # as long as mean of distribution is known you can apply 
            # any distribution.
            self.q2_mean = -4
            self.q2_var = org_cov_matrix[1,1]
            
    def update_q1(self):
        """
        update q_1(x)
        """
        self.q1_mean = self.org_mean[0] - ((1/self.org_cov_matrix[0,0]
                                           ) * self.org_cov_matrix[0,1]
                                          ) * (self.q2_mean - self.org_mean[1])
        self.q1_var = self.org_cov_matrix[0,0]
    def update_q2(self):
        """
        update q_2(x)
        """
        self.q2_mean = self.org_mean[1] - ((1/self.org_cov_matrix[1,1]
                                           ) * self.org_cov_matrix[1,0]
                                          ) * (self.q1_mean - self.org_mean[1])
        self.q2_var = self.org_cov_matrix[1,1]
    def get_app_dist(self):
        """
        Returnb approximate distribution(2-dimentional gaussian distribution)
        """
        app_mean = np.array([self.q1_mean,self.q2_mean])
        app_var = np.array([[self.q1_var,0],
                          [0,self.q2_var]])
        
        return_val = multivariate_normal(mean=app_mean,cov=app_var)
        return return_val
Note :
Actually for initial $q(x_2)$, as long as mean is known, you can apply any distribution you want. From equation $(2)$, what is used to generate $q(x_1)$ from $q(x_2)$ is only mean. Therefore, after first approximation, both $q(x_1)$ and $q(x_2)$ become gaussian distribution.
iteration_list = np.array([1,3,5,10,15,20])
fig = plt.figure(figsize=(13,10))
# Tuple used for plotting
plot_tuple = iteration_list.reshape((-1,2)).shape
for num, iter_num in enumerate(iteration_list):
    # Instantiate approximate function
    app_dist_ins = app_dist()
    
    # Iterate in accordance with 'iter_num'
    for i in range(iter_num):
        app_dist_ins.update_q1()
        app_dist_ins.update_q2()
    # Approximate distribution
    app_multivariate_dist = app_dist_ins.get_app_dist()
    app_z = np.empty_like(x_mesh)
    
    # Plot original distribution as blue and approximate distribution as red
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            app_z[i,j] = app_multivariate_dist.pdf(x =(x[i],y[j]))
    
    ax = fig.add_subplot(plot_tuple[0],plot_tuple[1],num+1)
    ax.contour(x_mesh,y_mesh,app_z,levels=[0.1],colors='red')
    ax.contour(x_mesh,y_mesh,org_z,levels=[0.1],colors='blue')
    
    ax.set_title('iteration = {}'.format(iter_num))
Now we can tell that approximate distribution is approching mean of original distribution. However approximation distribution seems not to have correlation. Needless to say, it's due to assumption of independence between two variables.
2. KL-divergence¶
Possibly someone might have interest in KL-divergence between original distribution and approximation distribution. I confirmed in following code!
x = np.linspace(-10,10,101)
y = np.linspace(-10,10,101)
x_mesh,y_mesh = np.meshgrid(x,y)
pos = np.dstack((x_mesh,y_mesh))
# Number of variational approximation
iteration_num = 30
# Numpy which store kl-divergence for every iteration
kl_list = np.empty(iteration_num)
app_dist_ins_kl = app_dist()
for i in range(iteration_num):
    app_dist_ins_kl.update_q1()
    app_dist_ins_kl.update_q2()
    
    # approximation distribution
    app_distribution = app_dist_ins_kl.get_app_dist()
    # Pdf of approximation distribution
    app_pdf = app_distribution.pdf(pos)
    # Pdf of original distribution
    org_pdf = multivariate_normal.pdf(x=pos,mean=
                                      app_dist_ins_kl.org_mean,
                                      cov=app_dist_ins_kl.org_cov_matrix)
    # Compute KL-divergence
    kl_list[i] = entropy(org_pdf.flatten(),app_pdf.flatten())
    
# plot kl-divergence
x=np.arange(iteration_num)
plt.plot(x,kl_list)
plt.title("KL divergence")
Apparently, it is proven that variational approximatoin decrease KL-divergence.
 
No comments:
Post a Comment