Friday, June 29, 2018

Variational Approximation②

This article is a continuation of Variational Approximation①. In this article, I will show an example of variational approximation for a 2-dimensional Gaussian distribution.

0. Mathematical analysis

As you may know, the following is the density of a 2-dimensional normal distribution.
$$\begin{eqnarray}p(x_1,x_2|\mu_1,\mu_2,\Lambda^{-1}) &=& \frac{1}{2\pi\sqrt{|\Lambda^{-1}|}}\exp\left\{-\frac{1}{2}\left(\begin{array}{c}x_1 - \mu_1 \\x_2 - \mu_2\end{array}\right)^T\begin{pmatrix}\lambda_{11} & \lambda_{12}\\ \lambda_{21} & \lambda_{22} \end{pmatrix}\left(\begin{array}{c}x_1 - \mu_1 \\x_2 - \mu_2\end{array}\right)\right\} \\ &\propto& \exp\left\{-\frac{1}{2}\left(\begin{array}{c}x_1 - \mu_1 \\x_2 - \mu_2\end{array}\right)^T\begin{pmatrix}\lambda_{11} & \lambda_{12}\\ \lambda_{21} & \lambda_{22} \end{pmatrix}\left(\begin{array}{c}x_1 - \mu_1 \\x_2 - \mu_2\end{array}\right)\right\}\end{eqnarray}$$
Here $\Lambda = (\lambda_{ij})$ is the precision matrix, i.e. the inverse of the covariance matrix.
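To make the notation concrete, here is a quick numerical check (a sketch with arbitrary example values of my choosing) that the density written above matches scipy's multivariate_normal:

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])          # = Lambda^{-1}
Lam = np.linalg.inv(cov)              # precision matrix Lambda
x = np.array([0.5, -1.0])             # arbitrary evaluation point

d = x - mu
# Density formula above: exp(-0.5 * d^T Lambda d) / (2*pi*sqrt(|Lambda^{-1}|))
manual = np.exp(-0.5 * d @ Lam @ d) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))
print(manual, multivariate_normal.pdf(x, mean=mu, cov=cov))   # the two agree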

Since "2-dimentional gaussian distibution" has two variable, we can assume $p(x_1,x_2|\mu_1,\mu_2,\Lambda^{-1}) \approx q(x_1,x_2) = q(x_1)q(x_2)$.

Then we can apply the formula derived in Variational Approximation①.

$$\begin{eqnarray}\log q(x_1) &=& \left<\log p(x_1,x_2)\right>_{q(x_2)} + c \\ &=& \left< -\frac{1}{2}\left(\begin{array}{c} x_1-\mu_1 \\ x_2-\mu_2 \end{array} \right)^T\begin{pmatrix} \lambda_{11}& \lambda_{12} \\ \lambda_{21}& \lambda_{22} \end{pmatrix}\left(\begin{array}{c} x_1-\mu_1 \\ x_2-\mu_2 \end{array} \right)\right>_{q(x_2)} +\ c \\ &=&\left< -\frac{1}{2}\left\{\lambda_{11}x_1^2 - 2x_1\left(\lambda_{11}\mu_1-\lambda_{12}(x_2- \mu_2)\right)\right\}\right>_{q(x_2)} + c \\ &=& -\frac{1}{2}\left\{\lambda_{11}x_1^2 - 2x_1\left(\lambda_{11}\mu_1-\lambda_{12}\left(\left<x_2\right>_{q(x_2)}- \mu_2\right)\right)\right\} + c \\ &=& -\frac{\lambda_{11}}{2}\left\{x_1 - \left(\mu_1 - \lambda_{11}^{-1}\lambda_{12}\left(\left<x_2\right>_{q(x_2)} - \mu_2\right)\right)\right\}^2 + c'\end{eqnarray}$$
where terms not involving $x_1$ are absorbed into the constants $c$ and $c'$, and the last line completes the square.


The last line shows that $\log q(x_1)$ is a concave quadratic in $x_1$, i.e. the log density of a Gaussian. Therefore, $q(x_1)$ can be written as below.
$$q(x_1) = N(x_1|m_1,\lambda_{11}^{-1})$$ $$such\ that$$ $$m_1 = \mu_1 - \lambda_{11}^{-1}\lambda_{12}(\left<x_2\right> - \mu_2) \tag{1}$$

Similarly, $q(x_2)$ is obtained as follows.

$$q(x_2) = N(x_2|m_2,\lambda_{22}^{-1})$$$$such\ that$$$$m_2 = \mu_2 - \lambda_{22}^{-1}\lambda_{21}(\left<x_1\right> - \mu_1) \tag{2}$$
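As a sanity check on equations $(1)$ and $(2)$: for a $2\times2$ covariance matrix $\Sigma$, one can show $\lambda_{11}^{-1}\lambda_{12} = -\Sigma_{12}/\Sigma_{22}$, so $m_1$ is exactly the conditional mean $E[x_1|x_2=\left<x_2\right>]$. A minimal sketch with example numbers of my choosing:

import numpy as np

Sigma = np.array([[1.0, 0.8],
                  [0.8, 1.0]])        # example covariance matrix
Lam = np.linalg.inv(Sigma)            # precision matrix (lambda_ij)

# Coefficient appearing in equation (1) ...
print((1 / Lam[0, 0]) * Lam[0, 1])    # -0.8
# ... equals minus the regression coefficient Sigma_12 / Sigma_22
print(-Sigma[0, 1] / Sigma[1, 1])     # -0.8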

1. Implementation

From now on, I'll show you the implementation of variational approximation for a 2-dimensional Gaussian distribution. I hope you'll enjoy it :)

In [1]:
from scipy.stats import norm
from scipy.stats import multivariate_normal
from scipy.stats import entropy
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

At first, we prepare the 2-dimensional Gaussian distribution that will be approximated with variational approximation. This time, the mean vector is $\left(\begin{array}{c} 0\\ 0 \end{array} \right)$ and the covariance matrix is $\begin{pmatrix}1 & 0.8 \\0.8& 1\end{pmatrix}$. We can see what it looks like below.

In [2]:
x = np.linspace(-4,4,101)
y = np.linspace(-4,4,101)
x_mesh,y_mesh = np.meshgrid(x,y)

org_mean=np.array([0,0])
org_cov_matrix = np.array([[1,0.8],
              [0.8,1]])
org_z = np.empty_like(x_mesh)

for i in range(x.shape[0]):
    for j in range(y.shape[0]):
        # x_mesh[i,j] == x[j] and y_mesh[i,j] == y[i], so evaluate the
        # pdf at (x[j], y[i]) to match the meshgrid layout
        org_z[i,j] = multivariate_normal.pdf(x=(x[j],y[i]), mean=org_mean,
                                             cov=org_cov_matrix)
plt.contour(x_mesh,y_mesh,org_z,levels=[0.1],colors='blue')
Out[2]:
<matplotlib.contour.QuadContourSet at 0x10dfa2710>

As we discussed before, the mean of $q(x_2)$ has to be given first. Tentatively, we set $-4$ as the mean of $q(x_2)$. Before that, we should be on the same page about what $q(x_1)q(x_2)$ looks like. Consider the following two normal distributions. $$N(x_1|\mu_1,\sigma^2_1) = \frac{1}{\sqrt{2\pi\sigma^2_1}}\exp\left\{-\frac{(x_1-\mu_1)^2}{2\sigma^2_1}\right\}$$ $$N(x_2|\mu_2,\sigma^2_2) = \frac{1}{\sqrt{2\pi\sigma^2_2}}\exp\left\{-\frac{(x_2-\mu_2)^2}{2\sigma^2_2}\right\}$$ Their product is a 2-dimensional Gaussian with mean vector $(\mu_1, \mu_2)^T$ and diagonal covariance matrix $\mathrm{diag}(\sigma^2_1, \sigma^2_2)$; the sketch below verifies this numerically. After that, for stepwise understanding, let's see the approximate distribution after one iteration :)
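A quick numerical check of the product claim (a sketch with arbitrary example values, using the norm and multivariate_normal classes imported above):

# Product of two 1-dimensional normal pdfs vs. a 2-dimensional Gaussian
# pdf with diagonal covariance (mu_1, mu_2, sigma_1, sigma_2 arbitrary)
mu_1, mu_2 = 1.0, -2.0
sigma_1, sigma_2 = 0.7, 1.5
x1, x2 = 0.3, -1.1    # arbitrary evaluation point

prod = (norm.pdf(x1, loc=mu_1, scale=sigma_1) *
        norm.pdf(x2, loc=mu_2, scale=sigma_2))
joint = multivariate_normal.pdf([x1, x2], mean=[mu_1, mu_2],
                                cov=[[sigma_1**2, 0], [0, sigma_2**2]])
print(prod, joint)    # the two values agree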

In [3]:
q2_mean = -4

# Precision matrix Lambda: equations (1) and (2) are written in terms
# of its elements, not those of the covariance matrix
org_precision = np.linalg.inv(org_cov_matrix)

# First iteration  :)
# Obtain q(x_1) from equation (1)
q1_mean = org_mean[0] - ((1/org_precision[0,0]) *
                         org_precision[0,1]) * (q2_mean - org_mean[1])
q1_variance = 1/org_precision[0,0]
# Obtain q(x_2) from equation (2)
q2_mean = org_mean[1] - ((1/org_precision[1,1]) *
                         org_precision[1,0]) * (q1_mean - org_mean[0])
q2_variance = 1/org_precision[1,1]

# Compute q(x_1)q(x_2): a Gaussian with diagonal covariance
app_mean = np.array([q1_mean,q2_mean])
app_cov_matrix = np.array([[q1_variance,0],
                           [0,q2_variance]])

app_z = np.empty_like(x_mesh)

# Plot approximate distribution after one iteration
for i in range(x.shape[0]):
    for j in range(y.shape[0]):
        # Evaluate at (x[j], y[i]) to match the meshgrid layout
        app_z[i,j] = multivariate_normal.pdf(x=(x[j],y[i]),
                                             mean=app_mean,cov=app_cov_matrix)

plt.contour(x_mesh,y_mesh,app_z,levels=[0.1],colors='red')
plt.contour(x_mesh,y_mesh,org_z,levels=[0.1],colors='blue')
Out[3]:
<matplotlib.contour.QuadContourSet at 0x10e1f3438>

The original 2-dimensional Gaussian distribution is plotted in blue, and the approximate distribution is plotted in red.
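For reference, we can print the parameters of this one-iteration approximation. With the settings above, equation $(1)$ gives $m_1 = 0.8 \times (-4) = -3.2$, equation $(2)$ then gives $m_2 = 0.8 \times (-3.2) = -2.56$, and both variances are $1/\lambda_{11} = 1/\lambda_{22} = 0.36$:

print(q1_mean, q2_mean)           # -3.2 -2.56
print(q1_variance, q2_variance)   # 0.36 0.36 (up to floating-point rounding)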

In [4]:
# Class representing the approximate distribution
class app_dist:
    def __init__(self):
        # Parameters of the distribution to be approximated
        self.org_mean = np.array([0,0])
        self.org_cov_matrix = np.array([[1,0.8],
                                        [0.8,1]])
        # Precision matrix used in the update equations (1) and (2)
        self.org_precision = np.linalg.inv(self.org_cov_matrix)
        self.q1_mean = 0
        self.q1_var = 1/self.org_precision[0,0]
        # The only value you have to give at first is the mean of q(x_2).
        # Actually, the initial q(x_2) doesn't have to be a Gaussian
        # distribution; as long as its mean is known, you can apply
        # any distribution.
        self.q2_mean = -4
        self.q2_var = 1/self.org_precision[1,1]

    def update_q1(self):
        """
        Update q(x_1) according to equation (1).
        """
        self.q1_mean = self.org_mean[0] - ((1/self.org_precision[0,0]
                                           ) * self.org_precision[0,1]
                                          ) * (self.q2_mean - self.org_mean[1])
        self.q1_var = 1/self.org_precision[0,0]

    def update_q2(self):
        """
        Update q(x_2) according to equation (2).
        """
        self.q2_mean = self.org_mean[1] - ((1/self.org_precision[1,1]
                                           ) * self.org_precision[1,0]
                                          ) * (self.q1_mean - self.org_mean[0])
        self.q2_var = 1/self.org_precision[1,1]

    def get_app_dist(self):
        """
        Return the approximate distribution
        (a 2-dimensional Gaussian with diagonal covariance).
        """
        app_mean = np.array([self.q1_mean,self.q2_mean])
        app_cov = np.array([[self.q1_var,0],
                            [0,self.q2_var]])

        return multivariate_normal(mean=app_mean,cov=app_cov)

Note :
Actually, for the initial $q(x_2)$, you can apply any distribution you want as long as its mean is known. From equation $(1)$, the only quantity of $q(x_2)$ used to update $q(x_1)$ is its mean. Therefore, after the first update, both $q(x_1)$ and $q(x_2)$ become Gaussian distributions.
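For example, here is a minimal sketch (using scipy's uniform distribution, chosen just for illustration) in which the initial $q(x_2)$ is non-Gaussian but has mean $-4$; the first update of $q(x_1)$ comes out the same as with the Gaussian initialization:

from scipy.stats import uniform

# Non-Gaussian initialization: uniform on [-5, -3], whose mean is -4
init_q2 = uniform(loc=-5, scale=2)
q2_mean_init = init_q2.mean()        # -4.0

precision = np.linalg.inv(np.array([[1, 0.8],
                                    [0.8, 1]]))
# Equation (1): only the mean of the initial q(x_2) enters the update
m1 = 0 - (precision[0, 1] / precision[0, 0]) * (q2_mean_init - 0)
print(m1)                            # -3.2, same as before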

In [5]:
iteration_list = np.array([1,3,5,10,15,20])
fig = plt.figure(figsize=(13,10))
# Shape of the subplot grid (3 rows x 2 columns)
plot_tuple = iteration_list.reshape((-1,2)).shape

for num, iter_num in enumerate(iteration_list):
    # Instantiate the approximate distribution
    app_dist_ins = app_dist()
    
    # Iterate in accordance with 'iter_num'
    for i in range(iter_num):
        app_dist_ins.update_q1()
        app_dist_ins.update_q2()

    # Approximate distribution
    app_multivariate_dist = app_dist_ins.get_app_dist()

    app_z = np.empty_like(x_mesh)
    
    # Plot the original distribution in blue and the approximate one in red
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            # Evaluate at (x[j], y[i]) to match the meshgrid layout
            app_z[i,j] = app_multivariate_dist.pdf(x=(x[j],y[i]))
    
    ax = fig.add_subplot(plot_tuple[0],plot_tuple[1],num+1)
    ax.contour(x_mesh,y_mesh,app_z,levels=[0.1],colors='red')
    ax.contour(x_mesh,y_mesh,org_z,levels=[0.1],colors='blue')
    
    ax.set_title('iteration = {}'.format(iter_num))

Now we can tell that the approximate distribution approaches the mean of the original distribution. However, the approximate distribution shows no correlation between the two variables; needless to say, this is due to the assumption of independence. Note also that its contours are tighter than the original's, since the factorized approximation underestimates the spread of the true distribution.
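We can also see why the means converge. With these parameters, equation $(1)$ reduces to $m_1 = 0.8\,m_2$ and equation $(2)$ to $m_2 = 0.8\,m_1$, so each full sweep multiplies the mean of $q(x_2)$ by $0.8^2 = 0.64$, and the means shrink geometrically toward the true mean $(0, 0)$. A minimal sketch of just this recursion:

# Geometric convergence of the means under the mean-field updates
# (the coefficient 0.8 is specific to the covariance [[1, 0.8], [0.8, 1]])
m2 = -4.0
for it in range(5):
    m1 = 0.8 * m2    # equation (1)
    m2 = 0.8 * m1    # equation (2)
    print(it + 1, round(m1, 4), round(m2, 4))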

2. KL-divergence

Possibly someone might be interested in the KL divergence between the original distribution and the approximate distribution. I checked it with the following code, where scipy's entropy computes the discrete KL divergence $KL(p\,\|\,q)$ over the grid points.

In [6]:
x = np.linspace(-10,10,101)
y = np.linspace(-10,10,101)
x_mesh,y_mesh = np.meshgrid(x,y)
pos = np.dstack((x_mesh,y_mesh))

# Number of variational update iterations
iteration_num = 30
# Array storing the KL divergence at every iteration
kl_list = np.empty(iteration_num)

app_dist_ins_kl = app_dist()

for i in range(iteration_num):
    app_dist_ins_kl.update_q1()
    app_dist_ins_kl.update_q2()
    
    # Approximate distribution after i+1 iterations
    app_distribution = app_dist_ins_kl.get_app_dist()

    # Pdf of the approximate distribution
    app_pdf = app_distribution.pdf(pos)
    # Pdf of the original distribution
    org_pdf = multivariate_normal.pdf(x=pos,
                                      mean=app_dist_ins_kl.org_mean,
                                      cov=app_dist_ins_kl.org_cov_matrix)

    # Compute KL-divergence
    kl_list[i] = entropy(org_pdf.flatten(),app_pdf.flatten())
    
# Plot the KL divergence over iterations
x=np.arange(iteration_num)
plt.plot(x,kl_list)
plt.title("KL divergence")
Out[6]:
<matplotlib.text.Text at 0x10efea828>

We can see that the variational updates steadily decrease the KL divergence. (Strictly speaking, the mean-field updates minimize $KL(q\,\|\,p)$, while the quantity monitored here is $KL(p\,\|\,q)$; in this example it decreases as well.)
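As a cross-check (a sketch assuming the iteration has converged to mean $(0,0)$ and covariance $\mathrm{diag}(0.36, 0.36)$, as derived above), the KL divergence between two Gaussians has a closed form, and the grid-based estimate should approach it:

# Closed-form KL(p||q) between two Gaussians:
# 0.5 * ( tr(Sq^{-1} Sp) + (mq-mp)^T Sq^{-1} (mq-mp) - k + ln(det Sq / det Sp) )
# Here both means are (0, 0), so the quadratic term vanishes.
Sp = np.array([[1, 0.8],
               [0.8, 1]])             # original covariance
Sq = np.diag([0.36, 0.36])            # converged approximate covariance
k = 2
kl_closed = 0.5 * (np.trace(np.linalg.inv(Sq) @ Sp) - k
                   + np.log(np.linalg.det(Sq) / np.linalg.det(Sp)))
print(kl_closed)                      # about 1.27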
