Saturday, June 30, 2018

Rotation of ellipse

Today I will share with you some notes about the rotation of an ellipse. The reason I've chosen this topic is that I frequently face it when I work on statistics, for instance, the contour of a 2-dimensional Gaussian distribution.

0. Ellipse without rotation

Before getting into the rotation of an ellipse, we should be on the same page about the ordinary ellipse. Let us think about the ellipse denoted by the following equation. $$\left(\begin{array}{c} x_1 \\ x_2 \end{array} \right)^T \begin{pmatrix} 4 & 0 \\ 0 & 9\end{pmatrix}\left(\begin{array}{c} x_1 \\ x_2 \end{array} \right) = 1$$
This can be expanded as below. $$4x_1^2 + 9x_2^2 = 1$$
Therefore this is the ellipse whose semi-major axis is $\frac{1}{2}$ (along $x_1$) and whose semi-minor axis is $\frac{1}{3}$ (along $x_2$).
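To visualize this, here is a minimal sketch (assuming NumPy and Matplotlib are available) that draws this level set as a contour:

import numpy as np
import matplotlib.pyplot as plt

x1 = np.linspace(-1, 1, 201)
x2 = np.linspace(-1, 1, 201)
x1_mesh, x2_mesh = np.meshgrid(x1, x2)

# Level set 4*x1^2 + 9*x2^2 = 1: semi-axis 1/2 along x1, 1/3 along x2
plt.contour(x1_mesh, x2_mesh, 4*x1_mesh**2 + 9*x2_mesh**2, levels=[1], colors='blue')
plt.gca().set_aspect('equal')
plt.show()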

1. Rotation of ellipse

What I want to work on in this article is the case where the off-diagonal elements are not zero. The following is one example. $$\left(\begin{array}{c} x_1 \\ x_2 \end{array} \right)^T \begin{pmatrix} 4 & 0.5 \\ 0.5 & 9\end{pmatrix}\left(\begin{array}{c} x_1 \\ x_2 \end{array} \right) = 1$$
Actually it's a rotated ellipse, and it's a little more complicated than an ellipse without rotation. Let's say this equation is denoted as follows. $$\vec{x}^T A \vec{x} = 1\tag{1}$$
Then, if the off-diagonal elements are zero, we can see it as an ordinary ellipse. Therefore let's think about diagonalization.
Let us assume the (normalized) eigenvectors of $A$ are $\vec{t_1}$ and $\vec{t_2}$, and let $T$ be the matrix whose columns are the eigenvectors of $A$, $T = (\vec{t_1},\vec{t_2})$. Then the matrix $A$ can be diagonalized by the matrix $T$. (Since $T$ is an orthogonal matrix, $T^{-1} = T^{T}$.) $$T^{-1}AT = T^{T}AT = \begin{pmatrix}\lambda_1 & 0 \\ 0 & \lambda_2\end{pmatrix}$$

Note :
As you may know, the necessary and sufficient condition for an $n \times n$ matrix to be diagonalizable is having $n$ linearly independent eigenvectors. In this case, the matrix $A$ is symmetric, thus $A$ can be diagonalized by an orthogonal matrix.

Suppose $\begin{pmatrix}\lambda_1 & 0 \\ 0 & \lambda_2\end{pmatrix} = P$, so that
$$A = TPT^T$$ Substituting this expression for $A$ into (1),
$$\begin{eqnarray}\vec{x}^TTPT^{T}\vec{x} &=& 1 \\ (T^T\vec{x})^TP(T^T\vec{x}) &=& 1\end{eqnarray}$$
From the above equation, we can tell that the rotated ellipse is the ordinary ellipse whose semi-axes are $\frac{1}{\sqrt{\lambda_1}}$ and $\frac{1}{\sqrt{\lambda_2}}$ on the new axes defined by $T^T\vec{x}$. Equivalently, we can say it is the axis-aligned ellipse with semi-axes $\frac{1}{\sqrt{\lambda_1}}$ and $\frac{1}{\sqrt{\lambda_2}}$, linearly transformed (rotated) by $T$ :)
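To check this numerically, here is a minimal sketch (assuming NumPy is available) that diagonalizes the example matrix above and recovers the semi-axis lengths:

import numpy as np

# Example matrix from the rotated ellipse above
A = np.array([[4, 0.5],
              [0.5, 9]])

# Columns of T are orthonormal eigenvectors; eig_val holds lambda_1, lambda_2
eig_val, T = np.linalg.eigh(A)

print(T.T @ A @ T)            # approximately diag(lambda_1, lambda_2)
print(1 / np.sqrt(eig_val))   # semi-axis lengths of the rotated ellipse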

Friday, June 29, 2018

Variational Approximation②

This article is a continuation of Variational Approximation①. In this article, I will show an example of variational approximation for a 2-dimensional Gaussian distribution.

0. Mathematical analysis

As you may know, the following is the 2-dimensional normal distribution.
$$\begin{eqnarray}p(x_1,x_2|\mu_1,\mu_2,\Lambda^{-1}) &=& \frac{1}{2\pi\sqrt{|\Lambda^{-1}|}}\exp\left\{-\frac{1}{2}\left(\begin{array}{c}x_1 - \mu_1 \\x_2 - \mu_2\end{array}\right)^T\begin{pmatrix}\lambda_{11} & \lambda_{12}\\ \lambda_{21} & \lambda_{22} \end{pmatrix}\left(\begin{array}{c}x_1 - \mu_1 \\x_2 - \mu_2\end{array}\right)\right\} \\ &\propto& \exp\left\{-\frac{1}{2}\left(\begin{array}{c}x_1 - \mu_1 \\x_2 - \mu_2\end{array}\right)^T\begin{pmatrix}\lambda_{11} & \lambda_{12}\\ \lambda_{21} & \lambda_{22} \end{pmatrix}\left(\begin{array}{c}x_1 - \mu_1 \\x_2 - \mu_2\end{array}\right)\right\}\end{eqnarray}$$

Since "2-dimentional gaussian distibution" has two variable, we can assume $p(x_1,x_2|\mu_1,\mu_2,\Lambda^{-1}) \approx q(x_1,x_2) = q(x_1)q(x_2)$.

Then we can apply the formula derived in Variational Approximation①.

$$\begin{eqnarray}\log q(x_1) &=& <\log p(x_1,x_2)>_{q(x_2)} +c \\ &=& \left< -\frac{1}{2}\left(\begin{array}{c} x_1-\mu_1 \\ x_2-\mu_2 \end{array} \right)^T\begin{pmatrix} \lambda_{11}& \lambda_{12} \\ \lambda_{21}& \lambda_{22} \end{pmatrix}\left(\begin{array}{c} x_1-\mu_1 \\ x_2-\mu_2 \end{array} \right)\right>_{q(x_2)}\ +\ c \\ &=&\left< -\frac{1}{2}\left\{\lambda_{11}x_1^2 - 2x_1\left(\lambda_{11}\mu_1-\lambda_{12}(x_2- \mu_2)\right)\right\}\right>_{q(x_2)} + c \\ &=& -\frac{1}{2}\left\{\lambda_{11}x_1^2 - 2x_1\left(\lambda_{11}\mu_1-\lambda_{12}(<x_2>_{q(x_2)}- \mu_2)\right)\right\} + c\end{eqnarray}$$


We can tell this is a quadratic function with respect to $x_1$, concave downward, which means $q(x_1)$ is Gaussian. Therefore, $q(x_1)$ can be written as below.
$$q(x_1) = N(x_1|m_1,\Lambda^{-1}_{11})$$ $$such\ that$$ $$m_1 = \mu_1 - \Lambda^{-1}_{11}\Lambda_{12}(<x_2> - \mu_2) \tag{1} $$

Similarly, $q(x_2)$ can be obtained as follows.

$$q(x_2) = N(x_2|m_2,\Lambda^{-1}_{22})$$$$such\ that$$$$m_2 = \mu_2 - \Lambda^{-1}_{22}\Lambda_{21}(<x_1> - \mu_1) \tag{2}$$
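As a concrete example (using the covariance matrix that appears in the implementation below), suppose $\Lambda^{-1} = \begin{pmatrix}1 & 0.8 \\ 0.8 & 1\end{pmatrix}$. Then $$\Lambda = \frac{1}{1-0.8^2}\begin{pmatrix}1 & -0.8 \\ -0.8 & 1\end{pmatrix}, \qquad \Lambda^{-1}_{11}\Lambda_{12} = -0.8, \qquad \Lambda^{-1}_{11} = 0.36$$ so formula $(1)$ gives $m_1 = \mu_1 + 0.8(<x_2> - \mu_2)$ with variance $0.36$. Note that the $\lambda_{ij}$ in $(1)$ and $(2)$ are elements of the precision matrix $\Lambda$, not of the covariance matrix $\Lambda^{-1}$, so the implementation has to invert the covariance matrix first.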

1. Implementation

From now on, I'll show you the implementation of variational approximation for the 2-dimensional Gaussian distribution. I hope you'll enjoy it :)

In [1]:
from scipy.stats import norm
from scipy.stats import multivariate_normal
from scipy.stats import entropy
import numpy as np
import matplotlib.pyplot as plt
% matplotlib inline

At first we prepare the 2-dimensional Gaussian distribution which will be approximated with variational approximation. This time, the mean vector is $\left(\begin{array}{c} 0\\ 0 \end{array} \right)$ and the covariance matrix is $\begin{pmatrix}1 & 0.8 \\0.8& 1\end{pmatrix}$. We can see what it looks like below.

In [2]:
x = np.linspace(-4,4,101)
y = np.linspace(-4,4,101)
x_mesh,y_mesh = np.meshgrid(x,y)

org_mean=np.array([0,0])
org_cov_matrix = np.array([[1,0.8],
              [0.8,1]])
org_z = np.empty_like(x_mesh)

# Evaluate the pdf at every grid point: org_z[i,j] corresponds to (x_mesh[i,j], y_mesh[i,j])
for i in range(x.shape[0]):
    for j in range(y.shape[0]):
        org_z[i,j] = multivariate_normal.pdf(x=(x_mesh[i,j],y_mesh[i,j]),
                                             mean=org_mean, cov=org_cov_matrix)
plt.contour(x_mesh,y_mesh,org_z,levels=[0.1],colors='blue')
Out[2]:
<matplotlib.contour.QuadContourSet at 0x10dfa2710>

As we discussed before, the mean of $q(x_2)$ has to be given first. Tentatively we set $-4$ for the mean of $q(x_2)$. Before that, we should be on the same page about what $q(x_1)q(x_2)$ looks like. Let's think about the two following normal distributions. $$N(x_1|\mu_1,\sigma^2_1) = \frac{1}{\sqrt{2\pi\sigma^2_1}}\exp\left\{-\frac{(x_1-\mu_1)^2}{2\sigma^2_1}\right\}$$ $$N(x_2|\mu_2,\sigma^2_2) = \frac{1}{\sqrt{2\pi\sigma^2_2}}\exp\left\{-\frac{(x_2-\mu_2)^2}{2\sigma^2_2}\right\}$$ Their product is a 2-dimensional Gaussian whose covariance matrix is diagonal, which is exactly the form of $q(x_1)q(x_2)$. For stepwise understanding, let's look at the approximate distribution after one iteration :)

In [3]:
q2_mean = -4

# Precision matrix; its elements are the lambda_ij in formulas (1) and (2)
org_precision = np.linalg.inv(org_cov_matrix)

# First iteration  :)
# Obtain q_1(x)
q1_mean = org_mean[0] - (org_precision[0,1] /
                         org_precision[0,0]) * (q2_mean - org_mean[1])
q1_variance = 1 / org_precision[0,0]
# Obtain q_2(x)
q2_mean = org_mean[1] - (org_precision[1,0] /
                         org_precision[1,1]) * (q1_mean - org_mean[0])
q2_variance = 1 / org_precision[1,1]

# Compute q(x_1)q(x_2)
app_mean = np.array([q1_mean,q2_mean])
app_cov_matrix = np.array([[q1_variance,0],
                           [0,q2_variance]])

app_z = np.empty_like(x_mesh)

# Plot approximate distribution with one iteration
for i in range(x.shape[0]):
    for j in range(y.shape[0]):
        app_z[i,j] = multivariate_normal.pdf(x=(x_mesh[i,j],y_mesh[i,j]),
                                             mean=app_mean, cov=app_cov_matrix)

app_cont = plt.contour(x_mesh,y_mesh,app_z,levels=[0.1],colors='red')
plt.contour(x_mesh,y_mesh,org_z,levels=[0.1],colors='blue')
Out[3]:
<matplotlib.contour.QuadContourSet at 0x10e1f3438>

The original 2-dimensional Gaussian distribution is plotted in blue, and the approximate distribution is plotted in red.

In [4]:
# Class representing the factorized approximate distribution
class app_dist:
    def __init__(self):
        # Parameters of the distribution to be approximated
        self.org_mean = np.array([0,0])
        self.org_cov_matrix = np.array([[1,0.8],
                                        [0.8,1]])
        # Precision matrix; its elements are the lambda_ij in (1) and (2)
        self.org_precision = np.linalg.inv(self.org_cov_matrix)
        self.q1_mean = 0
        self.q1_var = 1/self.org_precision[0,0]
        # The only value you have to give first is the mean of q_2(x).
        # Actually, q(x_2) does not have to be a Gaussian distribution;
        # as long as the mean of the distribution is known, you can use
        # any distribution.
        self.q2_mean = -4
        self.q2_var = 1/self.org_precision[1,1]

    def update_q1(self):
        """
        Update q_1(x)
        """
        self.q1_mean = self.org_mean[0] - (self.org_precision[0,1] /
                                           self.org_precision[0,0]
                                          ) * (self.q2_mean - self.org_mean[1])
        self.q1_var = 1/self.org_precision[0,0]

    def update_q2(self):
        """
        Update q_2(x)
        """
        self.q2_mean = self.org_mean[1] - (self.org_precision[1,0] /
                                           self.org_precision[1,1]
                                          ) * (self.q1_mean - self.org_mean[0])
        self.q2_var = 1/self.org_precision[1,1]

    def get_app_dist(self):
        """
        Return the approximate distribution (2-dimensional Gaussian distribution)
        """
        app_mean = np.array([self.q1_mean,self.q2_mean])
        app_var = np.array([[self.q1_var,0],
                            [0,self.q2_var]])

        return multivariate_normal(mean=app_mean,cov=app_var)

Note :
Actually, for the initial $q(x_2)$, as long as its mean is known, you can use any distribution you want. From equation $(1)$, the only thing needed to generate $q(x_1)$ from $q(x_2)$ is the mean. Therefore, after the first update, both $q(x_1)$ and $q(x_2)$ become Gaussian distributions.
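As a quick sanity check before plotting (a minimal sketch using the class above, not part of the original notebook), we can print the means of $q(x_1)$ and $q(x_2)$ for a few updates and confirm that they approach the mean of the original distribution, $(0, 0)$:

check = app_dist()
for i in range(5):
    check.update_q1()
    check.update_q2()
    # Both means shrink toward 0, the mean of the original distribution
    print(i + 1, round(check.q1_mean, 4), round(check.q2_mean, 4))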

In [5]:
iteration_list = np.array([1,3,5,10,15,20])
fig = plt.figure(figsize=(13,10))
# Tuple used for plotting
plot_tuple = iteration_list.reshape((-1,2)).shape

for num, iter_num in enumerate(iteration_list):
    # Instantiate approximate function
    app_dist_ins = app_dist()
    
    # Iterate in accordance with 'iter_num'
    for i in range(iter_num):
        app_dist_ins.update_q1()
        app_dist_ins.update_q2()

    # Approximate distribution
    app_multivariate_dist = app_dist_ins.get_app_dist()

    app_z = np.empty_like(x_mesh)
    
    # Plot original distribution as blue and approximate distribution as red
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            app_z[i,j] = app_multivariate_dist.pdf(x=(x_mesh[i,j],y_mesh[i,j]))
    
    ax = fig.add_subplot(plot_tuple[0],plot_tuple[1],num+1)
    ax.contour(x_mesh,y_mesh,app_z,levels=[0.1],colors='red')
    ax.contour(x_mesh,y_mesh,org_z,levels=[0.1],colors='blue')
    
    ax.set_title('iteration = {}'.format(iter_num))

Now we can tell that the approximate distribution approaches the mean of the original distribution. However, the approximate distribution does not seem to have any correlation. Needless to say, this is due to the assumption of independence between the two variables.

2. KL-divergence

Possibly someone might be interested in the KL divergence between the original distribution and the approximate distribution. I checked it with the following code!

In [6]:
x = np.linspace(-10,10,101)
y = np.linspace(-10,10,101)
x_mesh,y_mesh = np.meshgrid(x,y)
pos = np.dstack((x_mesh,y_mesh))

# Number of variational-update iterations
iteration_num = 30
# Array that stores the KL divergence at every iteration
kl_list = np.empty(iteration_num)

app_dist_ins_kl = app_dist()

for i in range(iteration_num):
    app_dist_ins_kl.update_q1()
    app_dist_ins_kl.update_q2()
    
    # approximation distribution
    app_distribution = app_dist_ins_kl.get_app_dist()

    # Pdf of approximation distribution
    app_pdf = app_distribution.pdf(pos)
    # Pdf of original distribution
    org_pdf = multivariate_normal.pdf(x=pos,mean=
                                      app_dist_ins_kl.org_mean,
                                      cov=app_dist_ins_kl.org_cov_matrix)

    # Compute KL-divergence
    kl_list[i] = entropy(org_pdf.flatten(),app_pdf.flatten())
    
# plot kl-divergence
x=np.arange(iteration_num)
plt.plot(x,kl_list)
plt.title("KL divergence")
Out[6]:
<matplotlib.text.Text at 0x10efea828>

From the plot, we can see that the variational approximation decreases the KL divergence as the iterations proceed.
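As a cross-check (a minimal sketch, not part of the original notebook), the KL divergence between two Gaussian distributions also has a closed form, so we can compare it with the last grid-based value computed above; gaussian_kl is a helper name introduced here just for illustration.

def gaussian_kl(mean_p, cov_p, mean_q, cov_q):
    """Closed-form KL(P || Q) for two multivariate Gaussian distributions."""
    k = mean_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mean_q - mean_p
    return 0.5 * (np.trace(cov_q_inv @ cov_p)
                  + diff @ cov_q_inv @ diff
                  - k
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

# Mean and covariance of the final approximate distribution
app_final_mean = np.array([app_dist_ins_kl.q1_mean, app_dist_ins_kl.q2_mean])
app_final_cov = np.array([[app_dist_ins_kl.q1_var, 0],
                          [0, app_dist_ins_kl.q2_var]])

# Should be close to the last entry of kl_list above
print(gaussian_kl(app_dist_ins_kl.org_mean, app_dist_ins_kl.org_cov_matrix,
                  app_final_mean, app_final_cov))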