Saturday, May 5, 2018

What is KL(Kullback-Leibler) divergence?

This article is about KL(Kullback-Leibler) divergence. You can check the original Jupyter notebook source at the following link.
https://github.com/hiroshiu12/mathematics/blob/master/kl_divergence.ipynb

KL(Kullback-Leibler) divergence

1. What is KL(Kullback-Leibler) divergence ?

"KL(Kullback-Leibler) divergence" is a way of comparing two different probability distribution. When I work on probability and statistic as a datascientist , in many cases, I'm force to approximate complex distribution or replace observed data. In the situation, we can measure how much we loose information by approximation with "KL(Kullback-Leibler) divergence".

Let us assume that there are two probability distributions $p(x)$ and $q(x)$. "KL(Kullback-Leibler) divergence" is defined by the following equation. $$\begin{eqnarray} KL[q(x)][p(x)] &=& -\int q(x)\log \frac{p(x)}{q(x)} dx \\ &=& \int q(x)\log q(x) dx - \int q(x)\log p(x) dx \\ &=& <\log q(x)>_{q(x)} - <\log p(x)>_{q(x)} \end{eqnarray}$$
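As a concrete worked example (a standard result, not derived in the original notebook), plugging two univariate normal densities $q(x) = \mathcal{N}(\mu_0, \sigma_0^2)$ and $p(x) = \mathcal{N}(\mu_1, \sigma_1^2)$ into the definition gives a closed form: $$KL[q(x)][p(x)] = \log\frac{\sigma_1}{\sigma_0} + \frac{\sigma_0^2 + (\mu_0 - \mu_1)^2}{2\sigma_1^2} - \frac{1}{2}$$ For instance, with $q(x) = \mathcal{N}(0, 2^2)$ and $p(x) = \mathcal{N}(0, 1^2)$ this evaluates to $\log\frac{1}{2} + \frac{4}{2} - \frac{1}{2} \approx 0.807$.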

At a glance, "KL(Kullback-Leibler) divergence" looks like a distance metric between two probability distributions. However, strictly speaking, it doesn't satisfy the symmetry axiom of a distance, since in general $KL[q(x)][p(x)] \neq KL[p(x)][q(x)]$. It's a divergence, not a distance.
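To see the asymmetry in numbers, here is a minimal sketch of my own (not part of the original notebook): it discretises the definition directly on a grid, and the grid range, the two normal densities, and the helper name kl_div are just illustrative choices.

import numpy as np
from scipy.stats import norm

def kl_div(q, p, dx):
    # Discretised KL[q][p] = -∫ q(x) log(p(x)/q(x)) dx, approximated by a Riemann sum
    return np.sum(q * np.log(q / p)) * dx

x = np.linspace(-10.0, 10.0, 1000)
dx = x[1] - x[0]
q = norm.pdf(x, 0, 2)  # q(x): normal with mean 0, standard deviation 2
p = norm.pdf(x, 0, 1)  # p(x): normal with mean 0, standard deviation 1

print(kl_div(q, p, dx))  # KL[q(x)][p(x)] ≈ 0.81
print(kl_div(p, q, dx))  # KL[p(x)][q(x)] ≈ 0.32, a different value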

2. Compute divergence

Let's take a look at computing "KL(Kullback-Leibler) divergence".
This time I'm gonna compute it for two different normal distributions: one with mean 0 and standard deviation 1, the other with mean 0 and standard deviation 2.

In [1]:
import numpy as np
from scipy.stats import norm,entropy
import matplotlib.pyplot as plt
%matplotlib inline
In [2]:
# This time, I'm gonna compute the divergence between the following:
#   ・ normal distribution with mean = 0 and standard deviation = 1
#   ・ normal distribution with mean = 0 and standard deviation = 2
x = np.linspace(-5.0,5.0,100)
px_p = norm.pdf(x,0,1)
px_q = norm.pdf(x,0,2)

# Plot the two different normal distributions
plt.plot(x, px_p, label="mean=0\nstandard deviation=1")
plt.plot(x, px_q, label="mean=0\nstandard deviation=2")
plt.legend()
plt.title('Two different normal distributions')

# Compute KL(Kullback-Leibler) divergence KL[q(x)][p(x)]
kl_divergence = entropy(px_q, px_p)
In [3]:
print('KL(Kullback-Leibler) :',kl_divergence)
KL(Kullback-Leibler) : 0.69243465135

As you can see, scipy.stats.entropy() can be used to compute "KL(Kullback-Leibler) divergence". With one argument, it returns the entropy, whereas with two arguments, it returns the "KL(Kullback-Leibler) divergence" between them. It's quite useful :)
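One thing to keep in mind (my own note, not from the notebook): scipy.stats.entropy() normalises its inputs to sum to one and then returns a discrete sum over the grid points, so the number above is an approximation of the integral restricted to the grid [-5, 5]. Checking against the closed form for two normal distributions shown earlier:

import numpy as np
from scipy.stats import norm, entropy

# Exact KL[q][p] for q = N(0, 2^2), p = N(0, 1^2) from the closed form above
exact = np.log(1 / 2) + 2**2 / (2 * 1**2) - 0.5  # ~0.807

# A wider, finer grid brings the discretised value much closer to the exact one
x = np.linspace(-20.0, 20.0, 10000)
approx = entropy(norm.pdf(x, 0, 2), norm.pdf(x, 0, 1))

print(exact, approx)

With the narrower [-5, 5] grid used above, the tails of the wider normal distribution are cut off, which is mainly why the notebook's value (about 0.69) comes out below the exact 0.807.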

1 comment:

  1. You can get nice TeX brackets < and > with \langle and \rangle.
