https://github.com/hiroshiu12/mathematics/blob/master/kl_divergence.ipynb
KL(Kullback-Leibler) divergence
1. What is KL(Kullback-Leibler) divergence?
"KL(Kullback-Leibler) divergence" is a way of comparing two different probability distribution. When I work on probability and statistic as a datascientist , in many cases, I'm force to approximate complex distribution or replace observed data. In the situation, we can measure how much we loose information by approximation with "KL(Kullback-Leibler) divergence".
Let us assume that there are two probability distributions $p(x)$ and $q(x)$. "KL(Kullback-Leibler) divergence" is denoted by the following equation. $$\begin{eqnarray} KL[q(x)][p(x)] &=& -\int q(x)\log \frac{p(x)}{q(x)}\, dx \\ &=& \int q(x)\log q(x)\, dx - \int q(x)\log p(x)\, dx \\ &=& \langle\log q(x)\rangle_{q(x)} - \langle\log p(x)\rangle_{q(x)} \end{eqnarray}$$
At a glance, "KL(Kullback-Leibler) divergence" looks like a distance metric between two probability distributions. Strictly speaking, however, it does not satisfy the symmetry axiom of a distance, since in general $KL[q(x)][p(x)] \neq KL[p(x)][q(x)]$. It is a divergence, not a distance.
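To make the asymmetry concrete, here is a small worked example (the closed-form expression below is a standard result for univariate normal distributions, not derived in this notebook). For $q(x) = \mathcal{N}(x \mid \mu_q, \sigma_q^2)$ and $p(x) = \mathcal{N}(x \mid \mu_p, \sigma_p^2)$, $$KL[q(x)][p(x)] = \log\frac{\sigma_p}{\sigma_q} + \frac{\sigma_q^2 + (\mu_q - \mu_p)^2}{2\sigma_p^2} - \frac{1}{2}$$ Plugging in the two distributions used in the next section, $q = \mathcal{N}(0, 2^2)$ and $p = \mathcal{N}(0, 1^2)$, gives $KL[q][p] = \log\frac{1}{2} + \frac{4}{2} - \frac{1}{2} \approx 0.81$, whereas the reverse direction gives $KL[p][q] = \log 2 + \frac{1}{8} - \frac{1}{2} \approx 0.32$.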
2. Computing the divergence
Let's take a look at how to compute "KL(Kullback-Leibler) divergence" in practice.
This time I'm gonna compare two normal distributions: one with mean 0 and standard deviation 1, the other with mean 0 and standard deviation 2.
import numpy as np
from scipy.stats import norm,entropy
import matplotlib.pyplot as plt
%matplotlib inline
# This time, I'm gonna compute the divergence between the following two distributions.
# ・ normal distribution with mean = 0 and standard deviation = 1
# ・ normal distribution with mean = 0 and standard deviation = 2
x = np.linspace(-5.0,5.0,100)
px_p = norm.pdf(x,0,1)
px_q = norm.pdf(x,0,2)
# Plot the two normal distributions
plt.plot(x, px_p, label="mean=0\nstandard deviation=1")
plt.plot(x, px_q, label="mean=0\nstandard deviation=2")
plt.legend()
plt.title('Two different normal distributions')
# Compute KL(Kullback-Leibler) divergence KL[q(x)][p(x)]
# entropy(px_q, px_p) treats its first argument as q(x) and its second as p(x)
kl_divergence = entropy(px_q,px_p)
print('KL(Kullback-Leibler) :',kl_divergence)
As you can see, scipy.stats.entropy() can be used to compute "KL(Kullback-Leibler) divergence". With one argument it returns the entropy of the distribution, whereas with two arguments it returns the KL divergence between them (after normalizing each argument to sum to 1). It's quite useful :)
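As a rough sanity check, we can compare the value from entropy() with a direct numerical evaluation of the integral definition and with the closed form quoted in section 1. The sketch below reuses the arrays defined above; small discrepancies are expected because entropy() normalizes the discretized pdfs over the finite grid $[-5, 5]$, which is only an approximation of the full integral.
# Sanity check 1: evaluate the integral definition directly on the grid.
# This assumes the [-5, 5] grid covers most of the probability mass.
kl_numeric = np.trapz(px_q * np.log(px_q / px_p), x)
print('KL by numerical integration :', kl_numeric)
# Sanity check 2: closed form for two univariate normal distributions
# KL[q][p] = log(sigma_p/sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2*sigma_p^2) - 1/2
mu_q, mu_p = 0.0, 0.0
sigma_q, sigma_p = 2.0, 1.0
kl_closed = (np.log(sigma_p / sigma_q)
             + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
             - 0.5)
print('KL by closed form :', kl_closed)
# The divergence is asymmetric: swapping the arguments gives a different value.
print('KL[q][p] :', entropy(px_q, px_p))
print('KL[p][q] :', entropy(px_p, px_q))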