Mutual Information
1. Conditional Entropy
"What is Entropy" was discussed in the following link.
https://hiroshiu.blogspot.com/2018/04/what-is-entropy.html
Before getting into the "Mutual Information", we have to wrap our head around "Conditional Entropy".
"Conditional Entropy" $H(x)$ of two discrete random variables $X(x_1,x_2,\cdots,x_n), Y(y_1,y_2,\cdots,y_m)$ is captured in the followings
$$H(X|y_{1}) = -\sum_{i=1}^{n}P(x_{i}|y_{1})\log(P(x_{i}|y_{1}))$$
Averaging over all outcomes of $Y$, weighted by $P(y_{j})$, gives
\begin{eqnarray}
H(X|Y) &=& -\sum_{j=1}^{m}P(y_{j})\sum_{i=1}^{n}P(x_{i}|y_{j})\log(P(x_{i}|y_{j}))\\
&=& -\sum_{j=1}^{m}\sum_{i=1}^{n}P(x_{i} \cap y_{j})\log\left(\frac{P(x_{i}\cap y_{j})}{P(y_{j})}\right)
\end{eqnarray}
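As a quick numerical illustration, here is a minimal sketch that computes $H(X|Y)$ from a small joint probability table (the numbers are made up for illustration, and the natural logarithm is used):
import numpy as np
# Hypothetical joint distribution P(x_i ∩ y_j): rows are x, columns are y
joint = np.array([[0.2, 0.1],
                  [0.1, 0.3],
                  [0.1, 0.2]])
p_y = joint.sum(axis=0)  # marginal P(y_j)
cond = joint / p_y       # P(x_i | y_j), column by column
# H(X|Y) = -sum_j sum_i P(x_i ∩ y_j) * log(P(x_i | y_j))
print(-np.sum(joint * np.log(cond)))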
2. Mutual Information
Mutual information is $H(X) - H(X|Y)$, which measures how much knowing one of the variables reduces uncertainty about the other. $$I(X;Y) = H(X) - H(X|Y) = \sum_{j=1}^{m}\sum_{i=1}^{n}P(x_{i} \cap y_{j})\log\left(\frac{P(x_{i}\cap y_{j})}{P(x_{i})P(y_{j})}\right)$$
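Why the double-sum form? Expanding $H(X)$ with $P(x_{i})=\sum_{j=1}^{m}P(x_{i}\cap y_{j})$ and using the expression for $H(X|Y)$ above:
\begin{eqnarray}
H(X) - H(X|Y) &=& -\sum_{i=1}^{n}P(x_{i})\log(P(x_{i})) + \sum_{j=1}^{m}\sum_{i=1}^{n}P(x_{i} \cap y_{j})\log\left(\frac{P(x_{i}\cap y_{j})}{P(y_{j})}\right)\\
&=& \sum_{j=1}^{m}\sum_{i=1}^{n}P(x_{i} \cap y_{j})\left(\log\left(\frac{P(x_{i}\cap y_{j})}{P(y_{j})}\right)-\log(P(x_{i}))\right)\\
&=& \sum_{j=1}^{m}\sum_{i=1}^{n}P(x_{i} \cap y_{j})\log\left(\frac{P(x_{i}\cap y_{j})}{P(x_{i})P(y_{j})}\right)
\end{eqnarray}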
3. Implementation
scikit-learn provides useful functions for this. Below, I calculate mutual information with that library.
from sklearn import datasets
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import mutual_info_classif
import numpy as np
iris_dataset = datasets.load_iris()
iris_data = iris_dataset.data
iris_label = iris_dataset.target
# Explanatory variables
iris_data[0:3,:]
# Response variable
iris_label[0:3]
You can check the mutual information values with the "mutual_info_classif" function.
The 3rd and 4th explanatory variables (indices 2 and 3, petal length and petal width) seem to have higher values than the others.
mutual_info_classif(X=iris_data,y=iris_label)
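As a rough cross-check, "mutual_info_score" from sklearn.metrics computes mutual information between two discrete variables. The 10-bin discretization below is my own crude choice, and "mutual_info_classif" uses a nearest-neighbor estimator for continuous features, so the two numbers will only roughly agree:
from sklearn.metrics import mutual_info_score
# Crudely bin the explanatory variable at index 2 (petal length) into 10 bins
bins = np.histogram_bin_edges(iris_data[:, 2], bins=10)
binned = np.digitize(iris_data[:, 2], bins)
print(mutual_info_score(binned, iris_label))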
Now you can obtain a new set of explanatory variables consisting of those with high mutual information. Here I extract 2 explanatory variables out of 4 with the "SelectKBest" class.
selecter = SelectKBest(score_func=mutual_info_classif, k=2)
selecter_iris = selecter.fit(iris_data,iris_label)
new_iris_data = selecter_iris.transform(iris_data)
print('shape of new_iris_data',new_iris_data.shape)
new_iris_data[0:3,:]
Now I can see that the explanatory variables with high mutual information were extracted :)
You can also check which explanatory variables were selected, as a True/False numpy array, by invoking the "get_support()" method.
support = selecter_iris.get_support()
print('support', support)
np.where(support)
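To see the selected features by name, you can index the dataset's "feature_names" with the support mask:
# Map the boolean mask back to the original feature names
print(np.array(iris_dataset.feature_names)[support])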
You can check the other "Select..." classes here:
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection
According to the link above, "mutual_info_regression" is preferable for a continuous response variable.
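As a minimal sketch of that case, assuming the diabetes dataset (my choice for illustration) as a regression problem:
from sklearn import datasets
from sklearn.feature_selection import mutual_info_regression
# Continuous response variable, so mutual_info_regression is used instead
diabetes = datasets.load_diabetes()
print(mutual_info_regression(X=diabetes.data, y=diabetes.target))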