Saturday, May 26, 2018

Mutual Information

In this article, I will share with you the concept of "Mutual Information". I believe this is one of the crucial concepts when working on data science, machine learning, deep learning, and so on. I would be glad if you enjoy this article :)

1. Conditional Entropy

"What is Entropy" was discussed in the following link.
https://hiroshiu.blogspot.com/2018/04/what-is-entropy.html
Before getting into "Mutual Information", we have to wrap our heads around "Conditional Entropy".
Given two discrete random variables $X(x_1,x_2,\cdots,x_n)$ and $Y(y_1,y_2,\cdots,y_m)$, the entropy of $X$ conditioned on a particular value $y_1$ of $Y$ is $$H(X|y_{1}) = -\sum_{i=1}^{n}P(x_{i}|y_{1})\log(P(x_{i}|y_{1}))$$
Averaging over all values of $Y$ gives the "Conditional Entropy" $H(X|Y)$:
\begin{eqnarray} H(X|Y) &=& -\sum_{j=1}^{m}P(y_{j})\sum_{i=1}^{n}P(x_{i}|y_{j})\log({P(x_{i}|y_{j})})\\ &=& -\sum_{j=1}^{m}\sum_{i=1}^{n}P(x_{i} \cap y_{j})\log\left(\frac{P(x_{i}\cap y_{j})}{P(y_{j})}\right) \end{eqnarray}
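As a quick sanity check of the formula above, here is a minimal NumPy sketch (the joint probability table is a made-up toy example, and the natural log is used):

import numpy as np

# Toy joint distribution P(x_i ∩ y_j): rows are x, columns are y (made-up values)
p_xy = np.array([[0.3, 0.1],
                 [0.1, 0.5]])
p_y = p_xy.sum(axis=0)        # marginal P(y_j)
p_x_given_y = p_xy / p_y      # conditional P(x_i | y_j), column-wise

# H(X|Y) = -sum_j P(y_j) sum_i P(x_i|y_j) log P(x_i|y_j)
h_x_given_y = -np.sum(p_y * np.sum(p_x_given_y * np.log(p_x_given_y), axis=0))
print(h_x_given_y)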

2. Mutual Information

Mutual Information is $H(X) - H(X|Y)$, which measures how much knowing one of the variables reduces uncertainty about the other. $$I(X;Y) = H(X) - H(X|Y) = \sum_{y\in{Y}}\sum_{x\in{X}}P(x \cap y)\log\left(\frac{P(x \cap y)}{P(x)P(y)}\right)$$
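Continuing the toy joint distribution from the sketch above, the following checks that $H(X) - H(X|Y)$ agrees with the double-sum form of $I(X;Y)$:

import numpy as np

# Same made-up joint distribution as above
p_xy = np.array([[0.3, 0.1],
                 [0.1, 0.5]])
p_x = p_xy.sum(axis=1)        # marginal P(x_i)
p_y = p_xy.sum(axis=0)        # marginal P(y_j)

h_x = -np.sum(p_x * np.log(p_x))
p_x_given_y = p_xy / p_y
h_x_given_y = -np.sum(p_y * np.sum(p_x_given_y * np.log(p_x_given_y), axis=0))

# Double-sum form: sum over x, y of P(x ∩ y) * log( P(x ∩ y) / (P(x)P(y)) )
i_xy = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))

print(h_x - h_x_given_y, i_xy)   # both print the same value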

3. Implementation

scikit-learn provides useful functions for this. Below, I calculate mutual information with that library.

In [27]:
from sklearn import datasets
from sklearn.feature_selection import SelectKBest
from sklearn.feature_selection import mutual_info_classif
import numpy as np
In [41]:
iris_dataset = datasets.load_iris()
iris_data = iris_dataset.data
iris_label = iris_dataset.target
In [13]:
# Explanatory variables
iris_data[0:3,:]
Out[13]:
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2]])
In [42]:
# Response (target) variable
iris_label[0:3]
Out[42]:
array([0, 0, 0])

You can check the mutual information values with the "mutual_info_classif" function.
The 3rd and 4th explanatory variables (indices 2 and 3) seem to have higher values than the others.

In [40]:
mutual_info_classif(X=iris_data,y=iris_label)
Out[40]:
array([ 0.48958131,  0.24431716,  0.98399648,  1.00119776])

Now you can obtain a new set of explanatory variables with high mutual information. Here I extract 2 explanatory variables out of 4 with the "SelectKBest" class.

In [21]:
selecter = SelectKBest(score_func=mutual_info_classif, k=2)
selecter_iris = selecter.fit(iris_data,iris_label)
In [45]:
new_iris_data = selecter_iris.fit_transform(iris_data,iris_label)
In [48]:
print('shape of new_iris_data',new_iris_data.shape)
new_iris_data[0:3,:]
shape of new_iris_data (150, 2)
Out[48]:
array([[ 1.4,  0.2],
       [ 1.4,  0.2],
       [ 1.3,  0.2]])

Now I can see that the explanatory variables with high mutual information were extracted :)
You can also check which explanatory variables were selected, as a True/False numpy array, by invoking the "get_support()" method.

In [49]:
support = selecter_iris.get_support()
print('support', support)
np.where(support == True)
support [False False  True  True]
Out[49]:
(array([2, 3]),)

You can check other "Select~" classes here.
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.feature_selection
According to the link above, for a continuous target variable, "mutual_info_regression" seems to be preferable.
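As a minimal sketch of that case (the diabetes dataset is used here only as an illustrative regression example), "mutual_info_regression" can be called in the same way when the target is continuous:

from sklearn import datasets
from sklearn.feature_selection import mutual_info_regression

# Diabetes dataset: continuous target, used purely for illustration
diabetes = datasets.load_diabetes()
mi = mutual_info_regression(X=diabetes.data, y=diabetes.target)
print(mi)   # one mutual-information estimate per explanatory variable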
