Tuesday, July 31, 2018

Bayesian inference (Poisson distribution) ①

I will share with you a Bayesian machine learning approach to the Poisson distribution. In this article, I will deal with updating the posterior distribution. I will post an article about the predictive model very soon :)

0. What is the Poisson distribution?

The Poisson distribution is a well-known and useful discrete probability distribution, which takes the form

$$Poi(x|\lambda) = \frac{\lambda^{x}}{x!}e^{-\lambda}$$

As an example, this distribution can be used to capture the number of customers visiting a restaurant within a given period. If there are on average 3 customers per hour, this situation can be captured by setting $\lambda = 3$: $$Poi(x|3) = \frac{3^{x}}{x!}e^{-3}$$

Then, if we'd like to know the probability of 5 customers arriving in the next hour, we can get it from the Poisson distribution.

In [1]:
from scipy.stats import poisson
print('{0:.2f}'.format(poisson.pmf(5,3)))
0.10

From the above, the probability of exactly 5 customers coming is 10%, so you might as well prepare for 5 customers to come. The Poisson distribution can be used in this way.
One remarkable property of the Poisson distribution is that both its mean and its variance equal $\lambda$ (a quick numerical check follows the plots below). The following plots show the Poisson distribution for various values of $\lambda$.

In [2]:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
In [3]:
# List of parameters of the Poisson distribution
lambda_list = [1,2,3,4,5,6]
# data for x axis
x = np.linspace(0,20,21)

fig = plt.figure(figsize=(13,8))

for i, lam in enumerate(lambda_list):
    # Compute the probability at each point of the support
    y = poisson.pmf(x, lam)
    # Plot distribution with different lambda
    ax = fig.add_subplot(2,3,i+1)
    ax.bar(x,y)
    ax.set_title("Poisson distribution lambda = {}".format(lam),
                     fontsize=14)
    ax.set_ylim((0,0.4))

plt.show()
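Here is the quick numerical check of the mean-equals-variance property mentioned above (a minimal sketch; the value of $\lambda$ and the sample size are arbitrary choices of mine):

In [ ]:
from scipy.stats import poisson

# for a Poisson distribution, mean == variance == lambda
lam = 4
mean, var = poisson.stats(lam, moments='mv')
print(mean, var)                      # both equal 4

# the same holds approximately for a large random sample
samples = poisson.rvs(lam, size=100000)
print(samples.mean(), samples.var())  # both close to 4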

1. Posterior distribution

In the section above, I explained a somewhat empirical approach using the example of customers in a restaurant. Nonetheless, our motive in this article is Bayesian inference, so from now on we will work with the Bayesian approach. Since the gamma distribution is the conjugate prior for the likelihood function (the Poisson distribution), the gamma distribution is adopted as the prior distribution.

$$Gam(\lambda| a,b) = C_G(a,b)\lambda^{a-1}e^{-b\lambda}$$

Taking the logarithm of the gamma distribution gives

$$log(Gam(\lambda| a,b)) = (a-1)log(\lambda) -b\lambda + const \tag{1}$$

Suppose $X$ is the observed data $\{x_1, x_2, \cdots, x_N\}$; let's compute the posterior distribution.

$$\begin{eqnarray}p(\lambda|X) &=& \frac{p(X|\lambda)p(\lambda)}{p(X)}\\ &\propto& p(X|\lambda)p(\lambda)\end{eqnarray}$$


$$p(X|\lambda)p(\lambda) = \prod_{n=1}^{N} Poi(x_n|\lambda)\,p(\lambda)$$

Taking the logarithm and dropping the terms that do not depend on $\lambda$ (such as $-\log x_n!$),

$$\begin{eqnarray}\log\left(p(X|\lambda)p(\lambda)\right) &=& \sum_{n=1}^{N}\left(x_n\log\lambda - \lambda\right) + (a-1)\log\lambda - b\lambda + const\\ &=& \left(\sum_{n=1}^{N}x_n + a-1\right)\log\lambda - (N+b)\lambda + const\end{eqnarray}$$

Comparing with equation $(1)$, the posterior distribution can be written as
$$Gam(\lambda|\hat{a},\hat{b})\ \ \ where\ \ \hat{a} = \sum_{n=1}^{N}x_n + a, \ \ \hat{b} = N + b$$
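Before relying on this result, we can sanity-check the conjugate update numerically: evaluate the unnormalized posterior (likelihood times prior) on a grid of $\lambda$ values and compare it with the analytic $Gam(\lambda|\hat{a},\hat{b})$ density. This is just a sketch; the toy data, hyperparameters, and grid are my own arbitrary choices.

In [ ]:
import numpy as np
from scipy.stats import poisson, gamma

a, b = 3.0, 0.2                # prior hyperparameters (the same values used below)
data = np.array([4, 2, 5, 3])  # made-up toy observations

lam_grid = np.linspace(0.01, 15, 2000)
# unnormalized posterior: product of Poisson likelihoods times the gamma prior
unnorm = np.array([poisson.pmf(data, lam).prod() * gamma.pdf(lam, a=a, scale=1/b)
                   for lam in lam_grid])
dlam = lam_grid[1] - lam_grid[0]
grid_post = unnorm / (unnorm.sum() * dlam)  # normalize numerically

# analytic posterior from the conjugate update
hat_a, hat_b = data.sum() + a, len(data) + b
analytic = gamma.pdf(lam_grid, a=hat_a, scale=1/hat_b)

print(abs(grid_post - analytic).max())      # should be close to 0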

Let's see how the gamma distribution is updated. Tentatively, the hyperparameters of the gamma distribution are set to $a=3$, $b=0.2$.

In [4]:
from scipy.stats import gamma
In [5]:
x = np.linspace(1,40,101)
# note: scipy's gamma takes a shape a and a scale parameter, where scale = 1/b
y = gamma.pdf(x,a=3,scale=1/0.2)

plt.plot(x,y)
plt.title('prior distribution',fontsize = 15)
plt.show()
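For reference, the mean of this prior is $a/b = 15$ and its variance is $a/b^2 = 75$, so it is a fairly broad prior. A quick check with scipy (again with scale $= 1/b$):

In [ ]:
# mean a/b = 15 and variance a/b**2 = 75 of the Gamma(a=3, b=0.2) prior
prior_mean, prior_var = gamma.stats(a=3, scale=1/0.2, moments='mv')
print(prior_mean, prior_var)   # 15.0 75.0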

Then let's define a function to compute the updated parameters, i.e., the parameters of the posterior distribution.

In [6]:
def train_gamma(data, hyper_a=3, hyper_b=0.2):
    """
    Return the parameters of the posterior distribution.
    """
    # hat_a = sum of observations + a,  hat_b = number of observations + b
    hat_a = data.sum() + hyper_a
    hat_b = data.shape[0] + hyper_b
    
    return hat_a, hat_b
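As a quick sanity check of `train_gamma` (the data values here are made up just for illustration), the posterior mean $\hat{a}/\hat{b}$ should land between the prior mean $a/b = 15$ and the sample mean:

In [ ]:
toy = np.array([18, 22, 19])  # made-up observations, sample mean about 19.7
hat_a, hat_b = train_gamma(toy)
print(hat_a, hat_b)           # 62 3.2
print(hat_a / hat_b)          # posterior mean, about 19.4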

Suppose we observe some data from a Poisson distribution whose parameter is $\lambda = 20$.

In [7]:
# numbers of observations
obs_num = [1,3,5,10]
fig = plt.figure(figsize=(13,8))

for i, num in enumerate(obs_num):
    # obtain observed data
    obs_data = poisson.rvs(20,size=num)
    # update posterior distribution
    hat_a, hat_b = train_gamma(obs_data)
    # get probability from updated posterior distribution
    y = gamma.pdf(x,a=hat_a,scale=1/hat_b)
    # plot posterior distribution
    ax = fig.add_subplot(2,2,i+1)
    ax.plot(x,y)
    ax.set_xlim((0,40))
    ax.set_ylim((0,0.30))
    ax.set_title('posterior distribution with {} observation(s)'.format(num),
                fontsize=14)

plt.show()

Now you can see that the posterior distribution concentrates around 20, which is the mean of the observed data :)
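We can make this convergence explicit by printing the posterior mean $\hat{a}/\hat{b}$ as the number of observations grows (a small sketch; the sample sizes are arbitrary and the exact numbers vary from run to run):

In [ ]:
for num in [1, 10, 100, 1000]:
    obs_data = poisson.rvs(20, size=num)
    hat_a, hat_b = train_gamma(obs_data)
    print('n = {:4d}: posterior mean = {:.2f}'.format(num, hat_a / hat_b))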