Wednesday, June 6, 2018

Standardization in statistics

When I work on machine learning, "standardization" is one of the handy techniques I reach for, so I will share it with you here.

0. What is standardization?

Standardization is the process of putting different variables on the same scale. Concretely, standardization transforms the observed values so that their mean becomes 0 and their variance becomes 1. This allows you to compare values between different types of variables. The score produced by standardization represents the number of standard deviations above or below the mean at which a specific observation falls. For instance, a standardized value of 1 indicates that the observed value falls 1 standard deviation above the mean :)
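To make the "number of standard deviations from the mean" reading concrete, here is a minimal sketch. The helper name `z_score` and all the numbers (heights, weights, their means and standard deviations) are made up for illustration; the formula it uses is the one derived later in this post.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

# Hypothetical numbers, for illustration only.
height_z = z_score(value=180.0, mean=170.0, std_dev=10.0)  # centimetres
weight_z = z_score(value=62.0, mean=65.0, std_dev=6.0)     # kilograms

print(height_z)  # 1.0  -> one standard deviation above the mean
print(weight_z)  # -0.5 -> half a standard deviation below the mean
```

Centimetres and kilograms cannot be compared directly, but the two standardized values can: in this made-up example the height is further from its mean than the weight is.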

1. How to get a standardized value?

You can apply standardization fairly easily, as follows.

  1. Compute the mean and the standard deviation (the square root of the variance) of the observed values.
  2. Subtract the mean from each observed value and divide the result by the standard deviation.
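The two steps above can be sketched in Python using only the standard library. The function name `standardize` and the sample data are my own choices for illustration, and I am assuming the population standard deviation (`statistics.pstdev`); if you treat your data as a sample you may prefer `statistics.stdev` instead.

```python
import statistics

def standardize(values):
    """Return the standardized (z-score) version of `values`."""
    # Step 1: compute the mean and the (population) standard deviation.
    mean = statistics.fmean(values)
    std_dev = statistics.pstdev(values)
    if std_dev == 0:
        # All values are identical, so dividing by the standard deviation is undefined.
        raise ValueError("cannot standardize constant data")
    # Step 2: subtract the mean from each value and divide by the standard deviation.
    return [(v - mean) / std_dev for v in values]

# Hypothetical observations, for illustration only.
observed = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(observed)
print(z)
```

For this data the mean is 5 and the population standard deviation is 2, so for example the observation 9.0 becomes 2.0, i.e. two standard deviations above the mean.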

2. Mathematical analysis

Let $x$ be the observed value and $z$ the standardized value. Suppose $z$ is a linear function of $x$,
$$z = ax + b \quad \text{where } a, b \text{ are constants}$$ The mean of $z$ can be written as $$E(z) = E(ax+b)$$ By linearity of expected value (in case you are not familiar with the properties of expected value, the one used here is $E(ax+b) = aE(x) + b$),
$$\begin{eqnarray}E(z) &=& aE(x) + E(b)\\ &=& a \mu + b \end{eqnarray}$$ where $\mu = E(x)$. Because of standardization, the mean of $z$ should be 0. Hence,
$$a\mu + b = 0\tag{1}$$ The variance, on the other hand, is derived as below, $$\begin{eqnarray}Var(z) &=& E\left\{ (z-E(z))^2\right\}\\ &=& E\left\{ (ax+b-a\mu-b)^2\right\}\\ &=& E\left\{ (ax-a\mu)^2\right\}\\ &=& a^2E\left\{ (x-\mu)^2\right\}\\ &=& a^2Var(x)\\ &=& a^2\sigma^2\\ \end{eqnarray}$$ where $\sigma^2 = Var(x)$.
Due to standardization, the variance of $z$ should be 1, so $a^2\sigma^2 = 1$. Taking $a > 0$, so that larger observed values map to larger standardized values, $$a = \frac{1}{\sigma}$$ Substituting this result into (1), $$b = -\frac{\mu}{\sigma}$$ Therefore, $$z = \frac{x}{\sigma} - \frac{\mu}{\sigma} = \frac{x-\mu}{\sigma}$$ Now you can tell why the procedure in section 1 above produces the standardized value :)
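As a quick sanity check of the derivation, the sketch below plugs $a = 1/\sigma$ and $b = -\mu/\sigma$ into $z = ax + b$ and confirms numerically that the mean comes out as (approximately) 0 and the variance as (approximately) 1. It also checks the intermediate result $Var(ax+b) = a^2 Var(x)$ for an arbitrary pair of constants. The data and the constants `a_any`, `b_any` are made up for illustration, and the population variance is assumed throughout.

```python
import statistics

# Hypothetical observations, for illustration only.
x = [3.0, 7.0, 8.0, 12.0, 15.0]
mu = statistics.fmean(x)      # mean of x
sigma = statistics.pstdev(x)  # population standard deviation of x

# Constants obtained in the derivation.
a = 1.0 / sigma
b = -mu / sigma

# Apply the linear transformation z = a*x + b.
z = [a * xi + b for xi in x]
mean_z = statistics.fmean(z)
var_z = statistics.pvariance(z)
print(mean_z, var_z)  # close to 0 and 1, up to floating-point rounding

# Intermediate result: Var(a*x + b) = a^2 * Var(x) for any constants a, b.
a_any, b_any = 3.0, -2.0
lhs = statistics.pvariance([a_any * xi + b_any for xi in x])
rhs = a_any ** 2 * statistics.pvariance(x)
print(lhs, rhs)  # the two values should agree up to rounding
```

Note that the shift `b_any` has no effect on the variance, which is exactly why $b$ drops out in the variance derivation.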
