Sunday, May 20, 2018

Fundamental understanding of "Logistic Regression"


I've just looked back at "Logistic Regression". This is a memo for the time being :)

1. Basic logic behind the scenes of "logistic regression".

In this article I will deal only with binary classification, with the two classes $C_1$ and $C_2$.
First of all, let us think about $p(C_1 | x)$,
$$\begin{eqnarray}p(C_1|x) &=& \frac{p(x|C_1)p(C_1)}{p(x)}\\ &=& \frac{p(x|C_1)p(C_1)}{p(x,C_1) + p(x, C_2)}\\ &=& \frac{p(x|C_1)p(C_1)}{p(x|C_1)p(C_1) + p(x | C_2)p(C_2)}\end{eqnarray}$$
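As a quick sanity check, here is a minimal Python sketch of this expansion. The Gaussian class-conditional densities and the prior values are made-up illustration choices of mine, not anything from a real dataset:

```python
import math

# Minimal numeric check of the Bayes-rule expansion of p(C_1|x).
# The Gaussian class-conditionals and the priors are illustration values only.

def gaussian_pdf(x, mean, std):
    """Univariate Gaussian density, standing in for a class-conditional p(x|C)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

prior_c1, prior_c2 = 0.4, 0.6                  # p(C_1), p(C_2)
x = 1.3                                        # an arbitrary test point

lik_c1 = gaussian_pdf(x, mean=0.0, std=1.0)    # p(x|C_1)
lik_c2 = gaussian_pdf(x, mean=2.0, std=1.0)    # p(x|C_2)

# p(x) = p(x, C_1) + p(x, C_2) = p(x|C_1)p(C_1) + p(x|C_2)p(C_2)
evidence = lik_c1 * prior_c1 + lik_c2 * prior_c2

posterior_c1 = lik_c1 * prior_c1 / evidence    # p(C_1|x)
print(posterior_c1)                            # a probability strictly between 0 and 1
```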
Following that, let $a$ be
$$a = \log\frac{p(x|C_1)p(C_1)}{p(x|C_2)p(C_2)}$$
We call $\frac{p(x|C_1)p(C_1)}{p(x|C_2)p(C_2)} = \frac{p(C_1|x)p(x)}{p(C_2|x)p(x)} = \frac{p(C_1|x)}{p(C_2|x)}$ the "odds". The logarithm of the odds is called the "log odds". Now, the following is mathematically trivial. $$e^{-a} = \frac{p(C_2|x)p(x)}{p(C_1|x)p(x)} = \frac{p(C_2|x)}{p(C_1|x)}$$
Therefore, since $p(C_1|x) + p(C_2|x) = 1$, $p(C_1|x)$ can be expressed as below $$p(C_1|x) = \frac{1}{1 + p(C_2|x)/p(C_1|x)} = \frac{1}{1+ e^{-a}} = \sigma(a)$$
The function $\sigma(a)$ is called the "logistic sigmoid function". The inverse of the logistic sigmoid function is called the "logit function".
$$a = \log \frac{\sigma(a)}{1 - \sigma(a)} = \log\frac{p(C_1|x)}{p(C_2|x)}$$ Let $w$ be $(w_0, w_1)^T$ and $x$ be $(1, x)^T$. "Logistic regression" models the probability $p(1|x)$ as
$$p(1|x) = \frac{1}{1 + \exp(-(w_0 + w_1x))}$$
Now let $a$ be $w_0 + w_1x$; needless to say, this is the "logistic sigmoid function" $\sigma(a)$.
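Here is a minimal Python sketch of this parameterisation. The weights $w_0, w_1$ below are placeholder values I chose for illustration, not fitted ones:

```python
import math

def sigmoid(a):
    """Logistic sigmoid: sigma(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def logit(p):
    """Inverse of the sigmoid (the logit function): log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

w0, w1 = -1.0, 2.0       # placeholder weights, not fitted to any data
x = 0.7                  # an arbitrary input

a = w0 + w1 * x          # the linear function of x
p_c1 = sigmoid(a)        # p(1|x) under the logistic regression model

print(p_c1)              # model probability of class 1
print(logit(p_c1), a)    # the logit maps the probability back to a
```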

Note: This function is a "non-linear function". In essence, <span style = "text-decoration:underline">we transform the linear function $a$ into a non-linear function with the "logistic sigmoid function"!</span>
So, let us think about the "odds" in terms of $w$ and $x$.
$$a = w^Tx = \log\frac{p(C_1|x)}{p(C_2|x)}$$
Hence,"odds" is expressed as, $$\frac{p(C_1|x)}{p(C_2|x)}=e^{w^Tx}$$

2. How does a change in $x$ affect the odds?

Now we think about how a change in $x$ affects the odds. Let us consider $\tilde{x} = (1, x+1)^T$; the "odds ratio" would be,
$$\begin{eqnarray}\frac{\frac{p(C_1|\tilde{x})}{p(C_2|\tilde{x})}}{\frac{p(C_1|x)}{p(C_2|x)}} &=& \frac{\exp(w^T\tilde{x})}{\exp(w^T{x})} \\ &=& \exp(w_1) \end{eqnarray}$$
Consequently, increasing $x$ by 1 multiplies the odds by $\exp(w_1)$.
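A small numeric check of this claim (again with placeholder weights of my choosing): the ratio of the odds at $x+1$ to the odds at $x$ comes out as $\exp(w_1)$ regardless of $x$ or $w_0$.

```python
import math

def odds(w0, w1, x):
    """Odds p(C_1|x) / p(C_2|x) of the logistic regression model at input x."""
    p = 1.0 / (1.0 + math.exp(-(w0 + w1 * x)))
    return p / (1.0 - p)

w0, w1 = -1.0, 2.0                      # placeholder weights
for x in (-3.0, 0.7, 5.0):
    ratio = odds(w0, w1, x + 1.0) / odds(w0, w1, x)
    print(x, ratio, math.exp(w1))       # the ratio equals exp(w1) for every x
```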
