Monday, July 30, 2018

Basics of Linear Regression

I will share with you the basics of Linear Regression. It's something of an entry point to statistics. My ulterior motive, however, is understanding regularization. It would be too long and intricate to cope with regularization here, so in this article I will handle only the basics of linear regression. As for regularization, I will upload an article very soon :)

0. Simple linear regression

First of all, we have to wrap our heads around "simple linear regression".
Let $x$ be the predictor variable and $y$ the dependent variable. Then the linear regression line takes the form

$$\hat{y} = \beta_0 + \beta_1x$$

Let the experimental units be $(x_i, y_i)\ (i = 1,2,\cdots,n)$. The "residual error" of the $i$-th unit can be denoted as follows.

$$y_i - \hat{y_i}$$


One way to obtain the "best fit" is to invoke the "least squares criterion", which says "minimize the sum of the squared residual errors".
$$S = \sum_{i=1}^{n}\left\{y_i -(\beta_0 + \beta_1x_i)\right\}^2$$

As you can see, $S$ is a quadratic function of $\beta_0$ and $\beta_1$.
Thereby we can obtain the best fit by solving the following.

$$\frac{\partial S}{\partial \beta_0} = 0$$$$\frac{\partial S}{\partial \beta_1} = 0$$
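Solving these two equations gives the well-known closed form $\beta_1 = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i}(x_i - \bar{x})^2}$ and $\beta_0 = \bar{y} - \beta_1\bar{x}$, where $\bar{x}$ and $\bar{y}$ are the sample means. Here is a minimal NumPy sketch of that computation (the data is made up just for illustration):

```python
import numpy as np

# Made-up data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Closed-form solution obtained from dS/db0 = 0 and dS/db1 = 0.
x_bar, y_bar = x.mean(), y.mean()
b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b0 = y_bar - b1 * x_bar

y_hat = b0 + b1 * x
S = np.sum((y - y_hat) ** 2)  # the least squares criterion
print(b0, b1, S)
```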

1. Multiple linear regression

Now let's get into "Multiple linear regression". Let the predictor variables be $(x_1, x_2, \cdots,x_d)$ and the dependent variable be $y$. The multiple linear regression line takes the form

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_d x_d$$


where $( \beta_0,\beta_1, \cdots, \beta_d )$ are called "regression coefficients".
Let's say the predictor variables are $\vec{x}_t = (x_{t1}, x_{t2}, \cdots, x_{td})^T\ (t=1,2, \cdots, n)$ and the response variable is $y_t\ (t=1,2, \cdots, n)$. Then

$$\hat{y_t} = \beta_0 + \beta_1x_{t1} + \beta_2x_{t2} + \cdots + \beta_dx_{td}$$

Now we'd like to write this down for all $n$ experimental units. Let $X$ be

$$X = \left(\begin{array}{cccc}1 & x_{ 11 } & \ldots & x_{ 1d } \\ 1 & x_{ 21 } & \ldots & x_{ 2d }\\ \vdots & \vdots & \ddots & \vdots\\ 1 & x_{ n1 } & \ldots & x_{ nd } \end{array}\right)$$

$\vec{\hat{y}}$ be

$$\vec{\hat{y}} = (\hat{y}_1,\hat{y}_2, \cdots, \hat{y}_n)^T$$

$\vec{\beta}$ be

$$\vec{\beta} = (\beta_0, \beta_1, \beta_2, \cdots, \beta_d)^T$$


Then we can write $$\vec{\hat{y}} = X\vec{\beta}$$

Suppose $\vec{\epsilon} = (\epsilon_1, \epsilon_2, \cdots, \epsilon_n)^T$ is the vector of "residual errors". Then $$\vec{y} = X\vec{\beta} + \vec{\epsilon}$$
Thereby,
$$\vec{y} - \vec{\hat{y}} = \vec{\epsilon}$$ and
$$\epsilon_t = y_t -\hat{y_t}$$
Same as "simple linear regression", we will apply "least square criterion" for residual error, $$\begin{eqnarray}S &=& \sum ^n_{t=1}\epsilon_t^2 \\ &=&\sum ^n_{t=1}\left(\vec{y_t} -\vec{\hat{y_t}}\right)^2 \\ &=&\sum ^n_{t=1}\left(\vec{y} -X\beta \right)^2\\ &=& \left(\vec{y} -X\beta \right)^T\left(\vec{y} -X\beta \right) \end{eqnarray}$$

To obtain the best fit, what we have to do is solve $$\frac{\partial S}{\partial \vec{\beta}} = \vec{0}$$ Expanding $S$, $$\begin{eqnarray}S &=& \left(\vec{y} -X\vec{\beta} \right)^T\left(\vec{y} -X\vec{\beta} \right)\\ &=&\left(X\vec{\beta} -\vec{y} \right)^T\left(X\vec{\beta} - \vec{y} \right)\\ &=& \vec{\beta}^TX^TX\vec{\beta} - 2\vec{\beta}^TX^T\vec{y} + \vec{y}^T\vec{y}\end{eqnarray}$$

$$\begin{eqnarray}\frac{\partial S}{\partial \vec{\beta}}&=&2X^TX\vec{\beta} -2X^T\vec{y}\\ &=& -2X^T\left(\vec{y}-X\vec{\beta}\right) = \vec{0}\end{eqnarray}$$$$\therefore\ \vec{\beta} = \left(X^TX\right)^{-1}X^T\vec{y}$$
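As a sanity check, here is a minimal NumPy sketch of this closed form (the design matrix and targets are made up for illustration; I solve the normal equations with `np.linalg.solve` instead of inverting $X^TX$ explicitly, which is numerically safer):

```python
import numpy as np

# Made-up design matrix and targets for illustration.
rng = np.random.default_rng(0)
n, d = 100, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, d))])  # column of ones for the intercept
true_beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ true_beta + 0.1 * rng.normal(size=n)

# Solve (X^T X) beta = X^T y rather than forming the inverse.
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)  # should be close to true_beta
```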

However, sometimes we can't find $\vec{\beta}$ this way because $\left(X^TX\right)^{-1}$ doesn't exist, which is equivalent to saying $X^TX$ is not a regular (invertible) matrix. In that situation, we can apply "regularization". I will write an article about "regularization" very soon.
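To see the problem concretely, here is a small made-up sketch where one predictor column is an exact multiple of another, so $X^TX$ is singular and `np.linalg.solve` would fail; `np.linalg.lstsq` still returns a (minimum-norm) solution:

```python
import numpy as np

# Made-up data with perfectly collinear predictors.
rng = np.random.default_rng(1)
n = 50
x1 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, 2.0 * x1])  # third column duplicates the second
y = 1.0 + 3.0 * x1 + 0.1 * rng.normal(size=n)

print(np.linalg.matrix_rank(X.T @ X))  # 2 < 3: X^T X is not invertible
# np.linalg.solve(X.T @ X, X.T @ y) would raise LinAlgError here,
# while lstsq picks the minimum-norm least squares solution.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)
```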

2. Coefficient of determination

After creating a "linear regression model", you might wanna assess how well your model fits. The "coefficient of determination" is the quotient of the variance of the fitted values and the variance of the observed values of the dependent variable. Let $S_y$ and $S_{\epsilon}$ be,
$$S_y = \frac{1}{n}\sum^{n}_{t=1}\left(y_t - \bar{y}\right)^2$$ where $\bar{y}$ is the mean of $y$, $$S_\epsilon = \frac{1}{n}\sum^{n}_{t=1}\left(y_t - \hat{y_t}\right)^2$$ and $S_r$ is
$$S_r = S_y - S_\epsilon$$
Then "coefficient determination " $R^2$ can be denoted as folows,

$$R^2 = \frac{S_r}{S_y} = 1 - \frac{S_\epsilon}{S_y}$$

It's trivial that $0\leqq R^2 \leqq 1$, and the bigger the coefficient of determination is, the better the linear regression line fits.
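Here is a minimal sketch of the $R^2$ computation (the fitted values below are made up; in practice they would come from a fit like the ones above):

```python
import numpy as np

def r_squared(y, y_hat):
    # R^2 = 1 - S_eps / S_y from the formulas above.
    S_y = np.mean((y - y.mean()) ** 2)   # variance of the observations
    S_eps = np.mean((y - y_hat) ** 2)    # mean squared residual
    return 1.0 - S_eps / S_y

# Made-up observed and fitted values for illustration.
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
y_hat = np.array([2.0, 4.0, 6.0, 8.0, 10.0])
print(r_squared(y, y_hat))  # close to 1, so the fit is good
```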
