Processing math: 100%

Sunday, October 14, 2018

Fisher Linear Discriminant Analysis

Fisher Linear Discriminant Analysis is one of the dimentionality reduction technique. When it comes to dimentionality resduction, Principal component analysis is well-known. Nonetheless, as Principal component analysis is unsupervised technique, occasionally it would end up terrible projection for classificaiton. For instance, when I work on 2 class classification, if data are positioned in parallel and very closely together such that the biggest variance is in direction which ignore the labels, PCA would destroy the useful information and all lebels of data will be mixed... Thus I believe it's worth getting grasp for Fisher Linear Discriminant Analysis for classification.

0. Derivation of Fisher-LDA objective

Let x be data, and we have two classification label C1, C2. Let the number of data in C1 and C2 be N1, N2, respectively. Let me think the projection of x to y by,    y=wTx+w0

The mean of y would be expressed by following,

mk=1NkiCkyi

Let μ1, μ2 be mean of C1, C2, respectively, mean of y takes form,

mk=1NkiCkyi=wTμk

Since we'd like to project x to y where the mean m1 and m2 are distant as much as possible, following objective ought to be considered.

|m1m2|2=(wT(μ1μ2))((μ1μ2)Tw)=wTSBw

Note :
In this case of binary classification, we can say SB is scatter of class C1 with respect to the scatter of class 2. Generally speaking, SB is known for "between classes scatter matrix" defined as ,

SB=k(μkˉx)(μkˉx)T

Subsequently, we'd better take "within-class scatter matrix" into consideration.

Sk=iCkyimk=wT{iCk(xiμk)(xiμk)T}w

Within-class scatter matrix" Sw takes form,

Sw=ciCk(xiμk)(xiμk)T

Consequently, S1+S2 would be S1+S2=wTSww

For the sake of slick projection, Fisher_LDA cosiders maximizing the following objective,

J(w)=wTSBwwTSWw

1. Compute w from objective

Now our motive is maximize equation (3). We can obtain w which maximize equation (3) by follwing.

Jw=0

Apply quotient rule of differential, (You can check about quotient here)

j(w)w=(wTSBw)(wTSww)(wTSBw)(wTSww)(wTSww)2=0SBw(wTSWw)=(wTSBw)SWw

Since both wTSWw and wTSBw are scaler, we can denote

SBw=λSww

As long as SW is regular matrix,

S1WSBw=λw

Consequently, it shows that we can obtain w which maximize equation (3) by tackling regular eigen value problem :)

No comments:

Post a Comment