Tuesday, October 2, 2018

Differential of Softmax and Cross entropy layer

In neural networks, the Softmax function and cross entropy loss are usually used as a set. Hence it is convenient to think about the differential of the combined Softmax - cross entropy layer. In this article, I will share with you how to derive the differential of the "Softmax - cross entropy layer". As prior knowledge, reading the linked articles is preferred :)

0. Differential of Cross Entropy

Let $y_k$ be the output of the softmax function and $t_k$ be the correct label as a one-hot vector. Then the cross entropy takes the form $$Cross\ Entropy = -\sum^n_{k=1}t_k \log y_k$$ Obviously, we can derive the differential of the cross entropy with comparative ease :) Letting $E$ be the cross entropy,

$$\frac{\partial E }{\partial y_k} = - \frac{t_k}{y_k}$$
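As a quick numerical check of this gradient, here is a minimal NumPy sketch (the function names and example values are my own, just for illustration):

```python
import numpy as np

def cross_entropy(y, t, eps=1e-12):
    # E = -sum_k t_k * log(y_k); eps avoids log(0)
    return -np.sum(t * np.log(y + eps))

def grad_cross_entropy_wrt_y(y, t):
    # dE/dy_k = -t_k / y_k
    return -t / y

y = np.array([0.7, 0.2, 0.1])  # softmax output (probabilities)
t = np.array([1.0, 0.0, 0.0])  # one-hot correct data
print(cross_entropy(y, t))             # ~0.357
print(grad_cross_entropy_wrt_y(y, t))  # [-1.4286, -0., -0.]
```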

1. Differential of Softmax - Cross Entropy Layer

As we discussed here, the differential of the softmax function can be written as follows, where $\boldsymbol{a} = (a_1, a_2, \cdots, a_n)$ is the input and $\boldsymbol{y} = (y_1, y_2, \cdots, y_n)$ is the output of the Softmax function,

$$\begin{eqnarray} \frac{\partial y_l}{\partial a_k} = \begin{cases} y_k(1- y_k) & ( k = l ) \\ -y_k y_l & ( k \neq l ) \end{cases} \end{eqnarray}$$
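To make the case split concrete, this Jacobian is simply $\mathrm{diag}(\boldsymbol{y}) - \boldsymbol{y}\boldsymbol{y}^T$. A small NumPy sketch of it (function names are mine for illustration):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))  # shift by max for numerical stability
    return e / np.sum(e)

def softmax_jacobian(y):
    # J[l, k] = dy_l/da_k = y_k*(1 - y_k) if k == l, else -y_k*y_l
    return np.diag(y) - np.outer(y, y)

a = np.array([2.0, 1.0, 0.1])
y = softmax(a)
print(softmax_jacobian(y))
```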

Combining these, the differential of the Softmax - cross entropy layer can be derived as below :),

$$\begin{eqnarray} \frac{\partial E}{\partial a_k} &=& \sum^n_{i=1} \frac{\partial E}{\partial y_i}\frac{\partial y_i}{\partial a_k}\\ &=&- \frac{t_k}{y_k} y_k(1- y_k) + \sum_{i \neq k} \frac{t_i}{y_i}y_i y_k\\ &=& t_k y_k - t_k + y_k \sum_{i \neq k} t_i \\ &=& y_k - t_k \ \ \ \ \ \left(\because \sum^n_{i =1} t_i = 1\right) \end{eqnarray}$$

Thus, we obtain an incredibly simple result for the differential, $$\frac{\partial E}{\partial a_k} = y_k - t_k$$
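A minimal sketch of a combined Softmax - cross entropy layer that uses this result as its backward pass, together with a finite-difference check (the structure and names are just for illustration, not a definitive implementation):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

def cross_entropy(y, t, eps=1e-12):
    return -np.sum(t * np.log(y + eps))

def forward(a, t):
    # forward pass of the Softmax - cross entropy layer
    y = softmax(a)
    return cross_entropy(y, t), y

def backward(y, t):
    # dE/da_k = y_k - t_k
    return y - t

a = np.array([0.3, 2.9, 4.0])
t = np.array([0.0, 0.0, 1.0])
E, y = forward(a, t)
analytic = backward(y, t)

# Numerical gradient: dE/da_k ~ (E(a + h*e_k) - E(a - h*e_k)) / (2h)
h = 1e-5
numeric = np.zeros_like(a)
for k in range(a.size):
    a_plus, a_minus = a.copy(), a.copy()
    a_plus[k] += h
    a_minus[k] -= h
    numeric[k] = (forward(a_plus, t)[0] - forward(a_minus, t)[0]) / (2 * h)

print(analytic)
print(numeric)  # should agree with the analytic gradient very closely
```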
