
Monday, October 1, 2018

Gradient of Softmax Function

When it comes to multiclass classification in the context of neural networks, the Softmax function is used as the activation function of the output layer. In this article, I will share with you the derivation of the gradient of the Softmax function. If you are curious about the differential of the Affine transformation, you can check Gradient of Affine transformation.

0. Product and Quotient rules of differentiation

For the sake of deriving the gradient of the Softmax function, an understanding of the Product and Quotient rules of differentiation is imperative. Therefore, ahead of the derivation of the gradient, you might want to wrap your head around them here.

0-a. Product rule

If two functions $f(x)$ and $g(x)$ are differentiable, then their product is also differentiable. Its derivative takes the form,

$$\{f(x)g(x)\}' = f'(x)g(x) + f(x)g'(x)$$
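For example, applying the product rule with $f(x) = x^2$ and $g(x) = \sin x$,

$$\{x^2 \sin x\}' = 2x\sin x + x^2\cos x$$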

0-b. Quotient rule

The Quotient rule can be derived from the Product rule :) It is worth deriving it here! Applying the product rule to $f^{-1}(x)g(x)$,

$$\left\{f^{-1}(x)g(x)\right\}' = \left(f^{-1}(x)\right)'g(x) + f^{-1}(x)g'(x) \overset{(*)}{=} -\frac{f'(x)}{\{f(x)\}^2}g(x) + f^{-1}(x)g'(x) = \frac{f(x)g'(x) - f'(x)g(x)}{\{f(x)\}^2}$$

(*) This step is straightforward using the derivative of a composite function (the chain rule). Let $y = t^{-1}$ and $t = f(x)$; then,

$$y' = -t^{-2}\cdot f'(x) = -\frac{f'(x)}{\{f(x)\}^2}$$
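As a quick example, applying the quotient rule to $\tan x = \sin x / \cos x$ (here $g(x) = \sin x$ and $f(x) = \cos x$),

$$\left\{\frac{\sin x}{\cos x}\right\}' = \frac{\cos x\cdot\cos x - (-\sin x)\cdot\sin x}{\cos^2 x} = \frac{1}{\cos^2 x}$$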

1. Gradient of Softmax function

Let the input of the Softmax function be $a = (a_1, a_2, \dots, a_n)$ and the output be $y = (y_1, y_2, \dots, y_n)$. The Softmax function can be expressed as below,

$$y_k = \frac{\exp(a_k)}{\sum_{i=1}^{n}\exp(a_i)}$$
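As a minimal sketch of my own (not code from this article), the Softmax function could be computed with NumPy as below. Subtracting the maximum before exponentiating is a common trick to avoid overflow; the shift cancels between the numerator and the denominator, so the output is unchanged.

import numpy as np

def softmax(a):
    # Softmax of a 1-D input vector a = (a_1, ..., a_n).
    # Subtracting max(a) is for numerical stability only:
    # exp(a_k - c) / sum_i exp(a_i - c) = exp(a_k) / sum_i exp(a_i).
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

a = np.array([1.0, 2.0, 3.0])
y = softmax(a)
print(y)        # approximately [0.0900, 0.2447, 0.6652]
print(y.sum())  # 1.0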

For instance, regarding the partial derivative with respect to $a_1$, note that $a_1$ is related to all outputs $y_1, y_2, \dots, y_n$ through the denominator. Hence, we had better think about the cases separately.

  • In case of $l = k$
$$\frac{\partial y_l}{\partial a_k} = \frac{\sum_{i=1}^{n}\exp(a_i)\cdot\exp(a_k) - \exp(a_k)\exp(a_k)}{\left\{\sum_{i=1}^{n}\exp(a_i)\right\}^2} = \frac{\exp(a_k)\left(\sum_{i=1}^{n}\exp(a_i) - \exp(a_k)\right)}{\left\{\sum_{i=1}^{n}\exp(a_i)\right\}^2} = \frac{\exp(a_k)\sum_{i\neq k}\exp(a_i)}{\left\{\sum_{i=1}^{n}\exp(a_i)\right\}^2} = y_k(1 - y_k)$$
  • In case of $l \neq k$
$$\frac{\partial y_l}{\partial a_k} = \frac{\sum_{i=1}^{n}\exp(a_i)\cdot 0 - \exp(a_l)\exp(a_k)}{\left\{\sum_{i=1}^{n}\exp(a_i)\right\}^2} = -\frac{\exp(a_l)\exp(a_k)}{\left\{\sum_{i=1}^{n}\exp(a_i)\right\}^2} = -y_l y_k$$

From the results above, we can say the derivative of the Softmax function is,

$$\frac{\partial y_l}{\partial a_k} = \begin{cases} y_k(1 - y_k) & (k = l) \\ -y_k y_l & (k \neq l) \end{cases}$$
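As a sanity check of my own (not part of the original derivation), the result can be written compactly with the Kronecker delta as $\frac{\partial y_l}{\partial a_k} = y_k(\delta_{kl} - y_l)$, and it can be verified numerically against a finite-difference approximation:

import numpy as np

def softmax(a):
    e = np.exp(a - np.max(a))
    return e / np.sum(e)

def softmax_jacobian(a):
    # Analytic Jacobian J[l, k] = d y_l / d a_k = y_k * (delta_{kl} - y_l).
    y = softmax(a)
    return np.diag(y) - np.outer(y, y)

# Central finite differences as a reference.
a = np.array([0.5, -1.0, 2.0, 0.0])
eps = 1e-6
J_num = np.zeros((len(a), len(a)))
for k in range(len(a)):
    d = np.zeros_like(a)
    d[k] = eps
    J_num[:, k] = (softmax(a + d) - softmax(a - d)) / (2 * eps)

print(np.allclose(softmax_jacobian(a), J_num, atol=1e-8))  # expected: True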
