Affine transformations appear throughout neural networks, and training a network with stochastic gradient descent requires the gradient of the affine transformation. Needless to say, you can find the results by typing words such as "gradient" and "affine transformation" into a search engine. In this article, however, I'm going to share how to derive the gradient of an affine transformation step by step.
0. What is an affine transformation?
An affine transformation is the combination of a linear transformation and a translation. To keep things simple, this article works with a 2-dimensional row vector $x = (x_1, x_2)$, a 2-by-3 matrix $w = \begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix}$, and a 3-dimensional vector $b = (b_1, b_2, b_3)$. In that case, the affine transformation takes the form
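$$
\begin{aligned}
y = xw + b
&= (x_1, x_2)
\begin{pmatrix} w_{11} & w_{12} & w_{13} \\ w_{21} & w_{22} & w_{23} \end{pmatrix}
+ (b_1, b_2, b_3) \\
&= (w_{11}x_1 + w_{21}x_2 + b_1,\; w_{12}x_1 + w_{22}x_2 + b_2,\; w_{13}x_1 + w_{23}x_2 + b_3)
\end{aligned}
$$

1. Differentiation of composite functions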
To derive the gradient of an affine transformation, we first need the chain rule for differentiating composite functions. Let $z = f(x, y)$ with $x = g(t)$ and $y = h(t)$. The derivative of $z$ with respect to $t$ can then be computed as follows:
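$$
\frac{\partial z}{\partial t}
= \frac{\partial z}{\partial x}\frac{\partial x}{\partial t}
+ \frac{\partial z}{\partial y}\frac{\partial y}{\partial t}
$$

2. Derivation of the gradient with respect to x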
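The remaining sections differentiate a scalar $L$ that depends on the output $y = xw + b$ of the affine transformation from section 0. As a concrete reference point, here is a minimal sketch of that forward computation in plain Python. The numeric values and the helper name `affine` are illustrative assumptions, not something taken from the derivation itself.

```python
def affine(x, w, b):
    """Return y = x w + b for a row vector x, a matrix w, and a vector b."""
    n, m = len(w), len(w[0])
    # y_j = sum_i x_i * w_ij + b_j, exactly the element-wise form from section 0.
    return [sum(x[i] * w[i][j] for i in range(n)) + b[j] for j in range(m)]


# Illustrative values (assumed for this example).
x = [1.0, 2.0]                 # 2-dimensional row vector
w = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]          # 2-by-3 matrix
b = [0.5, -0.5, 1.0]           # 3-dimensional vector

y = affine(x, w, b)
print(y)  # [9.5, 11.5, 16.0]
```

Each output component $y_j$ depends on every $x_i$, which is why the chain rule in the derivations sums over $y_1$, $y_2$, and $y_3$.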
Let $L$ be a scalar function of $y = xw + b$, where $y = (y_1, y_2, y_3)$. The gradient of $L$ with respect to $x$ can be derived as follows.
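$$
\begin{aligned}
\frac{\partial L}{\partial x}
&= \left( \frac{\partial L}{\partial x_1},\; \frac{\partial L}{\partial x_2} \right) \\
&= \left(
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_1}
+ \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_1}
+ \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_1},\;
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial x_2}
+ \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial x_2}
+ \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial x_2}
\right) \\
&= \left(
\frac{\partial L}{\partial y_1} w_{11}
+ \frac{\partial L}{\partial y_2} w_{12}
+ \frac{\partial L}{\partial y_3} w_{13},\;
\frac{\partial L}{\partial y_1} w_{21}
+ \frac{\partial L}{\partial y_2} w_{22}
+ \frac{\partial L}{\partial y_3} w_{23}
\right) \\
&= \frac{\partial L}{\partial y}
\begin{pmatrix} w_{11} & w_{21} \\ w_{12} & w_{22} \\ w_{13} & w_{23} \end{pmatrix} \\
&= \frac{\partial L}{\partial y} w^{T}
\end{aligned}
$$

3. Derivation of the gradient with respect to w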
For $w$, the gradient can be derived as follows.
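$$
\begin{aligned}
\frac{\partial L}{\partial w}
&= \begin{pmatrix}
\frac{\partial L}{\partial w_{11}} & \frac{\partial L}{\partial w_{12}} & \frac{\partial L}{\partial w_{13}} \\
\frac{\partial L}{\partial w_{21}} & \frac{\partial L}{\partial w_{22}} & \frac{\partial L}{\partial w_{23}}
\end{pmatrix} \\
&= \begin{pmatrix}
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_{11}}
+ \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_{11}}
+ \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_{11}} & \cdots & \cdots \\
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial w_{21}}
+ \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial w_{21}}
+ \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial w_{21}} & \cdots & \cdots
\end{pmatrix} \\
&= \begin{pmatrix}
\frac{\partial L}{\partial y_1} x_1 + \frac{\partial L}{\partial y_2} \cdot 0 + \frac{\partial L}{\partial y_3} \cdot 0 &
\frac{\partial L}{\partial y_1} \cdot 0 + \frac{\partial L}{\partial y_2} x_1 + \frac{\partial L}{\partial y_3} \cdot 0 &
\frac{\partial L}{\partial y_1} \cdot 0 + \frac{\partial L}{\partial y_2} \cdot 0 + \frac{\partial L}{\partial y_3} x_1 \\
\frac{\partial L}{\partial y_1} x_2 + \frac{\partial L}{\partial y_2} \cdot 0 + \frac{\partial L}{\partial y_3} \cdot 0 &
\frac{\partial L}{\partial y_1} \cdot 0 + \frac{\partial L}{\partial y_2} x_2 + \frac{\partial L}{\partial y_3} \cdot 0 &
\frac{\partial L}{\partial y_1} \cdot 0 + \frac{\partial L}{\partial y_2} \cdot 0 + \frac{\partial L}{\partial y_3} x_2
\end{pmatrix} \\
&= \begin{pmatrix}
\frac{\partial L}{\partial y_1} x_1 & \frac{\partial L}{\partial y_2} x_1 & \frac{\partial L}{\partial y_3} x_1 \\
\frac{\partial L}{\partial y_1} x_2 & \frac{\partial L}{\partial y_2} x_2 & \frac{\partial L}{\partial y_3} x_2
\end{pmatrix} \\
&= x^{T} \frac{\partial L}{\partial y}
\end{aligned}
$$

4. Derivation of the gradient with respect to b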
Lastly, the gradient with respect to $b$ is derived as follows.
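$$
\begin{aligned}
\frac{\partial L}{\partial b}
&= \left( \frac{\partial L}{\partial b_1},\; \frac{\partial L}{\partial b_2},\; \frac{\partial L}{\partial b_3} \right) \\
&= \left(
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial b_1}
+ \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial b_1}
+ \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial b_1},\;
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial b_2}
+ \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial b_2}
+ \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial b_2},\;
\frac{\partial L}{\partial y_1}\frac{\partial y_1}{\partial b_3}
+ \frac{\partial L}{\partial y_2}\frac{\partial y_2}{\partial b_3}
+ \frac{\partial L}{\partial y_3}\frac{\partial y_3}{\partial b_3}
\right) \\
&= \left(
\frac{\partial L}{\partial y_1} \cdot 1 + 0 + 0,\;
0 + \frac{\partial L}{\partial y_2} \cdot 1 + 0,\;
0 + 0 + \frac{\partial L}{\partial y_3} \cdot 1
\right) \\
&= \left( \frac{\partial L}{\partial y_1},\; \frac{\partial L}{\partial y_2},\; \frac{\partial L}{\partial y_3} \right) \\
&= \frac{\partial L}{\partial y}
\end{aligned}
$$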
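One way to gain confidence in the three results is to compare them against finite differences. The sketch below does this in plain Python under a few assumptions that are not part of the derivation: the loss is taken to be $L = \frac{1}{2}\sum_j y_j^2$ (so that $\partial L / \partial y_j = y_j$), the numeric values are arbitrary, and the helper names `affine`, `loss`, `analytic_grads`, and `central_diff` are made up for this example.

```python
def affine(x, w, b):
    """Return y = x w + b for a row vector x, a matrix w, and a vector b."""
    n, m = len(w), len(w[0])
    return [sum(x[i] * w[i][j] for i in range(n)) + b[j] for j in range(m)]


def loss(x, w, b):
    # Assumed example loss: L = 0.5 * sum_j y_j^2, which gives dL/dy_j = y_j.
    return 0.5 * sum(v * v for v in affine(x, w, b))


def analytic_grads(x, w, b):
    """Gradients computed from the three closed-form results."""
    n, m = len(w), len(w[0])
    dy = affine(x, w, b)  # dL/dy for the assumed loss
    dx = [sum(dy[j] * w[i][j] for j in range(m)) for i in range(n)]  # dL/dy w^T
    dw = [[x[i] * dy[j] for j in range(m)] for i in range(n)]        # x^T dL/dy
    db = list(dy)                                                    # dL/dy
    return dx, dw, db


def central_diff(f, container, index, eps=1e-6):
    """Central-difference estimate of d f() / d container[index]."""
    original = container[index]
    container[index] = original + eps
    plus = f()
    container[index] = original - eps
    minus = f()
    container[index] = original  # restore the perturbed entry
    return (plus - minus) / (2 * eps)


# Illustrative values (assumed for this example).
x = [1.0, 2.0]
w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [0.5, -0.5, 1.0]

dx, dw, db = analytic_grads(x, w, b)


def f():
    return loss(x, w, b)


num_dx = [central_diff(f, x, i) for i in range(len(x))]
num_dw = [[central_diff(f, row, j) for j in range(len(row))] for row in w]
num_db = [central_diff(f, b, j) for j in range(len(b))]

# Largest absolute gap between the analytic and numerical gradients.
errors = (
    [abs(a - n) for a, n in zip(dx, num_dx)]
    + [abs(a - n) for ra, rn in zip(dw, num_dw) for a, n in zip(ra, rn)]
    + [abs(a - n) for a, n in zip(db, num_db)]
)
max_error = max(errors)

print("dL/dx:", dx)
print("dL/dw:", dw)
print("dL/db:", db)
print("max abs difference from finite differences:", max_error)
```

If the closed-form expressions are right, the reported difference should be tiny compared with the gradient values themselves. The shapes are also worth noting: `dx` has the shape of $x$, `dw` the shape of $w$, and `db` the shape of $b$.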