We don’t usually think of linear algebra as being compatible with derivatives, but it is very useful to know the derivative with respect to certain elements in order to adjust them, essentially by using gradient descent.

We might therefore ask ourselves questions like: how do I compute the derivative of a scalar product relative to one of the vector elements? Of a matrix-vector product? Through a matrix inverse? Through a product of matrices? Let’s answer the first two questions for now.

First, we’ll establish the derivative of a scalar (or dot) product relative to one of its elements.

Let $\mathbf{a}$ and $\mathbf{x}$ be two column vectors of length $n$ (some authors prefer to use “any convenient” vector orientation, but distinguishing between row and column vectors often clarifies things). Their dot product is therefore $\mathbf{a}^T\mathbf{x} = \sum_{i=1}^{n} a_i x_i$.

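The derivative of $\mathbf{a}^T\mathbf{x}$ relative to $x_j$, one of the components of the vector $\mathbf{x}$, is

$$\frac{\partial}{\partial x_j}\,\mathbf{a}^T\mathbf{x} = \frac{\partial}{\partial x_j}\sum_{i=1}^{n} a_i x_i = a_j.$$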

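The derivative of $\mathbf{a}^T\mathbf{x}$ relative to $a_j$ is

$$\frac{\partial}{\partial a_j}\,\mathbf{a}^T\mathbf{x} = \frac{\partial}{\partial a_j}\sum_{i=1}^{n} a_i x_i = x_j.$$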

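We see that in these derivatives, $\mathbf{a}$ and $\mathbf{x}$ behave essentially the same: differentiating with respect to a component of one vector yields the matching component of the other.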
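A quick way to convince oneself of both results is a finite-difference check. The sketch below uses plain Python lists (no NumPy); the helpers `dot` and `partial` are illustrative names, not from any library, and the step size `h` is an arbitrary choice. Because the dot product is linear in each argument, the central difference recovers $a_j$ and $x_j$ up to floating-point rounding.

```python
# Illustrative helpers (hypothetical names, not from any library).

def dot(a, x):
    """Scalar product a^T x of two equal-length vectors."""
    return sum(ai * xi for ai, xi in zip(a, x))


def partial(f, v, j, h=1e-6):
    """Central finite-difference estimate of df/dv_j evaluated at v."""
    up = list(v)
    up[j] += h
    down = list(v)
    down[j] -= h
    return (f(up) - f(down)) / (2 * h)


a = [1.0, -2.0, 3.0]
x = [4.0, 0.5, -1.0]

for j in range(len(x)):
    # d(a^T x)/dx_j should be close to a_j ...
    dx = partial(lambda v: dot(a, v), x, j)
    # ... and d(a^T x)/da_j should be close to x_j.
    da = partial(lambda v: dot(v, x), a, j)
    print(j, dx, a[j], da, x[j])
```

Each printed row should show the estimated derivative next to the component it is expected to match.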

* * *

Matrix-vector products do not share that symmetry. Let $A$ be an $m \times n$ matrix and $\mathbf{x}$ be a column vector of length $n$.

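The derivative of $A\mathbf{x}$ relative to $a_{ij}$, an entry of matrix $A$ (composed of the row vectors $\mathbf{a}_1^T, \ldots, \mathbf{a}_m^T$), is

$$\frac{\partial}{\partial a_{ij}}\,A\mathbf{x} = \begin{bmatrix} 0 \\ \vdots \\ x_j \\ \vdots \\ 0 \end{bmatrix},$$

with $x_j$ in the $i$th row.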
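The same finite-difference idea can check this result. In the sketch below (plain Python, hypothetical helper names `matvec` and `d_matvec_d_entry`), $A$ is stored as a list of rows and indices are 0-based, so perturbing entry `(0, 1)` corresponds to $a_{12}$ in the 1-based notation above; the expected derivative is then $x_2 = 0.5$ in the first row and zero elsewhere.

```python
# Illustrative helpers (hypothetical names, not from any library).

def matvec(A, x):
    """Matrix-vector product Ax, with A given as a list of rows a_i^T."""
    return [sum(aik * xk for aik, xk in zip(row, x)) for row in A]


def d_matvec_d_entry(A, x, i, j, h=1e-6):
    """Finite-difference estimate of d(Ax)/da_ij, a vector of length m."""
    up = [list(row) for row in A]
    up[i][j] += h
    down = [list(row) for row in A]
    down[i][j] -= h
    return [(u - d) / (2 * h) for u, d in zip(matvec(up, x), matvec(down, x))]


A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]   # m = 2 rows, n = 3 columns
x = [4.0, 0.5, -1.0]

# Only row i of Ax depends on a_ij, so the result should be close to [0.5, 0.0].
print(d_matvec_d_entry(A, x, 0, 1))
```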

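When we’re interested in the derivative relative to one of the $x_j$, we have:

$$\frac{\partial}{\partial x_j}\,A\mathbf{x} = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix},$$

that is, the $j$th column of $A$.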

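Both results follow easily from the derivative of the dot product, since the $i$th entry of $A\mathbf{x}$ is the dot product $\mathbf{a}_i^T\mathbf{x}$ of the $i$th row of $A$ with $\mathbf{x}$.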
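To see the column result numerically, the sketch below builds $A\mathbf{x}$ row by row from dot products, as in the argument above, and compares a finite-difference derivative with respect to each $x_j$ against the $j$th column of $A$. It uses plain Python lists with 0-based indices; `dot`, `matvec` and `d_matvec_d_x` are illustrative names, not library functions.

```python
# Illustrative helpers (hypothetical names, not from any library).

def dot(a, x):
    """Scalar product a^T x."""
    return sum(ai * xi for ai, xi in zip(a, x))


def matvec(A, x):
    """Ax, where entry i is the dot product of row i of A with x."""
    return [dot(row, x) for row in A]


def d_matvec_d_x(A, x, j, h=1e-6):
    """Central finite-difference estimate of d(Ax)/dx_j."""
    up = list(x)
    up[j] += h
    down = list(x)
    down[j] -= h
    return [(u - d) / (2 * h) for u, d in zip(matvec(A, up), matvec(A, down))]


A = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]
x = [4.0, 0.5, -1.0]

for j in range(len(x)):
    column = [row[j] for row in A]  # the j-th column of A (0-based)
    print(j, d_matvec_d_x(A, x, j), column)
```

The two lists on each printed line should agree up to rounding.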

* * *

These building blocks are useful for applying gradient descent through the linear components of a system. While linear transformations are usually thought of as somewhat uninteresting, keep in mind that some very interesting transformations, like the discrete Fourier transform, are linear.
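As a small illustration of adjusting elements by gradient descent, here is a sketch under assumptions of our own (the loss, step size and data are made up for the example, not taken from the text): we adjust $\mathbf{x}$ to minimize $L(\mathbf{x}) = \tfrac{1}{2}\lVert A\mathbf{x} - \mathbf{b}\rVert^2$. By the chain rule and the column result, $\partial L/\partial x_j = \sum_i r_i\,a_{ij}$ with residual $\mathbf{r} = A\mathbf{x} - \mathbf{b}$. The helper names are hypothetical and the code uses plain Python lists.

```python
# Illustrative sketch (hypothetical helper names, made-up data and step size).

def matvec(A, x):
    """Ax, with A stored as a list of rows."""
    return [sum(aik * xk for aik, xk in zip(row, x)) for row in A]


def grad_x(A, x, b):
    """Gradient of L(x) = 0.5 * ||Ax - b||^2 with respect to x.

    dL/dx_j = sum_i r_i * a_ij with r = Ax - b: the j-th column of A
    (the derivative of Ax relative to x_j) weighted by the residual.
    """
    r = [yi - bi for yi, bi in zip(matvec(A, x), b)]
    return [sum(r[i] * A[i][j] for i in range(len(A))) for j in range(len(x))]


def descend(A, b, x0, step=0.1, iters=500):
    """Plain gradient descent with a fixed step size."""
    x = list(x0)
    for _ in range(iters):
        g = grad_x(A, x, b)
        x = [xj - step * gj for xj, gj in zip(x, g)]
    return x


A = [[2.0, 0.0],
     [1.0, 1.0]]
b = [2.0, 3.0]
x = descend(A, b, [0.0, 0.0])
print(x)  # approaches the exact solution [1.0, 2.0] of Ax = b
```

The fixed step of 0.1 is small enough for this particular $A$ to converge; a different matrix may need a smaller step.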

Hey, I think there’s a mistake in the first part. The derivative with respect to x_j is equal to a_j, not x_j.

You’re right. It’s fixed now. Thanks!