Derivatives and matrix-vector products (part I)

We don’t usually think of linear algebra as being compatible with derivatives, but it is very useful to know the derivative of a result with respect to certain elements in order to adjust them, essentially performing gradient descent.

We might therefore ask ourselves questions like: how do I compute the derivative of a scalar product with respect to one of the vector elements? Of a matrix-vector product? Through a matrix inverse? Through a product of matrices? Let’s answer the first two questions for now.

First, we’ll establish the derivative of a scalar (or dot) product relative to one of its elements.

Let a and x be two column vectors (some authors prefer to use “any convenient” vector orientation, but distinguishing between row and column vectors often clarifies things). Their dot product is therefore a^{T}x.

The derivative of a^{T}x relative to a_i, one of the components of the vector a, is

\displaystyle \frac{d}{da_{i}} a^{T}x

\displaystyle = \frac{d}{da_{i}} ( a_1x_1 + \cdots + a_ix_i + \cdots + a_nx_n)

\displaystyle = \frac{d}{da_{i}} a_ix_i = x_i.

The derivative of a^{T}x relative to x_j is

\displaystyle \frac{d}{dx_{j}} a^{T}x

\displaystyle = \frac{d}{dx_{j}} ( a_1x_1 + \cdots + a_jx_j + \cdots + a_nx_n)

\displaystyle = \frac{d}{dx_{j}} a_jx_j = a_j.

We see that in those derivatives, a and x play essentially symmetric roles.
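As a quick sanity check, we can verify both results numerically with NumPy (a sketch with arbitrary random vectors, not values from the post), comparing a central finite difference against the analytic answer:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(5)
x = rng.standard_normal(5)
h = 1e-6  # step size for the finite difference

def dot(a, x):
    return a @ x  # the dot product a^T x

# Central finite difference of a^T x with respect to a_2:
# the analytic result says it should equal x_2.
e = np.zeros(5); e[2] = h
grad_a2 = (dot(a + e, x) - dot(a - e, x)) / (2 * h)

# And with respect to x_3: the analytic result says a_3.
e = np.zeros(5); e[3] = h
grad_x3 = (dot(a, x + e) - dot(a, x - e)) / (2 * h)

print(grad_a2 - x[2], grad_x3 - a[3])  # both differences should be tiny
```

Because the dot product is linear in each argument, the finite difference agrees with the analytic derivative up to floating-point error.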

* *

Matrix-vector products do not share that symmetry. Let A be a k \times n matrix and x be a column vector of length n.

The derivative of Ax relative to a_{ij}, an entry of the matrix A (whose rows are the row vectors a_1, \ldots, a_k), is

\displaystyle \frac{d}{da_{ij}} Ax

\displaystyle =\frac{d}{da_{ij}}  \left[~  \begin{matrix}  \cdots & a_1 & \cdots\\   & \vdots & \\  \cdots & a_i & \cdots\\   & \vdots & \\  \cdots & a_k & \cdots\\  \end{matrix}  ~\right]  \left[~  \begin{matrix}  \vdots\\  \vdots\\  x \\  \vdots\\  \vdots\\  \end{matrix}  ~\right]

\displaystyle =\frac{d}{da_{ij}}  \left[~  \begin{matrix}  a_1 x\\  a_2 x\\  \vdots\\  \vdots\\  a_k x\\  \end{matrix}  ~\right]=  \left[~  \begin{matrix}  0 \\  \vdots\\  \frac{d}{da_{ij}}a_ix\\  \vdots\\  0\\  \end{matrix}  ~\right]=  \left[~  \begin{matrix}  0 \\  \vdots\\  x_j\\  \vdots\\  0\\  \end{matrix}  ~\right]

with x_j in the ith row.
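We can again check this numerically (a sketch with arbitrary sizes and random entries, not values from the post): perturbing the single entry a_{ij} changes Ax only in its ith component, at a rate of x_j.

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 4, 3
A = rng.standard_normal((k, n))
x = rng.standard_normal(n)
i, j = 2, 1  # which entry a_ij to differentiate against
h = 1e-6

# Central finite difference of Ax with respect to a_ij.
Ap = A.copy(); Ap[i, j] += h
Am = A.copy(); Am[i, j] -= h
deriv = (Ap @ x - Am @ x) / (2 * h)  # a length-k vector

# Analytic result: zeros everywhere except x_j in the i-th row.
expected = np.zeros(k); expected[i] = x[j]
print(np.abs(deriv - expected).max())  # should be tiny
```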

When we’re interested in the derivative relative to one of the x_j, we have:

\displaystyle \frac{d}{dx_j} Ax

\displaystyle =\frac{d}{dx_j}  \left[~  \begin{matrix}  a_1 x\\  a_2 x\\  \vdots\\  \vdots\\  a_k x\\  \end{matrix}  ~\right]=  \left[~  \begin{matrix}  \frac{d}{dx_j}a_1x\\  \frac{d}{dx_j}a_2x\\  \vdots\\  \vdots\\  \frac{d}{dx_j}a_kx\\  \end{matrix}  ~\right]=  \left[~  \begin{matrix}  a_{1j}\\  a_{2j}\\  \vdots\\  \vdots\\  a_{kj}\\  \end{matrix}  ~\right]

Both results may be easily established by using the derivative of the dot product.
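The same kind of numerical check confirms the second result (again a sketch with arbitrary random values): the derivative of Ax with respect to x_j is the jth column of A.

```python
import numpy as np

rng = np.random.default_rng(2)
k, n = 4, 3
A = rng.standard_normal((k, n))
x = rng.standard_normal(n)
j = 1  # which component x_j to differentiate against
h = 1e-6

# Central finite difference of Ax with respect to x_j.
xp = x.copy(); xp[j] += h
xm = x.copy(); xm[j] -= h
deriv = (A @ xp - A @ xm) / (2 * h)  # a length-k vector

print(np.abs(deriv - A[:, j]).max())  # should match column j of A
```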

* *

Those building blocks let us apply gradient descent through the linear components of a system. While linear transformations are usually thought of as being somewhat uninteresting, one must keep in mind that some very interesting transformations, such as the discrete Fourier transform, are linear.

