## Derivatives and matrix-vector products (part I)

We don’t usually think of linear algebra as being compatible with derivatives, but it is very useful to be able to compute the derivative of a result with respect to certain elements in order to adjust them, basically using gradient descent.

We might therefore ask ourselves questions like: how do I compute the derivative of a scalar product relative to one of the vector elements? Of a matrix-vector product? Through a matrix inverse? Through a product of matrices? Let’s answer the first two questions for now.

First, we’ll establish the derivative of a scalar (or dot) product relative to one of its elements.

Let $a$ and $x$ be two column vectors (some authors prefer to use “any convenient” vector orientation, but distinguishing between row and column vectors often clarifies things). Their dot product is therefore $a^{T}x$.

The derivative of $a^{T}x$ relative to $a_i$, one of the components of the vector $a$, is

$\displaystyle \frac{d}{da_{i}} a^{T}x$

$\displaystyle = \frac{d}{da_{i}} ( a_1x_1 + \cdots + a_ix_i + \cdots + a_nx_n)$

$\displaystyle = \frac{d}{da_{i}} a_ix_i = x_i$.

The derivative of $a^{T}x$ relative to $x_j$ is

$\displaystyle \frac{d}{dx_{j}} a^{T}x$

$\displaystyle = \frac{d}{dx_{j}} ( a_1x_1 + \cdots + a_jx_j + \cdots + a_nx_n)$

$\displaystyle = \frac{d}{dx_{j}} a_jx_j = a_j$.

We see that in those derivatives, $a$ and $x$ behave essentially the same.
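As a quick sanity check, we can compare these formulas against a finite difference. This is just an illustrative NumPy sketch; the vector length, the index $i$, and the step size are arbitrary choices. Since $a^{T}x$ is linear in each $a_i$, the finite difference matches $x_i$ up to rounding.

```python
import numpy as np

# Finite-difference check that d(a^T x)/da_i = x_i.
rng = np.random.default_rng(0)
a = rng.standard_normal(5)
x = rng.standard_normal(5)
i, h = 2, 1e-6  # arbitrary component and step size

a_plus = a.copy()
a_plus[i] += h
numeric = (a_plus @ x - a @ x) / h  # (f(a + h e_i) - f(a)) / h
analytic = x[i]
print(abs(numeric - analytic) < 1e-4)  # True
```

By the symmetry noted above, swapping the roles of `a` and `x` checks the second formula as well.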

* * *

Matrix-vector products do not share that symmetry. Let $A$ be a $k \times n$ matrix and $x$ be a column vector of length $n$.

The derivative of $Ax$ relative to $a_{ij}$, an entry of the matrix $A$ (which we view as composed of the row vectors $a_1, \ldots, a_k$), is

$\displaystyle \frac{d}{da_{ij}} Ax$

$\displaystyle =\frac{d}{da_{ij}} \left[~ \begin{matrix} \cdots & a_1 & \cdots\\ & \vdots & \\ \cdots & a_i & \cdots\\ & \vdots & \\ \cdots & a_k & \cdots\\ \end{matrix} ~\right] \left[~ \begin{matrix} \vdots\\ \vdots\\ x \\ \vdots\\ \vdots\\ \end{matrix} ~\right]$

$\displaystyle =\frac{d}{da_{ij}} \left[~ \begin{matrix} a_1 x\\ a_2 x\\ \vdots\\ \vdots\\ a_k x\\ \end{matrix} ~\right]= \left[~ \begin{matrix} 0 \\ \vdots\\ \frac{d}{da_{ij}}a_ix\\ \vdots\\ 0\\ \end{matrix} ~\right]= \left[~ \begin{matrix} 0 \\ \vdots\\ x_j\\ \vdots\\ 0\\ \end{matrix} ~\right]$

with $x_j$ in the $i$th row.
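We can verify this numerically as well, again with an illustrative NumPy sketch (the matrix shape, the indices $i$ and $j$, and the step size are arbitrary). Because $Ax$ is linear in each entry of $A$, perturbing $a_{ij}$ changes only the $i$th component of the result, by $x_j$ per unit of perturbation.

```python
import numpy as np

# Finite-difference check that d(Ax)/da_ij = x_j e_i
# (the vector with x_j in row i and zeros elsewhere).
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
i, j, h = 1, 2, 1e-6  # arbitrary entry and step size

A_plus = A.copy()
A_plus[i, j] += h
numeric = (A_plus @ x - A @ x) / h

expected = np.zeros(4)
expected[i] = x[j]
print(np.allclose(numeric, expected, atol=1e-4))  # True
```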

When we’re interested in the derivative relative to one of the $x_j$, we have:

$\displaystyle \frac{d}{dx_j} Ax$

$\displaystyle =\frac{d}{dx_j} \left[~ \begin{matrix} a_1 x\\ a_2 x\\ \vdots\\ \vdots\\ a_k x\\ \end{matrix} ~\right]= \left[~ \begin{matrix} \frac{d}{dx_j}a_1x\\ \frac{d}{dx_j}a_2x\\ \vdots\\ \vdots\\ \frac{d}{dx_j}a_kx\\ \end{matrix} ~\right]= \left[~ \begin{matrix} a_{1j}\\ a_{2j}\\ \vdots\\ \vdots\\ a_{kj}\\ \end{matrix} ~\right]$

Both results may be easily established by using the derivative of the dot product.
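The second result also checks out numerically; in the same illustrative NumPy style (shapes, index, and step size arbitrary), the finite difference recovers the $j$th column of $A$.

```python
import numpy as np

# Finite-difference check that d(Ax)/dx_j is the j-th column of A.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
x = rng.standard_normal(3)
j, h = 1, 1e-6  # arbitrary component and step size

x_plus = x.copy()
x_plus[j] += h
numeric = (A @ x_plus - A @ x) / h
print(np.allclose(numeric, A[:, j], atol=1e-4))  # True
```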

* * *

These building blocks let us apply gradient descent through the linear components of a system. While linear transformations are usually thought of as being somewhat uninteresting, one must keep in mind that some very interesting transformations—like the discrete Fourier transform—are linear.
