Derivatives and matrix-vector products (part I)


We don’t usually think of linear algebra as being compatible with derivatives, but it is very useful to know the derivatives of certain elements in order to adjust them, basically by gradient descent.

We might therefore ask ourselves questions like: how do I compute the derivative of a scalar product with respect to one of the vector elements? Of a matrix-vector product? Through a matrix inverse? Through a product of matrices? Let’s answer the first two questions for now.
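As a quick numerical sanity check for the first question, here is a minimal Python sketch. It relies only on the standard fact that the partial derivative of the scalar product x·y with respect to x_i is y_i, and confirms it with central finite differences. The helper names `dot` and `numerical_gradient` and the sample vectors are illustrative assumptions, not taken from the post.

```python
def dot(x, y):
    """Scalar (dot) product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, y))


def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# Illustrative values only.
x = [1.0, -2.0, 0.5]
y = [3.0, 4.0, -1.0]

# Differentiate x . y with respect to each element of x.
grad = numerical_gradient(lambda v: dot(v, y), x)
print(grad)  # each entry should be close to the matching entry of y
```

Because the scalar product is linear in x, the finite-difference estimate matches y up to rounding error.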


Taylor Series


The Taylor series around x_0 of a function f(x) that is n times differentiable is given by

\displaystyle f(x) \approx f(x_0)+f'(x_0)(x-x_0)+\frac{f''(x_0)}{2}(x-x_0)^2+\frac{f'''(x_0)}{6}(x-x_0)^3+\cdots


or, more compactly,

\displaystyle f(x) \approx \sum_{i=0}^{n} \frac{f^{(i)}(x_0)}{i!}(x-x_0)^i,

where f^{(i)}(x) is the ith derivative of f at x.
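To see the sum in action, here is a minimal Python sketch that evaluates it for f(x) = e^x, chosen because every derivative of e^x is e^x itself, so f^{(i)}(x_0) = e^{x_0} for all i. The function name `taylor_exp` and the test point are illustrative assumptions.

```python
import math


def taylor_exp(x, x0, n):
    """Degree-n Taylor polynomial of exp around x0, evaluated at x."""
    # Every derivative of exp equals exp, so f^(i)(x0) = exp(x0) for all i.
    return sum(
        math.exp(x0) * (x - x0) ** i / math.factorial(i)
        for i in range(n + 1)
    )


# Approximate e = exp(1) by expanding around x0 = 0 with more and more terms.
for n in (1, 2, 4, 8):
    approx = taylor_exp(1.0, 0.0, n)
    print(n, approx, abs(approx - math.e))
```

The printed error shrinks as n grows, which is what the i! in the denominator buys us.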

Have you ever wondered where the coefficients in a Taylor series come from? Well, let’s see!


Building a large text corpus (Part I)


Getting good text data for language-model training isn’t as easy as it sounds. First, you have to find a large corpus. Second, you must clean it up!


Reinventing the Wheel (or not)


About a week ago, some dude drops on IRC that he’s beaten memcpy “by a lot”. That’d be interesting, except that we could get neither code nor test methodology out of him. But how hard can making a better memcpy be? Turns out, harder than you think!

If you think this is a typical case of “reinventing the wheel”, I mostly agree with you. But even if reinventing is hard, can improvements still be made?
