We do not know closed-form solutions for all optimization problems, even when they look somewhat innocent. One of the many possible methods in such a case is to use (stochastic) gradient descent to iteratively refine the solution to the problem. This involves the computation of… yes, the gradient.
In its simplest form, the gradient descent algorithm computes the gradient of an objective function with respect to the parameters, evaluated on all training examples, and uses that gradient to adjust the model’s parameters. The gradient of a (not necessarily objective) function $f(x_1, \ldots, x_n)$ has the general form
$\nabla f = \left( \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \ldots, \frac{\partial f}{\partial x_n} \right)$,
but to compute each partial derivative $\frac{\partial f}{\partial x_i}$, you must know, or at least remember, how to compute the derivative, and it’s not always trivial. But, as I was trying to remember a specific rule (involving the friendly hyperbolic tangent), I decided that I might as well make a list of useful derivative and integral rules and share it.
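As a minimal sketch of the procedure described above, here is gradient descent on a toy least-squares problem, with the partial derivatives worked out by hand. The data, the names `objective`, `gradient`, and `gradient_descent`, the learning rate, and the step count are all assumptions chosen purely for illustration.

```python
# Toy data (an assumption for illustration), generated by y = 2x + 1.
XS = [0.0, 1.0, 2.0, 3.0]
YS = [1.0, 3.0, 5.0, 7.0]


def objective(a, b):
    """Mean squared error of the line y = a*x + b over all examples."""
    return sum((a * x + b - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def gradient(a, b):
    """Hand-derived partial derivatives of the objective w.r.t. a and b."""
    n = len(XS)
    da = sum(2.0 * (a * x + b - y) * x for x, y in zip(XS, YS)) / n
    db = sum(2.0 * (a * x + b - y) for x, y in zip(XS, YS)) / n
    return da, db


def gradient_descent(a=0.0, b=0.0, rate=0.05, steps=5000):
    """Repeatedly step against the gradient to reduce the objective."""
    for _ in range(steps):
        da, db = gradient(a, b)
        a -= rate * da
        b -= rate * db
    return a, b


a, b = gradient_descent()
print(a, b, objective(a, b))
```

The only calculus needed here is the power rule and the chain rule, which is exactly why having such rules at hand matters: a wrong derivative in `gradient` would silently send the parameters in the wrong direction.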
Of course, it is possible to have an infinite number of rules, if only by mere combination, but let us restrict ourselves to the basic rules, the ones we’re most likely to encounter in (simple) optimization problems and basic geometry.
I would like this list to eventually grow but, unlike most compendia, not merely by including more rules, but by adding actual derivations of the rules from the most basic ones. For example, Leibniz’s integral rule clearly makes sense, but what’s the simplest demonstration one can have?
* * *
- for
constant,
-
-
-
-
-
-
-
,
-
-
- $\frac{d}{dx} f(g(x)) = f'(g(x))\, g'(x)$, the “chain rule.”
-
-
,
-
,
-
-
-
, for
-
, with
constant
-
, with
constant
-
* * *
-
-
-
-
, with variant
-
, if
. If
,
-
-
, with special cases
-
,
,
- and
,
,
-
-
-
-
, if
-
* * *
You can download the list as a PDF.
I am pretty sure the list is accurate (in the PDF version at least), but if you spot an error, please let me know.
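One hedged way to hunt for such errors is to compare a rule against a numerical derivative. The sketch below checks the standard identity $\frac{d}{dx}\tanh x = 1 - \tanh^2 x$ with a central difference. The helper names `numerical_derivative`, `tanh_rule`, and `max_error`, the step size, and the sample points are illustrative assumptions.

```python
import math


def numerical_derivative(f, x, h=1e-5):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def tanh_rule(x):
    """Closed-form rule under test: d/dx tanh(x) = 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def max_error(rule, f, points):
    """Largest absolute gap between the rule and the numerical derivative."""
    return max(abs(rule(x) - numerical_derivative(f, x)) for x in points)


# Sample points in [-3, 3], chosen arbitrarily for the check.
points = [i / 10.0 for i in range(-30, 31)]
error = max_error(tanh_rule, math.tanh, points)
print(error)
```

A small gap does not prove a rule, but a large one at any sample point is a strong hint that a sign or a factor went missing.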
