While doing something else I stumbled upon the interesting problem of computing the average depth of nodes in a tree. The depth of a node is the distance that separates that node from the root. You can either decide that the root is at depth 1, or you can decide that it is at depth zero, but let’s decide on depth 1. So an immediate child of the root is at depth two, and its children at depth 3, and so on until you reach leaves, nodes with no children.

So the calculation of the average node depth (including leaves) in a tree comes interesting when we want to know how far a constructed tree is from the ideal full tree, as a measure of (application-specific) performance. After searching a bit on the web, I found only incomplete or incorrect formulas, or stated with proof. This week, let us see how we can derive the result without (too much) pain.

Let us start with a tree with nodes (in what follows, the distinction between internal nodes and leaves noted where irrelevant, so we’ll just say ‘nodes’ when it does not matter). First, we find integers and such that .

Then we reflect on what imposes on the shape of the tree. If (or, from the decomposition above, with ) the tree can be balanced in such a way that it has all leaves on the same level. The tree is perfectly triangular:

In that case, the average depth is given by

because there are nodes at depth (indeed: at depth , the root, there is node. At depth , we have nodes, etc.) At depth , all nodes have depth , contributing to the total. Since there are nodes, the average is given by the formula above.

But should look familiar to you. But we have , so we need to transform it a bit so that we fall back again on the form . First:

and we know that

Finally,

and is approximately .

*

* *

However, if , then we have a tree with leaves on one depth more than the main tree, and the tree looks something like:

The depth equation becomes now

which simplifies to

*

* *

The hard—or rather inconvenient—part is to find the decomposition which needs either a call to log-base-2 (gasp!) or steps using an integer method. Other than that, now that you know how the formula is derived (with the root starting at depth *one*, not zero, because you still need 1 operation to look at it), there shouldn’t be any more problems.

Also,the formula applies to *full trees*, not any kind of trees. If you have a Fibonacci tree, or some other kind of tree, then you will have to find another formula for the average depth, and it may not be trivial.

Without ever stating so, it is clear the ‘n’ is the depth of the tree. But the meaning of ‘k’ is also never explicitly stated and I’m finding it’s relation to any part of the tree difficult to understand.

Ah. Well, : any number can be written uniquely that way provided that . For example, , .