Average node depth in a Full Tree

While doing something else I stumbled upon the interesting problem of computing the average depth of nodes in a tree. The depth of a node is the distance that separates that node from the root. You can either decide that the root is at depth 1, or you can decide that it is at depth zero, but let’s decide on depth 1. So an immediate child of the root is at depth two, and its children at depth 3, and so on until you reach leaves, nodes with no children.

tree-diagram7

So the calculation of the average node depth (including leaves) in a tree comes interesting when we want to know how far a constructed tree is from the ideal full tree, as a measure of (application-specific) performance. After searching a bit on the web, I found only incomplete or incorrect formulas, or stated with proof. This week, let us see how we can derive the result without (too much) pain.

Let us start with a tree with m nodes (in what follows, the distinction between internal nodes and leaves noted where irrelevant, so we’ll just say ‘nodes’ when it does not matter). First, we find integers k\leqslant{0} and n\leqslant{0} such that m \sim 2^n+k.

Then we reflect on what m imposes on the shape of the tree. If m\sim2^n-1 (or, from the decomposition above, m\sim 2^{n-1}+(2^{n-1}-1) with k=2^{n-1}-1) the tree can be balanced in such a way that it has all leaves on the same level. The tree is perfectly triangular:

triangular-tree

In that case, the average depth is given by

\displaystyle \bar{d}=\frac{1}{2^n-1}\sum_{d=1}^n d 2^{d-1}

because there are 2^{d-1} nodes at depth d (indeed: at depth d=1, the root, there is 2^{d-1}=2^0=1 node. At depth d=2, we have 2^{d-1}=2^1=2 nodes, etc.) At depth d, all 2^{d-1} nodes have depth d, contributing d 2^{d-1} to the total. Since there are m=2^n-1 nodes, the average is given by the formula above.

But \sum d 2^{d} should look familiar to you. But we have \sum d 2^{d-1}, so we need to transform it a bit so that we fall back again on the form \sum d 2^d. First:

\displaystyle \sum_{d=1}^2 d 2^{d-1}=\sum_{d=1}^n d \left(\frac{1}{2}2^d\right)=\frac{1}{2}\sum_{d=1}^n d 2^d

and we know that

\displaystyle \sum_{d=1}^n d 2^d = 2^{n+1}(n-1)+2

Finally,

\displaystyle \frac{1}{2^n-1}\sum_{d=1}^n d 2^{d-1}=\left(\frac{1}{2^n-1}\right)\left(\frac{1}{2}\sum_{d=1}^n d 2^d\right)

\displaystyle =\left(\frac{1}{2^n-1}\right)\left(\frac{1}{2}\left(2^{n+1}(n-1)+2\right)\right)=\frac{2^n(n-1)+1}{2^n-1}

and \displaystyle \frac{2^n(n-1)+1}{2^n-1} is approximately n.

*
* *

However, if m\not\sim 2^n-1, then we have a tree with k+1 leaves on one depth more than the main tree, and the tree looks something like:

regular-tree

The depth equation becomes now

\frac{1}{2^n+k}\left(\sum_{d=1}^n d 2^{d-1}+(k+1)(n+1)\right)

which simplifies to

\displaystyle \bar{d}=\frac{2^n(n-1)+1+(k+1)(n+1)}{2^n+k}

*
* *

The hard—or rather inconvenient—part is to find the decomposition m=2^n+k which needs either a call to log-base-2 (gasp!) or O(\lg m) steps using an integer method. Other than that, now that you know how the formula is derived (with the root starting at depth one, not zero, because you still need 1 operation to look at it), there shouldn’t be any more problems.

Also,the formula applies to full trees, not any kind of trees. If you have a Fibonacci tree, or some other kind of tree, then you will have to find another formula for the average depth, and it may not be trivial.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: