I am reading the chapter on trees in a book on data structures and algorithms by Mark Allen Weiss. Here is the text snippet.
Let D(n) be the internal path length for some tree T of n nodes. D(1)
= 0. An n-node tree consists of an i-node left subtree and an (n – i
– 1)-node right subtree, plus a root at depth zero for 0<= i < n. D(i)
is the internal path length of the left subtree with respect to its
root. In the main tree, all these nodes are one level deeper. The same
holds for the right subtree. Thus, we get the recurrence
D(n) = D(i) + D(n – i – 1) + n – 1
If all subtree sizes are equally likely, which is true for binary
search trees (since the subtree size depends only on the relative rank
of the first element inserted into the tree), but not binary trees,
then the average value of both D(i) and D(n – i -1) is (1/n) sum from
j = 0 to n-1 of D(j). This yields
D(n) = (2/n)(sum from j = 0 to n-1 of D(j)) + (n-1).
Solving the above recurrence gives an average value of D(n) = O(n log n).
Following are my questions on the above text snippet.
- What does the author mean by “since the subtree size depends only on the relative rank of the first element inserted into the tree”?
- How did the author arrive at the average value O(n log n) from D(n)? Can anyone please show me the steps involved in reaching that result?
Thanks!
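About your first point:
In a binary search tree, the size of the left subtree corresponds to the number of elements smaller than the root, and the size of the right subtree to the number of elements larger than the root.
The root is the first element inserted, so the subtree sizes depend only on the relative rank of that first element. If every insertion order is equally likely, that rank is equally likely to be any value from 0 to n-1, which is why all subtree sizes are equally likely.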
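To make the first point concrete, here is a small simulation sketch. It assumes the keys are 0..n-1 inserted in a uniformly random order into a plain unbalanced binary search tree; the helper names `left_subtree_size` and `left_size_histogram` are made up for this illustration and do not come from the book.

```python
import random
from collections import Counter


def left_subtree_size(keys):
    """Insert keys into an unbalanced BST in the given order and
    return the number of nodes in the root's left subtree."""
    root = None  # each node is a list: [key, left_child, right_child]
    for k in keys:
        if root is None:
            root = [k, None, None]
            continue
        node = root
        while True:
            idx = 1 if k < node[0] else 2  # go left for smaller keys
            if node[idx] is None:
                node[idx] = [k, None, None]
                break
            node = node[idx]

    def size(node):
        return 0 if node is None else 1 + size(node[1]) + size(node[2])

    return 0 if root is None else size(root[1])


def left_size_histogram(n, trials, seed=0):
    """Count how often each left-subtree size occurs over random insertion orders."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        keys = list(range(n))
        rng.shuffle(keys)
        size = left_subtree_size(keys)
        # With keys 0..n-1, the rank of the first key equals its value.
        assert size == keys[0]
        counts[size] += 1
    return counts


if __name__ == "__main__":
    hist = left_size_histogram(n=5, trials=10000)
    for size in sorted(hist):
        print(size, hist[size])
```

Each left-subtree size from 0 to n-1 should show up roughly equally often, because the size is fixed entirely by which key happens to be inserted first.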
About your second point, I don’t have the full solution, but I would start this way:
You can first transform the sum.
You know that
sum(j=0 to n-1, of j) = n*(n-1)/2
then
n-1 = (2/n)*sum(j=0 to n-1, of j)
Since
D(n) = (2/n)(sum from j = 0 to n-1 of D(j)) + (n-1), you get the new formula
D(n) = (2/n)*sum(j=0 to n-1, of D(j) + j)
Now you can express Dn in terms of Dn-1.
You would find (if I’m right):
Then try to express Dn as n*Sum(1/k), which is equivalent to n ln(n)…
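One way to carry out these steps, starting from the recurrence quoted in the question, is sketched below. This is a standard telescoping argument and not necessarily the exact route intended above; it assumes D(0) = 0 for the empty tree, and uses D(1) = 0 from the book.

```latex
\begin{align*}
D(n) &= \frac{2}{n}\sum_{j=0}^{n-1} D(j) + (n-1) \\
n\,D(n) &= 2\sum_{j=0}^{n-1} D(j) + n(n-1)
  && \text{multiply by } n \\
(n-1)\,D(n-1) &= 2\sum_{j=0}^{n-2} D(j) + (n-1)(n-2)
  && \text{same identity for } n-1 \\
n\,D(n) - (n-1)\,D(n-1) &= 2\,D(n-1) + 2(n-1)
  && \text{subtract} \\
n\,D(n) &= (n+1)\,D(n-1) + 2(n-1) \\
\frac{D(n)}{n+1} &= \frac{D(n-1)}{n} + \frac{2(n-1)}{n(n+1)}
  && \text{divide by } n(n+1) \\
\frac{D(n)}{n+1} &= \frac{D(1)}{2} + \sum_{k=2}^{n} \frac{2(k-1)}{k(k+1)}
  \;\le\; 2\sum_{k=1}^{n} \frac{1}{k} = O(\log n)
  && \text{telescope down to } D(1) = 0
\end{align*}
```

Multiplying the last line by (n+1) gives D(n) = O(n log n), since the harmonic sum of 1/k for k = 1..n grows like ln(n).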
From the above formula (2) you get (you can try to write it):
Tell me if you have more questions on this proof.
Hope it helps.
EDIT: details on (2)
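As a numerical sanity check on the recurrence from the question, the sketch below evaluates D(n) exactly with fractions and compares it against the closed form D(n) = 2(n+1)H(n) - 4n, where H(n) is the n-th harmonic number. That closed form is not in the quoted text; it is what the telescoping approach gives when worked through, under the assumption D(0) = 0. The function names are hypothetical.

```python
from fractions import Fraction
import math


def average_internal_path_length(n_max):
    """Return [D(0), ..., D(n_max)] from D(n) = (2/n) * sum_{j<n} D(j) + (n - 1).

    Assumes D(0) = 0 (empty tree); D(1) = 0 then follows from the recurrence.
    """
    D = [Fraction(0)] * (n_max + 1)
    prefix = Fraction(0)  # running value of D(0) + ... + D(n-1)
    for n in range(1, n_max + 1):
        prefix += D[n - 1]
        D[n] = Fraction(2, n) * prefix + (n - 1)
    return D


def closed_form(n):
    """Candidate closed form 2(n+1)H(n) - 4n, with H(n) = 1 + 1/2 + ... + 1/n."""
    harmonic = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * harmonic - 4 * n


if __name__ == "__main__":
    D = average_internal_path_length(1000)
    for n in (10, 100, 1000):
        ratio = float(D[n]) / (n * math.log(n))
        print(n, float(D[n]), round(ratio, 3))
```

The printed ratio D(n) / (n ln n) stays bounded as n grows, which is consistent with D(n) = O(n log n).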