Distances from the root in skew trees
by Edsger W. Dijkstra and C.S. Scholten Renewed interest in algorithms for sorting in situ raised the question of the average distance from node to root in binary trees other than balanced ones. (Here binary trees are to be understood as rooted trees in which nodes have zero, one, or two sons.) In particular we consider the infinite sequence of trees i ≥ 0), in which for some fixed p and q (p > q ≥ 0)**T**for_{i}**0 ≤ i < p**are arbitrarily chosen**T**and_{n}**T**are the subtrees of_{n+q}**T**_{n+p}
With T, we have_{i}
with T from its root we have_{i}
We are interested in the asymptotic behaviour of i, this ratio being the average distance from the root in T._{i}Without loss of generality we can confine ourselves to the case gcd(p,q) = g > 1, the sequence T consists of an interleaving of _{i}g mutually independent sequences.With A and _{i} = H_{i} + 1B defined by _{i}B, we derive_{i} = G_{i} – 2Equation (1) is not homogeneous in the B's, but by solving it for the A's and substituting them in (0), we get a homogeneous recurrence relation for the B's, the characteristic polynomial of which is the product of (0)'s characteristic polynomial and the characteristic polynomial corresponding to the homogeneous part of (1). We conclude that the characteristic equation for the A is
_{i}and that that for the B is
_{i}Under the constraints gcd(p,q) = 1 and p > q ≥ 0, (2) enjoys the property of having one positive root, r say, such that r > 1 and all other roots of (2) have a modulus smaller than r. (For a proof of this theorem, see later.)
From this and the theory of linear recurrence relations we conclude
Substituting these leading terms in (1) we get
Since We define the skewness of a binary tree as the ratio (≥ 1) of the numbers of nodes in its two subtrees. For the trees T it follows from the leading term of _{i}A that the asymptotic skewness _{i}s is given by s = r, whence ^{q}q = . Remembering that ^{r}log sr satisfies (2), we find r, whence ^{p} = s+1p = . Hence (4) can be rewritten as
^{r}log (s+1)
Because B—, we conclude that, expressed in _{i} / A_{i}r and s, the average distance from the root in T is for large _{i}i
Consequently, the average distance from the root in a tree from the sequence N nodes is
i.e. an expression in
We are left with the obligation to prove that r > 1 dominating the others. Since f 0 = –1 and f(+∞) = +∞, f x = 0 has an odd number of positive roots. Because f' x = x we conclude that ^{q–1 }· ( p·x^{p–q} – q )f' x = 0 has at most 1 positive root; hence f x = 0 has 1 positive root r and, because f 1 = –1, we conclude r > 1. In other wordsIn order to prove dominance of r, we consider a root m·e of (2), with ^{i·φ}m > 0. Consequently
from which we derive —by taking absolute values—
Since m<r ∨ φ = 0. q.e.d.
Finally we remark that thanks to
any value of 28 August 1981
transcribed by Martijn van der Veen revised |
|||||||||||||||