Distances from the root in skew trees
by Edsger W. Dijkstra and C.S. Scholten
Renewed interest in algorithms for sorting in situ raised the question of the average distance from node to root in binary trees other than balanced ones. (Here binary trees are to be understood as rooted trees in which nodes have zero, one, or two sons.)
In particular we consider the infinite sequence of trees Ti (i ≥ 0), in which for some fixed p and q (p > q ≥ 0)
With Hi = the number of nodes in Ti, we have
with Gi = the sum of the distances of te nodes of Ti from its root we have
We are interested in the asymptotic behaviour of Gi / Hi for large i, this ratio being the average distance from the root in Ti.
Without loss of generality we can confine ourselves to the case
With Ai defined by Ai = Hi + 1 and Bi defined by Bi = Gi 2, we derive
Equation (1) is not homogeneous in the B's, but by solving it for the A's and substituting them in (0), we get a homogeneous recurrence relation for the B's, the characteristic polynomial of which is the product of (0)'s characteristic polynomial and the characteristic polynomial corresponding to the homogeneous part of (1). We conclude that the characteristic equation for the Ai is
and that that for the Bi is
Under the constraints gcd(p,q) = 1 and p > q ≥ 0, (2) enjoys the property of having one positive root, r say, such that r > 1 and all other roots of (2) have a modulus smaller than r. (For a proof of this theorem, see later.)
From this and the theory of linear recurrence relations we conclude
Substituting these leading terms in (1) we get
Since r is a root of (2) this can be reduced to
We define the skewness of a binary tree as the ratio (≥ 1) of the numbers of nodes in its two subtrees. For the trees Ti it follows from the leading term of Ai that the asymptotic skewness s is given by s = rq, whence q = rlog s. Remembering that r satisfies (2), we find rp = s+1, whence p = rlog (s+1). Hence (4) can be rewritten as
Because r > 1 so that the asymptotic value of Gi / Hi equals that of Bi / Ai, we conclude that, expressed in r and s, the average distance from the root in Ti is for large i
Consequently, the average distance from the root in a tree from the sequence Ti with N nodes is
i.e. an expression in s and N only.
* * *
We are left with the obligation to prove that f x = 0 with f x = xp xq 1 has one positive root r > 1 dominating the others. Since f 0 = 1 and f(+∞) = +∞, f x = 0 has an odd number of positive roots. Because f' x = xq1 · ( p·xpq q ) we conclude that f' x = 0 has at most 1 positive root; hence f x = 0 has 1 positive root r and, because f 1 = 1, we conclude r > 1. In other words
In order to prove dominance of r, we consider a root m·ei·φ of (2), with m > 0. Consequently
from which we derive —by taking absolute values—
Since gcd(p,q) = 1, this can be rewritten as mp mq 1 < 0 ∨ φ = 0 or, in view of (6): m<r ∨ φ = 0. q.e.d.
* * *
Finally we remark that thanks to
any value of s can be approximated arbitrarily closely by suitable choice of p and q, a fact that enhances the significance of the expression in s and N for the average distance from the root.
28 August 1981
transcribed by Martijn van der Veen