The height of a binary search tree: the limiting distribution perspective

  • Authors:
  • Charles Knessl; Wojciech Szpankowski

  • Affiliations:
  • Department of Mathematics, Statistics & Computer Science, University of Illinois at Chicago, Chicago, IL; Department of Computer Science, Purdue University, 1398 Computer Science Building, West Lafayette, IN

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2002

Quantified Score

Hi-index 5.23

Abstract

We study the height of the binary search tree, the most fundamental data structure used for searching. We assume that the binary search tree is built from a random permutation of n elements. Under this assumption, we study the limiting distribution of the height as n → ∞. We show that the distribution has six asymptotic regions (scales). These correspond to different ranges of k and n, where Pr{Hn ≤ k} is the height distribution. In the critical region (the so-called central region), where most of the probability mass is concentrated, the limiting distribution satisfies a non-linear integral equation. While we cannot solve this equation exactly, we show that both tails of the distribution are roughly of a double exponential form. From our analysis, we conclude that the average height E[Hn] ∼ A log n - (3/2)[A/(A-1)] log log n, where A = 4.311... is the unique solution of x log x - x - x log 2 + 1 = 0 with x > 1, while the variance Var[Hn] = O(1). The second term in the expansion of E[Hn] and the rate of growth of the variance were also recently obtained by B. Reed, who used probabilistic arguments, while M. Drmota established the growth of the variance by analytic methods. Our analysis makes certain assumptions about the forms of some asymptotic expansions, as well as their asymptotic matching.
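
The constants in the abstract can be checked numerically. The following is a minimal Python sketch, not the authors' method: it finds A by bisection on x log x - x - x log 2 + 1 = 0 for x > 1, evaluates the two leading terms of E[Hn], and simulates the height of a binary search tree built from a random permutation. The function names (solve_A, leading_terms, random_bst_height) are hypothetical, logarithms are assumed to be natural, and the height is assumed to be counted in edges on the longest root-to-leaf path. Because the expansion omits a bounded O(1) term, the simulated mean is not expected to match the two-term value closely.

```python
import math
import random


def f(x):
    """Left-hand side of x log x - x - x log 2 + 1 = 0 (natural logs assumed)."""
    return x * math.log(x) - x - x * math.log(2) + 1


def solve_A(lo=2.0, hi=10.0, tol=1e-12):
    """Bisection for the root with x > 1.

    f(2) = -1 < 0 and f(10) > 0, and f is increasing on [2, 10]
    because f'(x) = log(x / 2) >= 0 there, so the bracket holds one root.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def leading_terms(n, A):
    """Two leading terms of E[Hn]: A log n - (3/2)[A/(A-1)] log log n."""
    return A * math.log(n) - 1.5 * (A / (A - 1)) * math.log(math.log(n))


def random_bst_height(n, rng):
    """Height (in edges, an assumed convention) of a BST built by inserting
    a uniformly random permutation of 0..n-1, for n >= 1."""
    keys = list(range(n))
    rng.shuffle(keys)
    left = [-1] * n    # left[k] is the left child of key k, or -1
    right = [-1] * n   # right[k] is the right child of key k, or -1
    root = keys[0]
    height = 0
    for key in keys[1:]:
        node, depth = root, 0
        while True:
            depth += 1
            child = left if key < node else right
            if child[node] == -1:
                child[node] = key   # new leaf sits at this depth
                break
            node = child[node]
        height = max(height, depth)
    return height


if __name__ == "__main__":
    A = solve_A()
    n, trials = 1000, 200
    rng = random.Random(0)
    mean_height = sum(random_bst_height(n, rng) for _ in range(trials)) / trials
    print("A =", A)
    print("two-term value for n = 1000:", leading_terms(n, A))
    print("simulated mean height:", mean_height)
```

The bracket [2, 10] is chosen because the equation also has a root below 1, which the condition x > 1 excludes.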