(Un)expected behavior of digital search tree profile

  • Authors:
  • Michael Drmota;Wojciech Szpankowski

  • Affiliations:
  • Inst. Discrete Mathematics and Geometry, Wien, Austria;Purdue University, West Lafayette, IN

  • Venue:
  • SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

A digital search tree (DST) -- one of the most fundamental data structures on words -- is a digital tree in which keys (strings, words) are stored directly in (internal) nodes. Such trees find myriad of applications from the popular Lempel-Ziv'78 data compression scheme to distributed hash tables. The profile of a DST measures the number of nodes at the same distance from the root; it is a function of the number of stored strings and the distance from the root. Most parameters of DST (e.g., height, fill-up) can be expressed in terms of the profile. However, from the inception of DST, the analysis of the profile has been elusive and it has become a prominent open problem in the area of analysis of algorithms. We make here the first, but decisive, step towards solving this problem. We present a precise analysis of the average profile when stored strings are generated by a biased memoryless source. The main technical difficulty of analyzing the profile lies in solving a sophisticated recurrence equation. We present such a solution for the Poissonized version of the problem (i.e., when the number of stored strings is generated by a Poisson distribution) in the Mellin transform domain. To accomplish it, we introduce a novel functional operator that allows us to express the solution in an explicit form, and then using analytic algorithmics tools to extract the asymptotic behavior of the profile. This analysis is surprisingly demanding but once it is carried out it reveals unusually intriguing and interesting behavior. The average profile undergoes several phase transitions when moving from the root to the longest path. At first, it resembles a full tree until it abruptly starts growing polynomially and it oscillates in this range. Our results are derived by methods of analytic algorithmics such as generating functions, Mellin transform, Poissonization and de-Poissonization, the saddle-point method, singularity analysis and uniform asymptotic analysis.