Partial fillup and search time in LC tries

  • Authors:
  • Svante Janson;Wojciech Szpankowski

  • Affiliations:
  • Uppsala University, Uppsala, Sweden;Purdue University, West Lafayette, IN

  • Venue:
  • ACM Transactions on Algorithms (TALG)
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Andersson and Nilsson introduced in 1993 a level-compressed trie (for short, LC trie) in which a full subtree of a node is compressed to a single node of degree being the size of the subtree. Recent experimental results indicated a “dramatic improvement” when full subtrees are replaced by “partially filled subtrees.” In this article, we provide a theoretical justification of these experimental results, showing, among others, a rather moderate improvement in search time over the original LC tries. For such an analysis, we assume that n strings are generated independently by a binary memoryless source, with p denoting the probability of emitting a “1” (and q = 1 − p). We first prove that the so-called α-fillup level Fn(α) (i.e., the largest level in a trie with α fraction of nodes present at this level) is concentrated on two values with high probability: either Fn(α) = kn or Fn(α) = kn + 1, where kn = log1/&sqrt;pq n − |ln (p/q)|/2 ln3/2 (1&sqrt;pq) Φ−1 (α) &sqrt; ln n + O(1) is an integer and Φ(x) denotes the normal distribution function. This result directly yields the typical depth (search time) Dn(α) in the α-LC tries, namely, we show that with high probability Dn(α) ∼ C2 log log n, where C2 = 1/|log(1 − h/log(1/&sqrt;pq))| for p ≠ q and h = −plog p−qlog q is the Shannon entropy rate. This should be compared with recently found typical depth in the original LC tries, which is C1log log n, where C1 = 1/|log(1−h/log(1/min{p, 1−p}))|. In conclusion, we observe that α affects only the lower term of the α-fillup level Fn(α), and the search time in α-LC tries is of the same order as in the original LC tries.