Multiple choice tries and distributed hash tables

  • Authors:
  • Luc Devroye; Gabor Lugosi; Gahyun Park; Wojciech Szpankowski

  • Affiliations:
  • McGill University, Montreal, Canada; Universitat Pompeu Fabra, Ramon Trias Fargas, Barcelona, Spain; University of Wisconsin, Whitewater, WI; Purdue University, West Lafayette, IN

  • Venue:
  • SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year:
  • 2007

Abstract

Tries were introduced in 1960 by Fredkin as an efficient method for searching and sorting digital data. Recent years have seen a resurgence of interest in tries. In some of these applications, most notably distributed hash tables, one needs to design a well-balanced trie. In this paper we consider tries built from n strings such that each string can be chosen from a pool of k strings, each of them generated by a discrete i.i.d. source. Three cases are considered: k = 2, k large but fixed, and k ~ c log n. Various parameters such as the height and the fill-up level are analyzed. It is shown that for two-choice tries a 50% reduction in height is achieved when compared to ordinary tries. In a greedy online construction, in which the string that minimizes the depth of insertion for every pair is actually inserted, the height is only reduced by 25%. In order to further reduce the height by another 25%, we design a more refined online algorithm. The total computation time of the algorithm is O(n log n). Furthermore, when we choose the best among k ≥ 2 strings, then for large but fixed k the height is asymptotically equal to the typical depth in a trie, a result that cannot be improved. Further improvement can be achieved if the number of choices is proportional to log n. In this case, for unbiased memoryless sources, highly balanced trees can be constructed by a simple greedy algorithm for which the difference between the height and the fill-up level is bounded by a constant with high probability. This, in turn, has implications for distributed hash tables, leading to a randomized ID management algorithm in peer-to-peer networks such that, with high probability, the ratio between the maximum and the minimum load of a processor is O(1).
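
The greedy online rule described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code, and it covers only the simple greedy rule, not the refined online algorithm. It assumes an unbiased binary memoryless source truncated to 64 bits, and it represents the trie implicitly by the set of prefixes of the stored strings, so that the insertion depth of a string is the length of its shortest prefix not shared with any stored string. The names PrefixTrie, greedy_insert, build and BITS are hypothetical and chosen only for this illustration.

```python
import random

BITS = 64  # assumed string length, long enough that duplicates are negligible


def random_string(rng):
    """Draw one string from an unbiased memoryless binary source."""
    return format(rng.getrandbits(BITS), f"0{BITS}b")


class PrefixTrie:
    """Trie represented implicitly by the prefixes of its stored strings."""

    def __init__(self):
        self.prefixes = set()
        self.height = 0  # largest insertion depth seen so far

    def insertion_depth(self, s):
        """Depth of the leaf that s would occupy if it were inserted now."""
        if not self.prefixes:
            return 0  # a single string sits at the root
        for d in range(1, len(s) + 1):
            if s[:d] not in self.prefixes:
                return d  # shortest prefix not shared with any stored string
        return len(s)  # s duplicates a stored string

    def insert(self, s):
        """Insert s and return the depth at which it was placed."""
        depth = self.insertion_depth(s)
        self.height = max(self.height, depth)
        for d in range(1, len(s) + 1):
            self.prefixes.add(s[:d])
        return depth


def greedy_insert(trie, candidates):
    """Greedy rule: insert the candidate with the smallest insertion depth."""
    best = min(candidates, key=trie.insertion_depth)
    return trie.insert(best)


def build(n, k, seed=0):
    """Height of a trie on n strings, each chosen greedily among k candidates."""
    rng = random.Random(seed)
    trie = PrefixTrie()
    for _ in range(n):
        greedy_insert(trie, [random_string(rng) for _ in range(k)])
    return trie.height


if __name__ == "__main__":
    n = 2000
    for k in (1, 2):
        print(f"k={k}: height {build(n, k)}")
```

Running build(n, 1) gives the height of an ordinary trie, and build(n, 2) gives the greedy two-choice height for comparison. The percentages quoted in the abstract are asymptotic statements, so a single run at moderate n should be read only as a rough illustration.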