Wavelet Trees: From Theory to Practice

  • Authors:
  • Roberto Grossi;Jeffrey Scott Vitter;Bojian Xu

  • Venue:
  • CCP '11 Proceedings of the 2011 First International Conference on Data Compression, Communications and Processing
  • Year:
  • 2011

Abstract

The \emph{wavelet tree} data structure is a space-efficient technique for rank and select queries that generalizes from binary characters to an arbitrary multicharacter alphabet. It has become a key tool in modern full-text indexing and data compression because of its capabilities in compressing, indexing, and searching. We present a comparative study of its practical performance across a wide range of coding schemes and tree shapes. Our results are both theoretical and experimental: (1)~We show that the run-length $\delta$-coded size of a wavelet tree achieves the 0-order empirical entropy size of the original string with leading constant 1 when the string's 0-order empirical entropy is asymptotically less than the logarithm of the alphabet size. This result complements previous work dedicated to analyzing run-length $\gamma$-encoded wavelet trees, and it identifies the scenarios in which run-length $\delta$ encoding becomes practical. (2)~We introduce a fully generic package of wavelet trees covering a wide range of coding schemes and tree shapes. Our experimental study reveals the practical performance of the various modifications.
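To make the rank query mentioned in the abstract concrete, here is a minimal Python sketch of a balanced wavelet tree. It is an illustration under simplifying assumptions, not the authors' package: the class name `WaveletTree`, the alphabet split at the midpoint, and the use of plain uncompressed prefix-count arrays in place of run-length $\gamma$- or $\delta$-coded bitmaps are all choices made here for readability.

```python
class WaveletTree:
    """Balanced wavelet tree with uncompressed bitmaps (illustrative only)."""

    def __init__(self, text, alphabet=None):
        # Sorted list of distinct symbols handled by this node.
        self.alphabet = sorted(set(text)) if alphabet is None else alphabet
        self.left = self.right = None
        self.left_syms = set()
        # prefix[i] = number of 1-bits among the first i positions of this node.
        self.prefix = [0]
        if len(self.alphabet) <= 1:
            return  # leaf: nothing left to distinguish
        mid = len(self.alphabet) // 2
        self.left_syms = set(self.alphabet[:mid])
        for c in text:
            bit = 0 if c in self.left_syms else 1
            self.prefix.append(self.prefix[-1] + bit)
        # Each child stores the subsequence of symbols routed to it.
        self.left = WaveletTree(
            [c for c in text if c in self.left_syms], self.alphabet[:mid])
        self.right = WaveletTree(
            [c for c in text if c not in self.left_syms], self.alphabet[mid:])

    def rank(self, symbol, i):
        """Return the number of occurrences of `symbol` in text[:i]."""
        node = self
        while node.left is not None:
            ones = node.prefix[i]
            if symbol in node.left_syms:
                i -= ones          # count of 0-bits before position i
                node = node.left
            else:
                i = ones           # count of 1-bits before position i
                node = node.right
        # At a leaf, i counts the symbol only if this leaf represents it.
        return i if node.alphabet == [symbol] else 0


text = "abracadabra"
wt = WaveletTree(text)
print(wt.rank("a", len(text)))
```

Each level answers one binary rank query, so a rank over the full alphabet costs one bitmap lookup per tree level. A practical implementation would replace the prefix-count lists with compressed bitmaps that support rank directly, which is where the coding schemes compared in the paper come in.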