Wavelet Trees: From Theory to Practice

Authors:
Roberto Grossi;Jeffrey Scott Vitter;Bojian Xu
Affiliations:
-;-;-
Venue:
CCP '11 Proceedings of the 2011 First International Conference on Data Compression, Communications and Processing
Year:
2011

Citing 0
Cited 3

Wavelet trees for all

CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Smaller self-indexes for natural language

SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Wavelet trees for all

Journal of Discrete Algorithms

Quantified Score

Hi-index	0.00

Visualization

Abstract

The \emph{wavelet tree} data structure is a space-efficient technique for rank and select queries that generalizes from binary characters to an arbitrary multicharacter alphabet. It has become a key tool in modern full-text indexing and data compression because of its capabilities in compressing, indexing, and searching. We present a comparative study of its practical performance regarding a wide range of options on the dimensions of different coding schemes and tree shapes. Our results are both theoretical and experimental: (1)~We show that the run-length $\delta$ coding size of wavelet trees achieves the 0-order empirical entropy size of the original string with leading constant 1, when the string's 0-order empirical entropy is asymptotically less than the logarithm of the alphabet size. This result complements the previous works that are dedicated to analyzing run-length $\gamma$-encoded wavelet trees. It also reveals the scenarios when run-length $\delta$ encoding becomes practical. (2)~We introduce a full generic package of wavelet trees for a wide range of options on the dimensions of coding schemes and tree shapes. Our experimental study reveals the practical performance of the various modifications.