The myriad virtues of wavelet trees

  • Authors:
  • Paolo Ferragina; Raffaele Giancarlo; Giovanni Manzini

  • Affiliations:
  • Dipartimento di Informatica, Università di Pisa, Italy; Dipartimento di Matematica ed Applicazioni, Università di Palermo, Italy; Dipartimento di Informatica, Università del Piemonte Orientale, Italy

  • Venue:
  • ICALP '06: Proceedings of the 33rd International Conference on Automata, Languages and Programming, Volume Part I
  • Year:
  • 2006

Abstract

Wavelet Trees were introduced in [Grossi, Gupta and Vitter, SODA '03] and were rapidly recognized as a very flexible tool for the design of compressed full-text indexes and data compressors. Although several papers have investigated the beauty and usefulness of this data structure in the full-text indexing scenario, its impact on data compression has not been fully explored. In this paper we provide a complete theoretical analysis of a wide class of compression algorithms based on Wavelet Trees. We also show how to improve their asymptotic performance by introducing a novel framework, called Generalized Wavelet Trees, that aims for the best combination of binary compressors (such as Run-Length encoders) versus non-binary compressors (such as Huffman and Arithmetic encoders) and Wavelet Trees of properly designed shapes. As a corollary, we prove high-order entropy bounds for the challenging combination of Burrows-Wheeler Transform and Wavelet Trees.
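
For readers unfamiliar with the data structure the abstract refers to, the following is a minimal sketch of a plain balanced binary wavelet tree answering rank queries. It is not the Generalized Wavelet Tree framework of the paper, and it makes simplifying assumptions: the names `WaveletTree`, `rank`, and `prefix_ones` are illustrative only, the alphabet is split in half at every node, and each node's bit sequence is kept as an uncompressed Python list of prefix counts rather than being fed to a binary compressor.

```python
class WaveletTree:
    """Balanced binary wavelet tree over the sorted alphabet of `text`.

    Illustrative sketch only: bit sequences are stored as uncompressed
    prefix-count lists, not as compressed bitvectors.
    """

    def __init__(self, text, alphabet=None):
        # Symbols handled by this node, in sorted order.
        self.alphabet = sorted(set(text)) if alphabet is None else alphabet
        self.left = None
        self.right = None
        self.left_symbols = set()
        # prefix_ones[i] = number of 1-bits among the first i positions.
        self.prefix_ones = [0]
        if len(self.alphabet) <= 1:
            return  # leaf: a single symbol (or an empty text)

        mid = len(self.alphabet) // 2
        self.left_symbols = set(self.alphabet[:mid])
        left_text, right_text = [], []
        for ch in text:
            # Bit 0 routes the symbol to the left child, bit 1 to the right.
            bit = 0 if ch in self.left_symbols else 1
            self.prefix_ones.append(self.prefix_ones[-1] + bit)
            (right_text if bit else left_text).append(ch)
        self.left = WaveletTree(left_text, self.alphabet[:mid])
        self.right = WaveletTree(right_text, self.alphabet[mid:])

    def rank(self, symbol, i):
        """Return the number of occurrences of `symbol` in text[:i]."""
        if symbol not in self.alphabet:
            return 0
        node = self
        while node.left is not None:
            ones = node.prefix_ones[i]
            if symbol in node.left_symbols:
                i -= ones          # zeros before position i
                node = node.left
            else:
                i = ones           # ones before position i
                node = node.right
        # At a leaf every remaining position holds `symbol`.
        return i


tree = WaveletTree("abracadabra")
print(tree.rank("a", 11))  # 5
```

Each internal node stores one bit per symbol it receives, so a compressor in this setting only ever sees binary sequences; the shape of the tree determines which symbols are separated at which level.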