When indexing equals compression: Experiments with compressing suffix arrays and applications

  • Authors:
  • Luca Foschini; Roberto Grossi; Ankur Gupta; Jeffrey Scott Vitter

  • Affiliations:
  • Scuola Superiore Sant'Anna, Pisa, Italy; Università di Pisa, Pisa, Italy; Duke University, Durham, NC; Purdue University, West Lafayette, IN

  • Venue:
  • ACM Transactions on Algorithms (TALG)
  • Year:
  • 2006

Abstract

We report on a new experimental analysis of high-order entropy-compressed suffix arrays, which retains the theoretical performance of previous work and represents an improvement in practice. Our experiments indicate that the resulting text index offers state-of-the-art compression. In particular, the index occupies roughly 20% of the original text size, without requiring a separate copy of the text. We can additionally use a simple notion to encode and decode block-sorting transforms (such as the Burrows-Wheeler transform), achieving a compression ratio comparable to that of bzip2. We also provide a compressed representation of suffix trees (and their associated text) in a total space that is comparable to that of the text alone compressed with gzip.
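
For readers unfamiliar with the objects named in the abstract, the following is a minimal, uncompressed sketch of a suffix array and of encoding and decoding the Burrows-Wheeler transform. It is not the authors' method: the construction is naive, nothing is entropy-compressed, and the function names and the `"\0"` sentinel are assumptions made for illustration only.

```python
def suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_encode(text, sentinel="\0"):
    """Burrows-Wheeler transform derived from the suffix array.

    Assumes the sentinel does not occur in the text and sorts below
    every other character.
    """
    assert sentinel not in text
    s = text + sentinel
    # The BWT character for each sorted suffix is the one preceding it
    # (s[-1], the sentinel, for the suffix that starts at position 0).
    return "".join(s[i - 1] for i in suffix_array(s))


def bwt_decode(bwt, sentinel="\0"):
    """Invert the transform with the last-to-first (LF) mapping."""
    n = len(bwt)
    # A stable sort of positions by character reconstructs the first column.
    order = sorted(range(n), key=lambda i: bwt[i])
    lf = [0] * n
    for row, pos in enumerate(order):
        lf[pos] = row
    out = []
    row = 0  # row 0 is the suffix consisting of the sentinel alone
    for _ in range(n - 1):
        out.append(bwt[row])  # character preceding the current suffix
        row = lf[row]
    return "".join(reversed(out))
```

Calling `bwt_decode(bwt_encode(text))` returns `text` for any string that does not contain the sentinel. The sorting-based construction is far slower and larger than the compressed structures the paper evaluates; it only shows how the suffix array and the block-sorting transform relate.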