When indexing equals compression: experiments with compressing suffix arrays and applications

  • Authors:
  • Roberto Grossi; Ankur Gupta; Jeffrey Scott Vitter

  • Affiliations:
  • Università di Pisa, Pisa; Duke University, Durham, NC; Purdue University, West Lafayette, IN

  • Venue:
  • SODA '04: Proceedings of the fifteenth annual ACM-SIAM symposium on Discrete algorithms
  • Year:
  • 2004

Abstract

We report on a new and improved version of high-order entropy-compressed suffix arrays, which has theoretical performance guarantees similar to those in our earlier work [16], yet represents an improvement in practice. Our experiments indicate that the resulting text index offers state-of-the-art compression. In particular, we require roughly 20% of the original text size, without requiring a separate instance of the text, and support fast and powerful searches. To our knowledge, this is the best known method in terms of space for fast searching.