Compressed Text Databases with Efficient Query Algorithms Based on the Compressed Suffix Array

  • Authors:
  • Kunihiko Sadakane

  • Venue:
  • ISAAC '00 Proceedings of the 11th International Conference on Algorithms and Computation
  • Year:
  • 2000

Abstract

A compressed text database based on the compressed suffix array is proposed. The compressed suffix array of Grossi and Vitter occupies only $O(n)$ bits for a text of length $n$; however, it also requires the text itself, which occupies $O(n \log |\Sigma|)$ bits for the alphabet $\Sigma$. In contrast, our data structure does not use the text itself, and it supports the operations that matter most for text databases: inverse, search and decompress. Our algorithms can find the $\mathit{occ}$ occurrences of any substring $P$ of the text in $O(|P| \log n + \mathit{occ} \log^{\epsilon} n)$ time and decompress a part of the text of length $l$ in $O(l + \log^{\epsilon} n)$ time for any given $1 \ge \epsilon > 0$. Our data structure occupies only $n\left(\frac{2}{\epsilon}\left(\frac{3}{2} + H_0 + 2\log H_0\right) + 2 + \frac{4\log^{\epsilon} n}{\log^{\epsilon} n - 1}\right) + o(n) + O(|\Sigma| \log |\Sigma|)$ bits, where $H_0 \le \log |\Sigma|$ is the order-0 entropy of the text. We also show the relationship with the opportunistic data structure of Ferragina and Manzini.
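To illustrate how inverse, search and decompress can work once the text itself is discarded, here is a minimal Python sketch. It assumes an uncompressed array Ψ (Ψ[i] is the rank of the suffix starting one position after the suffix of rank i), a small table of first-character boundaries, and sampled suffix-array and inverse-suffix-array entries every `k` text positions. The class name `CSAIndex`, its method names and the parameter `k` are hypothetical illustrations, not the paper's interface. `k` loosely plays the role of $\log^{\epsilon} n$. Neither the compression of Ψ nor the space bound above is reproduced, and the suffix array is built naively and thrown away after construction.

```python
from bisect import bisect_right


class CSAIndex:
    """Hypothetical sketch: a text index that keeps Psi and samples, never the text."""

    def __init__(self, text, k=4):
        t = text + "\0"  # assumption: "\0" does not occur in text, so it sorts first
        n = len(t)
        sa = sorted(range(n), key=lambda p: t[p:])  # naive construction, illustration only
        rank = [0] * n
        for i, p in enumerate(sa):
            rank[p] = i
        # psi[i] = rank of the suffix one position to the right of suffix sa[i] (wraps).
        self.psi = [rank[(sa[i] + 1) % n] for i in range(n)]
        # First-column summary: distinct characters and the first rank where each starts.
        self.chars, self.starts = [], []
        for i in range(n):
            c = t[sa[i]]
            if i == 0 or c != t[sa[i - 1]]:
                self.chars.append(c)
                self.starts.append(i)
        self.k = k
        # Sampled inverse suffix array: rank of every k-th text position.
        self.isa_samples = [rank[p] for p in range(0, n, k)]
        # Sampled suffix array: rank -> text position, for positions that are multiples of k.
        self.sa_samples = {rank[p]: p for p in range(0, n, k)}
        # t, sa and rank are local and are discarded when __init__ returns.

    def _first_char(self, i):
        # First character of the suffix with rank i, found from the boundary table.
        return self.chars[bisect_right(self.starts, i) - 1]

    def inverse(self, j):
        # Rank of the suffix starting at text position j: fewer than k Psi steps.
        i = self.isa_samples[j // self.k]
        for _ in range(j % self.k):
            i = self.psi[i]
        return i

    def decompress(self, j, length):
        # Recover text[j:j+length] by reading first characters while following Psi.
        i = self.inverse(j)
        out = []
        for _ in range(length):
            out.append(self._first_char(i))
            i = self.psi[i]
        return "".join(out)

    def _compare(self, i, pattern):
        # Compare the first len(pattern) characters of suffix rank i with pattern.
        for ch in pattern:
            c = self._first_char(i)
            if c != ch:
                return -1 if c < ch else 1
            i = self.psi[i]
        return 0

    def search(self, pattern):
        # Half-open rank range [left, right) of suffixes that start with pattern.
        n = len(self.psi)
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi) // 2
            if self._compare(mid, pattern) < 0:
                lo = mid + 1
            else:
                hi = mid
        left, hi = lo, n
        while lo < hi:
            mid = (lo + hi) // 2
            if self._compare(mid, pattern) <= 0:
                lo = mid + 1
            else:
                hi = mid
        return left, lo

    def locate(self, i):
        # Text position of the suffix with rank i: walk Psi until a sampled rank.
        n = len(self.psi)
        steps = 0
        while i not in self.sa_samples:
            i = self.psi[i]
            steps += 1
        return (self.sa_samples[i] - steps) % n
```

For example, `CSAIndex("mississippi", k=3)` recovers `"iss"` from `decompress(4, 3)`, and `search("ssi")` returns a rank range of width 2 whose `locate` values are text positions 2 and 5. Each `_compare` reads at most |P| characters through Ψ, so the two binary searches take O(|P| log n) steps, while each `inverse` or `locate` call needs fewer than `k` Ψ steps. This mirrors the shape of the time bounds stated above without claiming to match the paper's exact method.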