Space-Efficient Data Structures for Flexible Text Retrieval Systems

  • Authors: Kunihiko Sadakane
  • Affiliations: -
  • Venue: ISAAC '02 Proceedings of the 13th International Symposium on Algorithms and Computation
  • Year: 2002

Abstract

We propose space-efficient data structures for text retrieval systems that combine the merits of theoretical data structures such as suffix trees with those of practical ones such as inverted files. Traditional text retrieval systems use inverted files and support ranking queries based on the tf*idf (term frequency times inverse document frequency) scores of documents that contain the given keywords; such queries cannot be answered using suffix trees alone. A drawback of these systems is that the scores can be computed only for predetermined keywords. We extend the data structures so that the scores can be computed efficiently for any pattern while keeping their size moderate. The size is comparable to that of the text, an improvement over existing methods, which use O(n log n) bits of space for a text collection of length n.
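
To make the query concrete, the following is a minimal sketch of tf*idf scoring for an arbitrary pattern rather than a predetermined keyword. It is not the paper's data structure: it uses a plain, uncompressed suffix array, which takes O(n log n) bits and is therefore the kind of baseline the abstract aims to improve on. The function names (build_index, pattern_range, tfidf_scores), the NUL separator, and the weighting idf = log(N / df) are assumptions made for illustration; the abstract only states that the score is term frequency times inverse document frequency.

```python
import math
from bisect import bisect_right
from collections import Counter

SEP = "\x00"  # assumption: this character never occurs in a document or pattern


def build_index(docs):
    """Concatenate the documents and build a plain suffix array (naive sort)."""
    text = SEP.join(docs) + SEP
    starts, pos = [], 0
    for d in docs:
        starts.append(pos)      # offset of each document in the concatenation
        pos += len(d) + 1       # +1 for the separator
    # Naive construction for clarity; not efficient for large collections.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, starts


def pattern_range(text, sa, pattern):
    """Return [lo, hi) of suffix-array slots whose suffix starts with pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:              # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    left, hi = lo, len(sa)
    while lo < hi:              # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return left, lo


def tfidf_scores(docs, index, pattern):
    """Map document id -> tf * idf for an arbitrary non-empty pattern."""
    if not pattern or SEP in pattern:
        raise ValueError("pattern must be non-empty and must not contain SEP")
    text, sa, starts = index
    lo, hi = pattern_range(text, sa, pattern)
    # tf: occurrences per document, found by locating each match offset.
    tf = Counter(bisect_right(starts, sa[k]) - 1 for k in range(lo, hi))
    if not tf:
        return {}
    idf = math.log(len(docs) / len(tf))  # len(tf) is the document frequency
    return {doc_id: count * idf for doc_id, count in tf.items()}


docs = ["banana bandana", "an apple", "cherry"]
index = build_index(docs)
scores = tfidf_scores(docs, index, "an")  # "an" is a substring, not a keyword
```

In this sketch every occurrence in the pattern's suffix-array range is visited to obtain the per-document counts, so query time grows with the number of occurrences. An inverted file avoids that work, but only for the keywords it was built for.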