Succinct data structures for flexible text retrieval systems

  • Authors: Kunihiko Sadakane
  • Affiliations: Department of Computer Science and Communication Engineering, Kyushu University, Fukuoka, Japan
  • Venue: Journal of Discrete Algorithms
  • Year: 2007

Abstract

We propose succinct data structures for text retrieval systems that support document listing queries and ranking queries based on the tf*idf (term frequency times inverse document frequency) scores of documents. Traditional data structures for these problems support queries only for predetermined keywords. Recently, Muthukrishnan proposed a data structure that answers document listing queries for arbitrary patterns, at the cost of a larger data structure. For computing tf*idf scores, there have been no efficient data structures for arbitrary patterns. Our new data structures support both kinds of queries in small space. The space is only 2/ε times the size of the compressed documents plus 10n bits for a document collection of length n, for any 0
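
To make the two query types concrete, the following is a minimal sketch of what a document listing query and a tf*idf ranking query return for an arbitrary pattern. It is a naive scan over uncompressed documents, not the succinct data structure proposed in the paper. The function names, the sample documents, and the specific formula tf * log(N/df) (one common tf*idf variant) are assumptions made for illustration only.

```python
import math


def count_occurrences(pattern, text):
    """Count occurrences of pattern in text, allowing overlaps (the term frequency)."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def list_documents(pattern, docs):
    """Document listing query: indices of all documents that contain pattern."""
    return [i for i, doc in enumerate(docs) if pattern in doc]


def tf_idf_scores(pattern, docs):
    """Ranking query: map each matching document index to tf * log(N / df).

    Assumption: idf is taken as log(N / df), where N is the number of
    documents and df is the number of documents containing the pattern.
    """
    tf = [count_occurrences(pattern, doc) for doc in docs]
    df = sum(1 for t in tf if t > 0)
    if df == 0:
        return {}
    idf = math.log(len(docs) / df)
    return {i: t * idf for i, t in enumerate(tf) if t > 0}


# Hypothetical sample collection, used only to exercise the functions.
docs = ["abracadabra", "banana", "cabbage"]
matching = list_documents("ab", docs)
scores = tf_idf_scores("ana", docs)
```

This baseline takes time proportional to the total length of the collection for every query, which is the cost that index structures supporting arbitrary patterns are meant to avoid.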