Compression of concordances in full-text retrieval systems

  • Authors:
  • Y. Choueka; A. S. Fraenkel; S. T. Klein

  • Affiliations:
  • Dept. of Math. and Computer Science, Bar-Ilan University, Ramat Gan, Israel; Dept. of Appl. Math. and Comp. Sc., Weizmann Institute of Science, Rehovot, Israel; Graduate Library School and Comp. Sc. Dept., University of Chicago, IL

  • Venue:
  • SIGIR '88: Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 1988

Abstract

The concordance of a full-text information retrieval system contains, for every distinct word W of the database, a list L(W) of “coordinates”, each of which describes the exact location of an occurrence of W in the text. The concordance should be compressed, not only to save storage space but also to reduce the number of I/O operations, since the file is usually kept in secondary memory. Several methods are presented that efficiently compress the concordances of large full-text retrieval systems. The methods were tested on the concordance of the Responsa Retrieval Project and yield savings of up to 49% relative to the uncompressed file; this is a relative improvement of about 27% over the currently used prefix-omission compression technique.
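
To make the baseline concrete, the following is a minimal sketch of the general idea behind prefix omission on a sorted coordinate list. The abstract does not specify the coordinate layout or the exact encoding used in the Responsa system, so everything here is an assumption for illustration: coordinates are modelled as hypothetical (document, paragraph, sentence, word) tuples, and the names `pom_encode` and `pom_decode` are invented for this sketch. It is not the authors' implementation, and it does not show the new methods the paper proposes.

```python
from typing import List, Tuple

# Hypothetical coordinate layout (an assumption, not taken from the paper):
# (document, paragraph, sentence, word)
Coordinate = Tuple[int, ...]
Encoded = Tuple[int, Tuple[int, ...]]


def pom_encode(coords: List[Coordinate]) -> List[Encoded]:
    """Encode sorted coordinates as (omitted_prefix_length, remaining_fields).

    Leading fields equal to those of the preceding coordinate are dropped;
    only their count is stored. The last field is always kept.
    """
    encoded: List[Encoded] = []
    prev: Coordinate = ()
    for cur in coords:
        shared = 0
        limit = min(len(prev), len(cur) - 1)
        while shared < limit and prev[shared] == cur[shared]:
            shared += 1
        encoded.append((shared, cur[shared:]))
        prev = cur
    return encoded


def pom_decode(encoded: List[Encoded]) -> List[Coordinate]:
    """Rebuild full coordinates by copying the omitted prefix from the previous one."""
    coords: List[Coordinate] = []
    prev: Coordinate = ()
    for shared, suffix in encoded:
        cur = prev[:shared] + tuple(suffix)
        coords.append(cur)
        prev = cur
    return coords


if __name__ == "__main__":
    # A made-up list L(W) for some word W, sorted by position in the text.
    sample = [(1, 1, 1, 3), (1, 1, 2, 5), (1, 4, 1, 1), (2, 1, 1, 7)]
    packed = pom_encode(sample)
    for entry in packed:
        print(entry)
    stored = sum(len(suffix) for _, suffix in packed)
    print(f"fields stored: {stored} of {sum(len(c) for c in sample)}")
```

Because consecutive occurrences of a word often fall in the same document or paragraph, many leading fields repeat and can be dropped; a real encoding would also pack the prefix count and the remaining fields into a compact bit or byte layout, which this sketch leaves out.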