Compression of concordances in full-text retrieval systems

Authors:
Y. Choueka;A. S. Fraenkel;S. T. Klein
Affiliations:
Dept. of Math. and Computer Science, Bar-Ilan University, Ramat Gan, Israel;Dept. of Appl. Math. and Comp. Sc., Weizmann Institute of Science, Rehovot, Israel;Graduate Library School and Comp. Sc. Dept., University of Chicago, IL
Venue:
SIGIR '88 Proceedings of the 11th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
1988

Citing 1

Improved hierarchical bit-vector compression in document retrieval systems

Proceedings of the 9th annual international ACM SIGIR conference on Research and development in information retrieval

Cited 12

Storing text retrieval systems on CD-ROM: compression and encryption considerations

ACM Transactions on Information Systems (TOIS)
Storing text retrieval systems on CD-ROM: compression and encryption considerations

SIGIR '89 Proceedings of the 12th annual international ACM SIGIR conference on Research and development in information retrieval
Posting compression in dynamic retrieval environments

SIGIR '91 Proceedings of the 14th annual international ACM SIGIR conference on Research and development in information retrieval
Parameterised compression for sparse bitmaps

SIGIR '92 Proceedings of the 15th annual international ACM SIGIR conference on Research and development in information retrieval
Compression of indexes with full positional information in very large text databases

SIGIR '93 Proceedings of the 16th annual international ACM SIGIR conference on Research and development in information retrieval
Efficient recompression techniques for dynamic full-text retrieval systems

SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Self-indexing inverted files for fast text retrieval

ACM Transactions on Information Systems (TOIS)
Modeling word occurrences for the compression of concordances

ACM Transactions on Information Systems (TOIS)
Binary Interpolative Coding for Effective Index Compression

Information Retrieval
A New Compression Method for Compressed Matching

DCC '00 Proceedings of the Conference on Data Compression
Inverted files for text search engines

ACM Computing Surveys (CSUR)
Processing queries with metrical constraints in XML-based IR systems

Journal of the American Society for Information Science and Technology

Abstract

The concordance of a full-text information retrieval system contains, for every distinct word W of the database, a list L(W) of “coordinates”, each of which describes the exact location of an occurrence of W in the text. The concordance should be compressed, not only to save storage space but also to reduce the number of I/O operations, since the file is usually kept in secondary memory. Several methods are presented that efficiently compress the concordances of large full-text retrieval systems. The methods were tested on the concordance of the Responsa Retrieval Project and yield savings of up to 49% relative to the uncompressed file; this is a relative improvement of about 27% over the currently used prefix-omission compression technique.
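
The abstract names prefix omission as the baseline technique but does not spell it out. The following is a minimal sketch of the general idea, not the paper's implementation: when the coordinates in L(W) are sorted, each coordinate can drop the leading fields it shares with its predecessor and store only a count of the omitted fields plus the remainder. The four-field coordinate layout (document, paragraph, sentence, word) and the function names pom_encode and pom_decode are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical coordinate layout (an assumption, not taken from the abstract):
# (document, paragraph, sentence, word position)
Coordinate = Tuple[int, ...]
Encoded = Tuple[int, Tuple[int, ...]]


def pom_encode(coords: List[Coordinate]) -> List[Encoded]:
    """Prefix omission: store, for each coordinate, how many leading fields
    it shares with the previous coordinate, followed by the remaining fields."""
    encoded: List[Encoded] = []
    prev: Coordinate = ()
    for c in coords:
        k = 0
        while k < len(prev) and k < len(c) and prev[k] == c[k]:
            k += 1
        encoded.append((k, c[k:]))
        prev = c
    return encoded


def pom_decode(encoded: List[Encoded]) -> List[Coordinate]:
    """Rebuild each coordinate from the previous one plus the stored suffix."""
    coords: List[Coordinate] = []
    prev: Coordinate = ()
    for k, suffix in encoded:
        c = prev[:k] + suffix
        coords.append(c)
        prev = c
    return coords


# Illustrative list L(W) for one word, sorted by location (made-up values).
sample = [(3, 1, 2, 5), (3, 1, 2, 9), (3, 4, 1, 2), (7, 1, 1, 1)]
encoded = pom_encode(sample)
# encoded == [(0, (3, 1, 2, 5)), (3, (9,)), (1, (4, 1, 2)), (0, (7, 1, 1, 1))]
```

In this toy list, 12 fields are stored instead of 16, at the cost of one small count per coordinate. A real system would additionally pack the counts and fields into bit fields; that step is omitted here.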