The concordance of a full-text information retrieval system contains, for every distinct word W of the database, a list L(W) of “coordinates”, each of which describes the exact location of an occurrence of W in the text. The concordance should be compressed, not only to save storage space but also to reduce the number of I/O operations, since the file is usually kept in secondary memory. Several methods are presented that efficiently compress the concordances of large full-text retrieval systems. The methods were tested on the concordance of the Responsa Retrieval Project and yield savings of up to 49% relative to the uncompressed file; this is a relative improvement of about 27% over the currently used prefix-omission compression technique.
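
As a rough illustration of the baseline mentioned above, the following sketch shows one generic way prefix omission can work on a sorted coordinate list: each coordinate stores only the fields that differ from its predecessor, plus a count of the leading fields it shares. The four-field layout (document, paragraph, sentence, word) and the names `prefix_omit` and `prefix_restore` are assumptions made for this example; the abstract does not specify the coordinate format, and this is not a reconstruction of the methods the paper presents.

```python
from typing import List, Sequence, Tuple

# Hypothetical coordinate layout: (document, paragraph, sentence, word).
Coordinate = Tuple[int, ...]
# Encoded entry: (number of leading fields shared with the previous
# coordinate, the remaining fields that must be stored explicitly).
Encoded = Tuple[int, Tuple[int, ...]]


def prefix_omit(coords: Sequence[Coordinate]) -> List[Encoded]:
    """Drop the leading fields each coordinate shares with its predecessor."""
    encoded: List[Encoded] = []
    prev: Coordinate = ()
    for coord in coords:
        shared = 0
        limit = min(len(prev), len(coord))
        while shared < limit and prev[shared] == coord[shared]:
            shared += 1
        encoded.append((shared, coord[shared:]))
        prev = coord
    return encoded


def prefix_restore(encoded: Sequence[Encoded]) -> List[Coordinate]:
    """Rebuild full coordinates by copying the omitted prefix from the previous one."""
    coords: List[Coordinate] = []
    prev: Coordinate = ()
    for shared, suffix in encoded:
        coord = prev[:shared] + tuple(suffix)
        coords.append(coord)
        prev = coord
    return coords


# A sorted list L(W) for some word W, under the assumed layout.
occurrences = [(1, 1, 1, 4), (1, 1, 2, 7), (1, 3, 1, 2), (2, 1, 1, 1)]
compact = prefix_omit(occurrences)
stored_fields = sum(len(suffix) for _, suffix in compact)
```

In this toy list, 13 field values are stored instead of 16, at the cost of one small shared-prefix count per coordinate. The saving grows when many occurrences of a word fall in the same document or paragraph, which is why sorted coordinate lists are a natural fit for this kind of scheme.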