Adding compression to a full-text retrieval system
Software—Practice & Experience
Improved retrieval effectiveness through impact transformation
ADC '02 Proceedings of the 13th Australasian database conference - Volume 5
Compression of inverted indexes For fast query evaluation
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Information Retrieval
Inverted Index Compression Using Word-Aligned Binary Codes
Information Retrieval
Structured index organizations for high-throughput text querying
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Index compression using 64-bit words
Software—Practice & Experience
Proceedings of the 18th Australasian Document Computing Symposium
Hi-index | 0.00 |
Compression of term frequency lists and very long document-id lists within an inverted file search engine are examined. Several compression schemes are compared including Elias γ and δ codes, Golomb Encoding, Variable Byte Encoding, and a class of word-based encoding schemes including Simple-9, Relative-10 and Carryover-12. It is shown that these compression methods are not well suited to compressing these kinds of lists of numbers. Of those tested, Carryover-12 is preferred because it is both effective at compression and fast at decompression. A novel technique, Sigma Encoding prior to compression, is proposed and tested. Sigma Encoding utilizes a parameterized dictionary to reduce the number of bits necessary to store an integer. This method shows an about 0.3 bit per integer improvement over Carryover-12 while costing only about 3 extra clock cycles per integer to decompress.