Index compression using fixed binary codewords
ADC '04 Proceedings of the 15th Australasian database conference - Volume 27
Inverted Index Compression Using Word-Aligned Binary Codes
Information Retrieval
Information Processing and Management: an International Journal
Improved Word-Aligned Binary Compression for Text Indexing
IEEE Transactions on Knowledge and Data Engineering
Inverted files for text search engines
ACM Computing Surveys (CSUR)
Accelerating sparse matrix computations via data compression
Proceedings of the 20th annual international conference on Supercomputing
Efficient plagiarism detection for large code repositories
Software—Practice & Experience
Efficient in-memory extensible inverted file
Information Systems
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Efficient index compression in DB2 LUW
Proceedings of the VLDB Endowment
Information Processing and Management: an International Journal
Index compression using 64-bit words
Software—Practice & Experience
Sorting out the document identifier assignment problem
ECIR'07 Proceedings of the 29th European conference on IR research
VSEncoding: efficient coding and fast decoding of integer lists via dynamic programming
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Engineering basic algorithms of an in-memory text search engine
ACM Transactions on Information Systems (TOIS)
Fast and effective focused retrieval
INEX'09 Proceedings of the Focused retrieval and evaluation, and 8th international conference on Initiative for the evaluation of XML retrieval
Within-document term-based index pruning with statistical hypothesis testing
ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
University of Otago at INEX 2010
INEX'10 Proceedings of the 9th international conference on Initiative for the evaluation of XML retrieval: comparative evaluation of focused retrieval
An impact ordering approach for indexing fuzzy sets
Fuzzy Sets and Systems
Efficient in-memory top-k document retrieval
SIGIR '12 Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval
Proceedings of the 18th Australasian Document Computing Symposium
Re-Ordered FEGC and Block Based FEGC for Inverted File Compression
International Journal of Information Retrieval Research
Hi-index | 0.00 |
Research into inverted file compression has focused on compression ratio—how small the indexes can be. Compression ratio is important for fast interactive searching. It is taken as read, the smaller the index, the faster the search.The premise “smaller is better” may not be true. To truly build faster indexes it is often necessary to forfeit compression. For inverted lists consisting of only 128 occurrences compression may only add overhead. Perhaps the inverted list could be stored in 128 bytes in place of 128 words, but it must still be stored on disk. If the minimum disk sector read size is 512 bytes and the word size is 4 bytes, then both the compressed and raw postings would require one disk seek and one disk sector read. A less efficient compression technique may increase the file size, but decrease load/decompress time, thereby increasing throughput.Examined here are five compression techniques, Golomb, Elias gamma, Elias delta, Variable Byte Encoding and Binary Interpolative Coding. The effect on file size, file seek time, and file read time are all measured as is decompression time. A quantitative measure of throughput is developed and the performance of each method is determined.