Modern computers typically use 64-bit words as the fundamental unit of data access. However, the decade-long migration from 32-bit architectures has not been reflected in compression technology, because of a widespread assumption that effective compression techniques operate in terms of bits or bytes rather than words. Here we demonstrate that 64-bit access units, especially in connection with word-bounded codes, do provide an opportunity to improve compression performance. In particular, we extend several 32-bit word-bounded coding schemes to 64-bit operation and explore their use in information retrieval applications. Our results show that the Simple-8b approach, a 64-bit word-bounded code, is an excellent self-skipping code and has a clear advantage over its competitors in supporting fast query evaluation when the data being compressed represents the inverted index for a large text collection. The advantages of the new code also accrue on 32-bit architectures, and hold for Boolean, ranked, and phrase queries alike, which makes it broadly applicable. Copyright © 2010 John Wiley & Sons, Ltd.
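To make the idea of a 64-bit word-bounded code concrete, the following is a minimal Python sketch in the spirit of Simple-8b, not the authors' implementation. It assumes each 64-bit word holds a 4-bit selector plus 60 data bits, and that the selector picks one of the fixed layouts in `LAYOUTS`. That table, the bit ordering, and the function names `pack`, `unpack`, and `values_in_word` are all illustrative assumptions; any run-length selectors a real codec might include are omitted.

```python
# Assumed (count, bits) layouts for the 60 data bits of each word.
# Hypothetical table for this sketch; count * bits never exceeds 60.
LAYOUTS = [(60, 1), (30, 2), (20, 3), (15, 4), (12, 5), (10, 6), (8, 7),
           (7, 8), (6, 10), (5, 12), (4, 15), (3, 20), (2, 30), (1, 60)]


def pack(values):
    """Greedily pack non-negative integers (< 2**60) into 64-bit words."""
    words = []
    pos = 0
    while pos < len(values):
        # Try the densest layout first; fall back to wider fields.
        for selector, (count, bits) in enumerate(LAYOUTS):
            chunk = values[pos:pos + count]
            if len(chunk) == count and all(0 <= v < (1 << bits) for v in chunk):
                word = selector  # selector lives in the low 4 bits
                for i, v in enumerate(chunk):
                    word |= v << (4 + i * bits)
                words.append(word)
                pos += count
                break
        else:
            raise ValueError("value out of range for a 60-bit field")
    return words


def values_in_word(word):
    """Number of integers stored in a word, read from the selector alone."""
    return LAYOUTS[word & 0xF][0]


def unpack(words):
    """Inverse of pack: recover the original integer sequence."""
    out = []
    for word in words:
        count, bits = LAYOUTS[word & 0xF]
        mask = (1 << bits) - 1
        payload = word >> 4
        for i in range(count):
            out.append((payload >> (i * bits)) & mask)
    return out


# Example: document-number gaps from a hypothetical inverted list.
gaps = [1, 3, 2, 1, 7, 300, 5, 1 << 40]
words = pack(gaps)
restored = unpack(words)
```

Because every word is filled exactly, `values_in_word` can report how many integers a word contains by inspecting only its selector. This is one way to read the "self-skipping" property: a query processor can step over whole words without decoding their contents.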