Compression reduces both the size of indexes and the time needed to evaluate queries. In this paper, we revisit the compression of inverted lists of document postings that store the position and frequency of indexed terms, considering two approaches to improving retrieval efficiency: better implementation and better choice of integer compression schemes. First, we propose several simple optimisations to well-known integer compression schemes, and show experimentally that these lead to significant reductions in time. Second, we explore the impact of choice of compression scheme on retrieval efficiency.

In experiments on large collections of data, we show two surprising results: use of simple byte-aligned codes halves the query evaluation time compared to the most compact Golomb-Rice bitwise compression schemes; and, even when an index fits entirely in memory, byte-aligned codes result in faster query evaluation than does an uncompressed index, emphasising that the cost of transferring data from memory to the CPU cache is less for an appropriately compressed index than for an uncompressed index. Moreover, byte-aligned schemes have only a modest space overhead: the most compact schemes result in indexes that are around 10% of the size of the collection, while a byte-aligned scheme is around 13%. We conclude that fast byte-aligned codes should be used to store integers in inverted lists.
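
To make the idea of a byte-aligned code concrete, the following is a minimal sketch of one common variable-byte layout applied to the gaps between sorted document identifiers. It is an illustration only: the abstract does not specify the byte layout or the optimisations used in the paper, so the function names (`vbyte_encode`, `vbyte_decode`, `to_gaps`, `from_gaps`), the choice to flag the final byte of each integer with the high bit, and the sample identifiers are all assumptions. It also covers only document identifiers, not the term positions and frequencies that the postings store.

```python
from typing import Iterable, List


def to_gaps(doc_ids: Iterable[int]) -> List[int]:
    """Turn a strictly increasing list of document ids into gaps (differences)."""
    gaps, prev = [], 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def from_gaps(gaps: Iterable[int]) -> List[int]:
    """Rebuild document ids from gaps by taking a running sum."""
    ids, total = [], 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def vbyte_encode(numbers: Iterable[int]) -> bytes:
    """Encode non-negative integers with 7 data bits per byte.

    Assumed layout: low-order 7-bit groups come first, and the high bit
    is set only on the last byte of each integer.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte coding needs non-negative integers")
        while n >= 128:
            out.append(n & 0x7F)  # 7 data bits, more bytes follow
            n >>= 7
        out.append(n | 0x80)      # last byte of this integer
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Decode a byte string produced by vbyte_encode."""
    numbers, value, shift = [], 0, 0
    for byte in data:
        if byte & 0x80:           # terminating byte: finish the integer
            value |= (byte & 0x7F) << shift
            numbers.append(value)
            value, shift = 0, 0
        else:                     # continuation byte: accumulate 7 bits
            value |= byte << shift
            shift += 7
    return numbers


# Hypothetical postings list of document ids for one term.
doc_ids = [3, 7, 130, 1000, 20000]
encoded = vbyte_encode(to_gaps(doc_ids))
decoded = from_gaps(vbyte_decode(encoded))
```

Decoding here needs only whole-byte reads, masks, and shifts, which is the property that distinguishes byte-aligned codes from bitwise schemes such as Golomb-Rice, where codewords can straddle byte boundaries. In this sketch the five sample identifiers occupy 8 bytes, compared with 20 bytes if each were stored as a fixed 32-bit integer.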