Inverted index data structures are the key to fast text search engines. We first investigate one of the predominant operations on inverted indexes: intersecting two sorted lists of document IDs of different lengths. We explore the compression and performance of different inverted list data structures. In particular, we present Lookup, a new data structure that allows intersection in expected time linear in the length of the smaller list. Based on this result, we present the algorithmic core of a full-text database that supports fast Boolean queries, phrase queries, and document reporting while using less space than the input text. The system uses a carefully choreographed combination of classical data compression techniques and inverted-index-based search data structures. Our experiments show that inverted indexes are preferable to purely suffix-array-based techniques for in-memory (English) text search engines. A similar system is now running in practice in each core of TREX, the distributed database engine of SAP.
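To make the intersection problem concrete, here is a minimal sketch of one way a bucketed list can support intersection in expected time proportional to the shorter list. It is not the Lookup structure itself, whose details the abstract does not give. It assumes document IDs are spread roughly uniformly over a known universe, and the names `BucketedList`, `bucket_size`, and `intersect` are hypothetical, chosen for illustration only.

```python
class BucketedList:
    """Sorted doc-ID list plus a top-level table indexed by the high bits of an ID.

    Illustrative assumption: IDs are roughly uniform over [0, universe), so each
    bucket holds about `bucket_size` IDs on average.
    """

    def __init__(self, doc_ids, universe, bucket_size=8):
        self.ids = sorted(doc_ids)
        n = max(1, len(self.ids))
        # Pick a power-of-two bucket width so a bucket covers ~bucket_size IDs.
        span = max(1, (universe * bucket_size) // n)
        self.shift = span.bit_length() - 1
        num_buckets = ((max(universe, 1) - 1) >> self.shift) + 1
        # After the prefix sum, starts[b] is the index of the first ID in bucket b.
        self.starts = [0] * (num_buckets + 1)
        for x in self.ids:
            self.starts[(x >> self.shift) + 1] += 1
        for b in range(num_buckets):
            self.starts[b + 1] += self.starts[b]

    def contains(self, x):
        if x < 0:
            return False
        b = x >> self.shift
        if b >= len(self.starts) - 1:
            return False
        # Scan only the one bucket that could hold x.
        for i in range(self.starts[b], self.starts[b + 1]):
            if self.ids[i] == x:
                return True
            if self.ids[i] > x:
                return False
        return False


def intersect(short_list, long_bucketed):
    """Probe each ID of the shorter list in the bucketed longer list."""
    return [x for x in short_list if long_bucketed.contains(x)]


long_list = BucketedList(range(0, 1000, 3), universe=1000)
result = intersect([3, 10, 300, 500, 999], long_list)
```

Each probe touches a single bucket of expected constant size under the uniformity assumption, so the total work depends on the length of the shorter list rather than the longer one. A production index would additionally store the bucket contents in compressed form, which this sketch omits.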