Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
Document filtering for fast ranking
SIGIR '94 Proceedings of the 17th annual international ACM SIGIR conference on Research and development in information retrieval
Fast evaluation of structured queries for information retrieval
SIGIR '95 Proceedings of the 18th annual international ACM SIGIR conference on Research and development in information retrieval
Query evaluation: strategies and optimizations
Information Processing and Management: an International Journal
Filtered document retrieval with frequency-sorted indexes
Journal of the American Society for Information Science
Self-indexing inverted files for fast text retrieval
ACM Transactions on Information Systems (TOIS)
Optimization of inverted vector searches
SIGIR '85 Proceedings of the 8th annual international ACM SIGIR conference on Research and development in information retrieval
Inverted files versus signature files for text indexing
ACM Transactions on Database Systems (TODS)
An analysis of the Burrows—Wheeler transform
Journal of the ACM (JACM)
Vector-space ranking with effective early termination
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Efficient algorithms for document retrieval problems
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Information Retrieval
High-order entropy-compressed text indexes
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Opportunistic data structures with applications
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Succinct static data structures
Succinct static data structures
Efficient query evaluation using a two-level retrieval process
CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
Binary Codes for Non-Uniform Sources
DCC '05 Proceedings of the Data Compression Conference
Optimization strategies for complex queries
Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval
Journal of the ACM (JACM)
Inverted files for text search engines
ACM Computing Surveys (CSUR)
Pruned query evaluation using pre-computed impacts
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
ACM Computing Surveys (CSUR)
Succinct data structures for flexible text retrieval systems
Journal of Discrete Algorithms
A taxonomy of suffix array construction algorithms
ACM Computing Surveys (CSUR)
Space-Efficient Algorithms for Document Retrieval
CPM '07 Proceedings of the 18th annual symposium on Combinatorial Pattern Matching
Compressed text indexes: From theory to practice
Journal of Experimental Algorithmics (JEA)
The myriad virtues of Wavelet Trees
Information and Computation
Compressing term positions in web indexes
Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval
Range Quantile Queries: Another Virtue of Wavelet Trees
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Space-Efficient Framework for Top-k String Retrieval Problems
FOCS '09 Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science
Implicit compression boosting with applications to self-indexing
SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
Information Retrieval: Implementing and Evaluating Search Engines
Information Retrieval: Implementing and Evaluating Search Engines
Top-k ranked document search in general text databases
ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part II
Space-Efficient Preprocessing Schemes for Range Minimum Queries on Static Arrays
SIAM Journal on Computing
New algorithms on wavelet trees and applications to information retrieval
Theoretical Computer Science
Enhanced byte codes with restricted prefix properties
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Space-Efficient top-k document retrieval
SEA'12 Proceedings of the 11th international conference on Experimental Algorithms
Reordering an index to speed query processing without loss of effectiveness
Proceedings of the Seventeenth Australasian Document Computing Symposium
Spaces, Trees, and Colors: The algorithmic landscape of document retrieval on sequences
ACM Computing Surveys (CSUR)
Indexing Word Sequences for Ranked Retrieval
ACM Transactions on Information Systems (TOIS)
Hi-index | 0.00 |
For over forty years the dominant data structure for ranked document retrieval has been the inverted index. Inverted indexes are effective for a variety of document retrieval tasks, and particularly efficient for large data collection scenarios that require disk access and storage. However, many efficiency-bound search tasks can now easily be supported entirely in memory as a result of recent hardware advances. In this paper we present a hybrid algorithmic framework for in-memory bag of-words ranked document retrieval using a self-index derived from the FM-Index, wavelet tree, and the compressed suffix tree data structures, and evaluate the various algorithmic trade-offs for performing efficient queries entirely in-memory. We compare our approach with two classic approaches to bag-of-words queries using inverted indexes, term-at-a-time (TAAT) and document-at-a-time (DAAT) query processing. We show that our framework is competitive with state-of-the-art indexing structures, and describe new capabilities provided by our algorithms that can be leveraged by future systems to improve effectiveness and efficiency for a variety of fundamental search operations.