Stemming algorithms: a case study for detailed evaluation
Journal of the American Society for Information Science - Special issue: evaluation of information retrieval systems
Filtered document retrieval with frequency-sorted indexes
Journal of the American Society for Information Science
Corpus-based stemming using cooccurrence of word variants
ACM Transactions on Information Systems (TOIS)
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Managing gigabytes (2nd ed.): compressing and indexing documents and images
Adaptive set intersections, unions, and differences
SODA '00 Proceedings of the eleventh annual ACM-SIAM symposium on Discrete algorithms
Vector-space ranking with effective early termination
Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Information Retrieval: Computational and Theoretical Aspects
Information Retrieval: Computational and Theoretical Aspects
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Adaptive intersection and t-threshold problems
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Modern Information Retrieval
Compression of inverted indexes For fast query evaluation
SIGIR '02 Proceedings of the 25th annual international ACM SIGIR conference on Research and development in information retrieval
Adding Compression to Block Addressing Inverted Indexes
Information Retrieval
High-order entropy-compressed text indexes
SODA '03 Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms
Searching large text collections
Handbook of massive data sets
Inverted Index Compression Using Word-Aligned Binary Codes
Information Retrieval
Inverted files for text search engines
ACM Computing Surveys (CSUR)
Pruned query evaluation using pre-computed impacts
SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
ACM Computing Surveys (CSUR)
Efficient document retrieval in main memory
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Introduction to Information Retrieval
Introduction to Information Retrieval
Search Engines: Information Retrieval in Practice
Search Engines: Information Retrieval in Practice
Inverted index compression and query processing with optimized document ordering
Proceedings of the 18th international conference on World wide web
An experimental investigation of set intersection algorithms for text searching
Journal of Experimental Algorithmics (JEA)
Range Quantile Queries: Another Virtue of Wavelet Trees
SPIRE '09 Proceedings of the 16th International Symposium on String Processing and Information Retrieval
Space-Efficient Framework for Top-k String Retrieval Problems
FOCS '09 Proceedings of the 2009 50th Annual IEEE Symposium on Foundations of Computer Science
Compact set representation for information retrieval
SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval
Information Retrieval: Implementing and Evaluating Search Engines
Information Retrieval: Implementing and Evaluating Search Engines
Top-k ranked document search in general text databases
ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part II
Experimental analysis of a fast intersection algorithm for sorted sequences
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Inverted indexes for phrases and strings
Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Top-k document retrieval in optimal time and linear space
Proceedings of the twenty-third annual ACM-SIAM symposium on Discrete Algorithms
New algorithms on wavelet trees and applications to information retrieval
Theoretical Computer Science
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Towards an optimal space-and-query-time index for top-k document retrieval
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
Document listing for queries with excluded pattern
CPM'12 Proceedings of the 23rd Annual conference on Combinatorial Pattern Matching
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Dual-Sorted inverted lists in practice
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Faster and smaller inverted indices with treaps
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Journal of Discrete Algorithms
Hi-index | 0.00 |
Several IR tasks rely, to achieve high efficiency, on a single pervasive data structure called the inverted index. This is a mapping from the terms in a text collection to the documents where they appear, plus some supplementary data. Different orderings in the list of documents associated to a term, and different supplementary data, fit widely different IR tasks. Index designers have to choose the right order for one such task, rendering the index difficult to use for others. In this paper we introduce a general technique, based on wavelet trees, to maintain a single data structure that offers the combined functionality of two independent orderings for an inverted index, with competitive efficiency and within the space of one compressed inverted index. We show in particular that the technique allows combining an ordering by decreasing term frequency (useful for ranked document retrieval) with an ordering by increasing document identifier (useful for phrase and Boolean queries). We show that we can support not only the primitives required by the different search paradigms (e.g., in order to implement any intersection algorithm on top of our data structure), but also that the data structure offers novel ways of carrying out many operations of interest, including space-free treatment of stemming and hierarchical documents.