Top-k document retrieval in optimal time and linear space

Authors: Gonzalo Navarro; Yakov Nekrich
Affiliation: University of Chile (both authors)
Venue: Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '12)
Year: 2012

Abstract

We describe a data structure that uses O(n) words of space and reports the k most relevant documents containing a query pattern P in optimal O(|P| + k) time, where n is the total length of the document collection. Our construction supports a broad set of important relevance measures, such as the frequency of P in a document and the minimal distance between two occurrences of P in a document. We also show how to reduce the space of the data structure from O(n log n) bits (that is, O(n) words) to O(n (log σ + log D + log log n)) bits, where σ is the alphabet size and D is the total number of documents.
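To make the query semantics concrete, here is a minimal brute-force sketch of what a top-k query computes under the two relevance measures named above: the frequency of P in a document, and the minimal distance between two occurrences of P. It is only a reference definition, not the paper's data structure. It scans every document, so its cost grows with the size of the whole collection rather than being O(|P| + k). The function names, the (doc_id, score) output format, counting overlapping occurrences, and breaking ties by smaller document id are all illustrative assumptions.

```python
import heapq
from typing import List, Optional, Tuple


def occurrences(doc: str, pattern: str) -> List[int]:
    """Start positions of all (possibly overlapping) occurrences of pattern in doc."""
    positions = []
    i = doc.find(pattern)
    while i != -1:
        positions.append(i)
        i = doc.find(pattern, i + 1)  # advance by one so overlapping matches are counted
    return positions


def min_gap(doc: str, pattern: str) -> Optional[int]:
    """Smallest distance between two consecutive occurrences, or None if fewer than two."""
    occ = occurrences(doc, pattern)
    if len(occ) < 2:
        return None
    return min(b - a for a, b in zip(occ, occ[1:]))


def top_k_by_frequency(docs: List[str], pattern: str, k: int) -> List[Tuple[int, int]]:
    """(doc_id, frequency) for the k documents where pattern occurs most often.

    Documents without an occurrence are skipped; ties go to the smaller doc_id
    (an assumption of this sketch).
    """
    scored = [(i, len(occurrences(d, pattern))) for i, d in enumerate(docs)]
    scored = [(i, f) for i, f in scored if f > 0]
    return heapq.nlargest(k, scored, key=lambda t: (t[1], -t[0]))


def top_k_by_min_gap(docs: List[str], pattern: str, k: int) -> List[Tuple[int, int]]:
    """(doc_id, gap) for the k documents whose two closest occurrences are nearest.

    A smaller gap means a more relevant document; documents with fewer than two
    occurrences have no gap and are skipped.
    """
    scored = []
    for i, d in enumerate(docs):
        g = min_gap(d, pattern)
        if g is not None:
            scored.append((i, g))
    return heapq.nsmallest(k, scored, key=lambda t: (t[1], t[0]))


if __name__ == "__main__":
    docs = ["abracadabra", "banana", "cabbage", "abab"]
    print(top_k_by_frequency(docs, "ab", 2))  # [(0, 2), (3, 2)]
    print(top_k_by_min_gap(docs, "ab", 2))    # [(3, 2), (0, 7)]
```

The two measures can rank documents differently. "abracadabra" and "abab" both contain "ab" twice, but "abab" has the closer pair of occurrences. `heapq.nlargest` and `heapq.nsmallest` keep only k candidates in a heap. Even so, every document is still scanned, which is exactly the cost the O(|P| + k) structure avoids.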