We study the Document Listing problem, where a collection $D$ of documents $d_1, \ldots, d_k$ of total length $\sum_i |d_i| = n$ is to be preprocessed so that one can later efficiently list all the $\textrm{ndoc}$ documents containing a given query pattern $P$ of length $m$ as a substring. Muthukrishnan (SODA 2002) gave an optimal solution to the problem: with $O(n)$ time preprocessing, one can answer the queries in $O(m+\textrm{ndoc})$ time. In this paper, we improve the space requirement of Muthukrishnan's solution from $O(n \log n)$ bits to $|\textrm{CSA}| + 2n + n \log k\,(1 + o(1))$ bits, where $|\textrm{CSA}| \le n \log |\Sigma|\,(1 + o(1))$ is the size of any suitable compressed suffix array (CSA), and $\Sigma$ is the underlying alphabet of the documents. The time requirement depends on the CSA used, but we can obtain e.g. the optimal $O(m+\textrm{ndoc})$ time when . For general $|\Sigma|, k$ the time requirement becomes $O(m \log |\Sigma|+\textrm{ndoc} \log k)$. Sadakane (ISAAC 2002) has developed a similar space-efficient variant of Muthukrishnan's solution; we obtain a better time requirement in most cases, but a slightly worse space requirement.
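To make the problem statement concrete, the following is a minimal, uncompressed sketch of the document-array technique usually attributed to Muthukrishnan. It is not the space-efficient structure of this paper: the suffix array is built naively, the range-minimum step is a linear scan rather than a constant-time RMQ structure, and no compressed suffix array is involved. The function names (`build_index`, `suffix_range`, `list_documents`, `query`) and the `"\x00"` separator are illustrative assumptions, and the pattern is assumed not to contain the separator.

```python
def build_index(docs):
    """Build a naive suffix array, document array, and predecessor array C."""
    # Assumption: "\x00" does not occur inside any document.
    text = "".join(d + "\x00" for d in docs)
    # doc_of[p] = index of the document that text position p belongs to.
    doc_of = []
    for j, d in enumerate(docs):
        doc_of.extend([j] * (len(d) + 1))
    # Naive suffix sorting; fine for a small illustration only.
    sa = sorted(range(len(text)), key=lambda p: text[p:])
    # Document array: da[i] = document of the i-th smallest suffix.
    da = [doc_of[p] for p in sa]
    # c[i] = largest i' < i with da[i'] == da[i], or -1 if there is none.
    last = {}
    c = []
    for i, d in enumerate(da):
        c.append(last.get(d, -1))
        last[d] = i
    return text, sa, da, c


def suffix_range(text, sa, pattern):
    """Return the half-open suffix-array range of suffixes prefixed by pattern."""
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first suffix whose m-prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    left, hi = lo, len(sa)
    while lo < hi:  # first suffix whose m-prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return left, lo


def list_documents(da, c, lo, hi):
    """Report each distinct document in da[lo:hi] exactly once."""
    out = []
    stack = [(lo, hi)]
    while stack:
        l, r = stack.pop()
        if l >= r:
            continue
        # Linear scan standing in for a constant-time range-minimum query.
        i = min(range(l, r), key=c.__getitem__)
        if c[i] >= lo:
            # No position in this subrange is the leftmost occurrence
            # of its document within the query range.
            continue
        out.append(da[i])
        stack.append((l, i))
        stack.append((i + 1, r))
    return out


def query(docs, pattern):
    """List the indices of all documents containing pattern as a substring."""
    text, sa, da, c = build_index(docs)
    lo, hi = suffix_range(text, sa, pattern)
    return list_documents(da, c, lo, hi)
```

For example, `sorted(query(["abracadabra", "banana", "cabana"], "ana"))` yields `[1, 2]`. A position `i` of the query range is reported only when `c[i]` falls to the left of the range, so each document is listed once no matter how often the pattern occurs in it; this is what makes the number of reporting steps proportional to $\textrm{ndoc}$ rather than to the number of occurrences.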