We are given a collection D of text documents d_1, …, d_k, with ∑_i |d_i| = n, which may be preprocessed. In the document listing problem, we are given an online query comprising a pattern string p of length m, and our goal is to return the set of all documents that contain one or more copies of p. In the closely related occurrence listing problem, we output the set of all positions within the documents where pattern p occurs. In 1973, Weiner [24] presented an algorithm with O(n) time and space preprocessing, following which the occurrence listing problem can be solved in time O(m + output), where output is the number of positions where p occurs; this algorithm is clearly optimal. In contrast, no optimal algorithm was previously known for the closely related document listing problem, which is perhaps more natural and certainly well-motivated.

We provide the first known optimal algorithm for the document listing problem. More generally, we initiate the study of pattern matching problems that require retrieving the documents matched by the patterns; this contrasts with the pattern matching problems that have been studied more frequently, namely those that involve retrieving all occurrences of patterns. We consider document retrieval problems that are motivated by online query processing in databases, Information Retrieval systems, and Computational Biology. We present very efficient (optimal) algorithms for our document retrieval problems. Our approach to solving such problems involves performing "local" encodings whereby they are reduced to range query problems on colored geometric objects (points and lines). Using the structural properties of strings, we present improved algorithms for the colored range query problems that arise in our reductions. This approach is quite general and yields simple, efficient, implementable algorithms for all the document retrieval problems in this paper.
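To make the reduction from document listing to a colored range query concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the algorithm as given in the paper: the abstract does not spell out the data structures. The sketch assumes a commonly described realization in which every suffix is colored by its document, the pattern selects a contiguous range of sorted suffixes, and a "previous occurrence of the same color" array is used to report each color in that range once. The names `build_index` and `list_documents` are hypothetical. The suffix array is built naively and the range minimum is found by a linear scan, so this version does not attain the O(m + output) bound; a real structure would use a suffix tree and a constant-time range minimum query.

```python
from bisect import bisect_left


def build_index(docs):
    """Sort all suffixes of all documents (naive construction, for clarity only)."""
    suffixes = sorted(
        (doc[i:], doc_id) for doc_id, doc in enumerate(docs) for i in range(len(doc))
    )
    keys = [suffix for suffix, _ in suffixes]      # sorted suffixes
    colors = [doc_id for _, doc_id in suffixes]    # document ("color") of each suffix
    # prev[i] = index of the previous suffix with the same color, or -1 if none.
    last_seen = {}
    prev = []
    for i, color in enumerate(colors):
        prev.append(last_seen.get(color, -1))
        last_seen[color] = i
    return keys, colors, prev


def list_documents(index, pattern):
    """Return the sorted ids of documents containing `pattern`, each reported once."""
    keys, colors, prev = index
    # Suffixes that start with `pattern` form one contiguous range [lo, hi].
    # Assumption: the character U+10FFFF does not occur in any document.
    lo = bisect_left(keys, pattern)
    hi = bisect_left(keys, pattern + "\U0010FFFF") - 1
    found = []
    stack = [(lo, hi)]
    while stack:
        left, right = stack.pop()
        if left > right:
            continue
        # Position with the smallest prev value in [left, right].
        # (Linear scan here; a constant-time RMQ structure would replace it.)
        i = min(range(left, right + 1), key=prev.__getitem__)
        if prev[i] >= lo:
            # Every color in [left, right] already appears earlier in [lo, hi].
            continue
        found.append(colors[i])        # first occurrence of this color in the range
        stack.append((left, i - 1))
        stack.append((i + 1, right))
    return sorted(found)


if __name__ == "__main__":
    index = build_index(["banana", "bandana", "cabana"])
    print(list_documents(index, "and"))
    print(list_documents(index, "an"))
```

The point of the `prev` array is that a position in the query range is the first occurrence of its color exactly when its `prev` value falls before the range, so the number of recursive steps is proportional to the number of distinct documents reported rather than to the number of occurrences of the pattern.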