In this paper we address the problem of constructing an index for a single text document or a collection of documents that answers queries about the occurrences of a pattern while allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents in which a pattern occurs, in time linear in the length of the query string plus the number of results. This improves on previous work, where the look-up time was either not linear or depended on the size of the document corpus. For input size n and queries with up to d errors, our data structure has size O(n log^d n) on average and with high probability. Additionally, we present a trade-off between query time and index complexity that achieves worst-case bounds on index size and preprocessing time, with linear look-up time on average.
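To make the three query types (occurrences, positions, documents) concrete, here is a minimal brute-force sketch in Python. It assumes that "errors" means edit (Levenshtein) distance and that an occurrence is a non-empty substring within distance d of the pattern; the function names are hypothetical. This is a naive baseline that rescans the whole text on every query, not the index described above, whose purpose is precisely to avoid this dependence on the text size.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic O(len(a) * len(b)) dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def occurrences(text: str, pattern: str, d: int) -> List[Tuple[int, int]]:
    """All (start, end) pairs such that text[start:end] is within edit distance d of pattern."""
    m = len(pattern)
    result = []
    for start in range(len(text)):
        # A match has length between m - d and m + d; empty substrings are excluded.
        for end in range(start + max(1, m - d), min(len(text), start + m + d) + 1):
            if edit_distance(pattern, text[start:end]) <= d:
                result.append((start, end))
    return result


def positions(text: str, pattern: str, d: int) -> List[int]:
    """Distinct start positions of approximate occurrences, in increasing order."""
    return sorted({start for start, _ in occurrences(text, pattern, d)})


def documents(docs: List[str], pattern: str, d: int) -> List[int]:
    """Indices of the documents that contain at least one approximate occurrence."""
    return [k for k, doc in enumerate(docs) if occurrences(doc, pattern, d)]


if __name__ == "__main__":
    print(occurrences("abracadabra", "abra", 0))  # exact matches only
    print(positions("abracadabra", "abra", 1))    # start positions with up to one error
    print(documents(["abracadabra", "cadabra", "xyz"], "abra", 1))
```

The three functions show why the query types differ in output size: one start position can yield several (start, end) occurrences, and one document can contain many positions. An index with linear look-up time answers each of these queries in time proportional to the pattern length plus the number of results it reports.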