In this paper we address the problem of constructing an index for a text document or a collection of documents that answers various queries about the occurrences of a pattern while allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs, in time linear in the size of the query string and the number of results. This improves on previous work, in which the lookup time is either not linear or depends on the size of the document corpus. Our data structure has size $O\left(n\log^k n\right)$ on average and with high probability for input size $n$ and queries with up to $k$ errors. Additionally, we present a trade-off between query time and index complexity that achieves worst-case bounded index size and preprocessing time with linear lookup time on average.
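To make the query semantics concrete, the following is a minimal sketch of what "occurrences of a pattern with up to $k$ errors" means. It assumes errors are counted as edit distance (insertions, deletions, substitutions). It is not the index described above: it is the classical dynamic-programming scan over the text, which takes time proportional to the text length times the pattern length per query, which is exactly the dependence on corpus size that an index is meant to avoid. The function names `approx_occurrences` and `documents_with_match` are hypothetical and chosen for illustration only.

```python
from typing import List


def approx_occurrences(text: str, pattern: str, k: int) -> List[int]:
    """Return every end position j (1-based, i.e. text[:j]) such that some
    substring of text ending at j is within edit distance k of pattern.

    Brute-force baseline (assumption: errors are edit operations); this is
    NOT the indexed data structure from the abstract.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any substring
    # of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, ch in enumerate(text, start=1):
        prev_diag = col[0]  # value of row i-1 in the previous column
        col[0] = 0          # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            new_val = min(
                prev_diag + cost,  # match or substitution
                col[i] + 1,        # extra character in the text
                col[i - 1] + 1,    # missing character in the text
            )
            prev_diag = col[i]
            col[i] = new_val
        if col[m] <= k:
            ends.append(j)
    return ends


def documents_with_match(docs: List[str], pattern: str, k: int) -> List[int]:
    """Return indices of the documents containing at least one occurrence
    of pattern with at most k errors (the document-listing query)."""
    return [d for d, doc in enumerate(docs) if approx_occurrences(doc, pattern, k)]


if __name__ == "__main__":
    print(approx_occurrences("abracadabra", "abra", 0))
    print(approx_occurrences("hello world", "wurld", 1))
    print(documents_with_match(["hello world", "goodbye"], "wurld", 1))
```

With `k = 0` the scan reduces to exact matching; increasing `k` admits more end positions. The abstract's contribution is answering the same kinds of queries in time linear in the pattern length and the number of results, at the cost of an index of size $O\left(n\log^k n\right)$ on average.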