Text indexing with errors

Authors:
Moritz G. Maaß; Johannes Nowak
Affiliations:
Fakultät für Informatik, Technische Universität München, Garching, Germany; Fakultät für Informatik, Technische Universität München, Garching, Germany
Venue:
CPM'05 Proceedings of the 16th annual conference on Combinatorial Pattern Matching
Year:
2005

Abstract

In this paper we address the problem of constructing an index for a text document or a collection of documents to answer various questions about the occurrences of a pattern when allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs in time linear in the size of the query string and the number of results. This improves over previous work, where the lookup time is either not linear or depends on the size of the document corpus. Our data structure has size $O\left(n\log^k n\right)$ on average and with high probability for input size $n$ and queries with up to $k$ errors. Additionally, we present a trade-off between query time and index complexity that achieves worst-case bounded index size and preprocessing time with linear lookup time on average.
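To make the query problem concrete, the following is a minimal sketch of an index-free baseline. It is not the index structure of the paper. It assumes that "errors" means edit distance (insertions, deletions, substitutions) and uses the classical dynamic-programming scan over the text. The function name `approx_end_positions` and its parameters are hypothetical, chosen only for illustration. The scan takes time proportional to the product of text length and pattern length for every query, which is the dependence on corpus size that an index with lookup time linear in the pattern length and the number of results avoids.

```python
from typing import List


def approx_end_positions(text: str, pattern: str, k: int) -> List[int]:
    """Index-free baseline (illustrative only, not the paper's data structure).

    Returns every end position j >= 1 such that some substring text[i:j]
    is within edit distance k of pattern.
    """
    m = len(pattern)
    # col[i] = minimum edit distance between pattern[:i] and any substring
    # of the text ending at the current text position.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position at no cost
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # missing character in the text
            prev_diag = old
        if col[m] <= k:
            hits.append(j)
    return hits


print(approx_end_positions("banana", "ana", 0))   # [4, 6]
print(approx_end_positions("abcdefg", "cxe", 1))  # [5]
```

With `k = 0` the sketch reduces to exact matching. With `k = 1` the pattern `cxe` matches the substring `cde` of `abcdefg` through one substitution.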