Text indexing with errors

Authors:
Moritz G. Maaß;Johannes Nowak
Affiliations:
Institut für Informatik, Technische Universität München, Boltzmannstr. 3, D-85748 Garching, Germany;Institut für Informatik, Technische Universität München, Boltzmannstr. 3, D-85748 Garching, Germany
Venue:
Journal of Discrete Algorithms
Year:
2007

Abstract

In this paper we address the problem of constructing an index for a text document or a collection of documents to answer various questions about the occurrences of a pattern when allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs, in time linear in the size of the query string and the number of results. This improves over previous work, where the look-up time was either not linear or depended upon the size of the document corpus. Our data structure has size O(n log^d n) on average and with high probability for input size n and queries with up to d errors. Additionally, we present a trade-off between query time and index complexity that achieves worst-case bounded index size and preprocessing time with linear look-up time on average.
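
To make the query types in the abstract concrete, the following is a minimal brute-force sketch of two of them: reporting positions and reporting documents for a pattern with up to d errors. It is not the index described in the paper. It scans the text directly, so its running time grows with the corpus size, which is exactly the dependence the paper's index avoids. The error model (mismatches only, i.e. Hamming distance) and the function names `hamming_at_most`, `find_positions`, and `find_documents` are assumptions made for illustration.

```python
from typing import List


def hamming_at_most(a: str, b: str, d: int) -> bool:
    """Return True if equal-length strings a and b differ in at most d positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > d:
                return False  # stop early once the error budget is exceeded
    return True


def find_positions(text: str, pattern: str, d: int) -> List[int]:
    """Start positions where pattern matches text with at most d mismatches.

    Brute force: compares the pattern against every window of the text.
    Assumed error model: substitutions only (no insertions or deletions).
    """
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if hamming_at_most(text[i:i + m], pattern, d)
    ]


def find_documents(docs: List[str], pattern: str, d: int) -> List[int]:
    """Indices of documents that contain at least one match with at most d mismatches."""
    return [j for j, doc in enumerate(docs) if find_positions(doc, pattern, d)]
```

For example, `find_positions("abracadabra", "abxa", 1)` reports the two windows that read "abra", while the same query with d = 0 reports nothing. An index in the sense of the abstract would answer the same queries without scanning the whole text at query time.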