Text indexing with errors

  • Authors:
  • Moritz G. Maaß;Johannes Nowak

  • Affiliations:
  • Fakultät für Informatik, Technische Universität München, Garching, Germany;Fakultät für Informatik, Technische Universität München, Garching, Germany

  • Venue:
  • CPM'05 Proceedings of the 16th annual conference on Combinatorial Pattern Matching
  • Year:
  • 2005

Abstract

In this paper we address the problem of constructing an index for a text document or a collection of documents that answers various queries about the occurrences of a pattern when a constant number of errors is allowed. In particular, our index can be built to report all occurrences, all positions, or all documents in which a pattern occurs, in time linear in the size of the query string and the number of results. This improves on previous work, where the lookup time is either not linear or depends on the size of the document corpus. Our data structure has size $O\left(n\log^k n\right)$ on average and with high probability, for input size $n$ and queries with up to $k$ errors. Additionally, we present a trade-off between query time and index complexity that achieves worst-case bounded index size and preprocessing time with linear lookup time on average.
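To make the query semantics concrete, the following is a minimal sketch of the kind of question such an index answers. It is not the index described in the paper: it is a naive scan using the classic dynamic-programming approach to approximate matching, and its running time grows with the text size, which is precisely the dependence the paper's index avoids. The error model (edit distance) and the function names `approx_end_positions` and `matching_documents` are assumptions made for illustration; the abstract does not fix either.

```python
from typing import List


def approx_end_positions(text: str, pattern: str, k: int) -> List[int]:
    """Return every j such that some substring of text ending just before
    index j is within edit distance k of pattern (assumed error model)."""
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and any substring of
    # the text ending at the current position. col[0] is always 0 because
    # a match may start anywhere in the text.
    col = list(range(m + 1))
    ends = []
    if col[m] <= k:
        ends.append(0)
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


def matching_documents(docs: List[str], pattern: str, k: int) -> List[int]:
    """Return the indices of all documents containing the pattern with at
    most k errors (the 'all documents' query, answered by brute force)."""
    return [d for d, text in enumerate(docs)
            if approx_end_positions(text, pattern, k)]
```

For example, `matching_documents(["hello world", "help", "xyz"], "hello", 2)` scans each document in turn. Every query here costs time proportional to the total text length times the pattern length, whereas the abstract claims lookup time linear in the query length and the number of results.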