Text indexing with errors

  • Authors:
  • Moritz G. Maaß; Johannes Nowak

  • Affiliations:
  • Institut für Informatik, Technische Universität München, Boltzmannstr. 3, D-85748 Garching, Germany;Institut für Informatik, Technische Universität München, Boltzmannstr. 3, D-85748 Garching, Germany

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2007

Abstract

In this paper we address the problem of constructing an index for a text document or a collection of documents to answer various questions about the occurrences of a pattern when allowing a constant number of errors. In particular, our index can be built to report all occurrences, all positions, or all documents where a pattern occurs in time linear in the size of the query string and the number of results. This improves over previous work where the look-up time was either not linear or depended upon the size of the document corpus. Our data structure has size O(n log^d n) on average and with high probability for input size n and queries with up to d errors. Additionally, we present a trade-off between query time and index complexity that achieves worst-case bounded index size and preprocessing time with linear look-up time on average.
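
To make the query semantics concrete, the following is a minimal sketch of a brute-force baseline, not the index described in the abstract. It assumes the error model is edit distance (insertions, deletions, substitutions), which the abstract does not state. It scans the whole text with the classic dynamic-programming recurrence for approximate matching, so its running time grows with the text length; this dependence on the text size is what an index is meant to avoid. The function names `approx_end_positions` and `matching_documents` are hypothetical and chosen only for illustration.

```python
def approx_end_positions(text: str, pattern: str, d: int) -> list[int]:
    """Return every end offset j (exclusive) such that some substring of
    text ending at j is within edit distance d of pattern.

    Brute-force scan in O(len(text) * len(pattern)) time; this is a
    reference baseline, not the indexed look-up from the paper.
    """
    m = len(pattern)
    # col[i] = smallest edit distance between pattern[:i] and any
    # substring of the text ending at the current text position.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text, start=1):
        prev_diag = col[0]
        col[0] = 0  # a match may start at any text position
        for i in range(1, m + 1):
            above_left = prev_diag
            prev_diag = col[i]
            col[i] = min(
                above_left + (pattern[i - 1] != c),  # match or substitution
                col[i] + 1,                          # extra character in text
                col[i - 1] + 1,                      # missing character in text
            )
        if col[m] <= d:
            ends.append(j)
    return ends


def matching_documents(docs: list[str], pattern: str, d: int) -> list[int]:
    """Return the indices of documents containing at least one
    occurrence of pattern with at most d errors."""
    return [k for k, doc in enumerate(docs) if approx_end_positions(doc, pattern, d)]


# Usage: exact matching is the special case d = 0.
print(approx_end_positions("abracadabra", "abra", 0))  # [4, 11]
print(matching_documents(["hello world", "goodbye"], "hallo", 1))  # [0]
```

The baseline reports end positions only; recovering start positions or distinct occurrences would need extra bookkeeping, which is omitted here.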