Computing the Threshold for q-Gram Filters

  • Authors:
  • Juha Kärkkäinen

  • Affiliations:
  • -

  • Venue:
  • SWAT '02 Proceedings of the 8th Scandinavian Workshop on Algorithm Theory
  • Year:
  • 2002

Quantified Score

Hi-index 0.00

Visualization

Abstract

A popular and much studied class of filters for approximate string matching is based on finding common q-grams, substrings of length q, between the pattern and the text. A variation of the basic idea uses gapped q-grams and has been recently shown to provide significant improvements in practice. A major difficulty with gapped q-gram filters is the computation of the so-called threshold which defines the filter criterium. We describe the first general method for computing the threshold for q-gram filters. The method is based on a carefully chosen precise statement of the problem which is then transformed into a constrained shortest path problem. In its generic form the method leaves certain parts open but is applicable to a large variety of q-gram filters and may be extensible even to other classes of filters. We also give a full algorithm for a specific subclass. For this subclass, the algorithm has been implemented and used succesfully in an experimental comparison.