Pattern matching in the Hamming distance with thresholds

  • Authors:
  • Mikhail J. Atallah;Timothy W. Duket

  • Affiliations:
  • Department of Computer Science, Purdue University, United States;Department of Computer Science, Purdue University, United States

  • Venue:
  • Information Processing Letters
  • Year:
  • 2011

Abstract

It has long been known that pattern matching in the Hamming distance metric can be done in O(min(|Σ|, √(m/log m)) · n log m) time, where n is the length of the text, m is the length of the pattern, and Σ is the alphabet. The classic algorithm for this is due to Abrahamson and Kosaraju. This paper considers the following generalization, motivated by situations where the entries in the text and pattern are analog, distorted by additive noise, or imprecisely given for some other reason: in any alignment of the pattern with the text, two aligned symbols a and b contribute +1 to the similarity score if they differ by no more than a given threshold θ, and contribute zero otherwise. We give an O(min(|Σ|, √(m/log m)) · n log m) time algorithm for this more general version of the problem; the classic Hamming distance matching problem is the special case θ = 0.
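
To make the scoring rule concrete, here is a minimal Python sketch. It is not the algorithm from the paper. It assumes numeric symbols and shows (1) a direct O(nm) evaluation of the definition and (2) a simple per-symbol variant that runs one FFT-based correlation for each distinct pattern symbol, which illustrates where a |Σ| factor can come from. All function names are hypothetical, and the small recursive FFT is included only to keep the example dependency-free. Because the text is not split into blocks, the second variant costs roughly O(|Σ| · n log n) as written.

```python
import cmath


def threshold_scores_naive(text, pattern, theta):
    """Score every alignment directly from the definition, in O(n*m) time."""
    n, m = len(text), len(pattern)
    return [
        sum(1 for j in range(m) if abs(text[i + j] - pattern[j]) <= theta)
        for i in range(n - m + 1)
    ]


def _fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = _fft(a[0::2], invert)
    odd = _fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def _correlate(x, y):
    """Return c[i] = sum_j x[i + j] * y[j] for every full alignment i."""
    n, m = len(x), len(y)
    size = 1
    while size < n + m:
        size *= 2
    fx = _fft(list(x) + [0] * (size - n))
    fy = _fft(list(reversed(y)) + [0] * (size - m))
    prod = _fft([p * q for p, q in zip(fx, fy)], invert=True)
    # Convolving x with reversed y puts the correlation for shift i
    # at index i + m - 1; divide by size to normalize the inverse FFT.
    return [round(prod[i + m - 1].real / size) for i in range(n - m + 1)]


def threshold_scores_per_symbol(text, pattern, theta):
    """Sum, over each distinct pattern symbol a, the correlation of
    [pattern[j] == a] with [|text[i] - a| <= theta]."""
    n, m = len(text), len(pattern)
    scores = [0] * (n - m + 1)
    for a in set(pattern):
        near = [1 if abs(t - a) <= theta else 0 for t in text]
        is_a = [1 if p == a else 0 for p in pattern]
        for i, c in enumerate(_correlate(near, is_a)):
            scores[i] += c
    return scores


text = [1, 4, 2, 7, 3]
pattern = [2, 3]
print(threshold_scores_naive(text, pattern, 1))       # [2, 1, 1, 1]
print(threshold_scores_per_symbol(text, pattern, 1))  # [2, 1, 1, 1]
print(threshold_scores_per_symbol(text, pattern, 0))  # [0, 0, 1, 1]
```

With theta = 0 the score counts exact matches, so the Hamming distance at each alignment is m minus the score, which is the special case mentioned in the abstract.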