Approximate matching in the L1 metric

  • Authors:
  • Amihood Amir; Ohad Lipsky; Ely Porat; Julia Umanski

  • Affiliations:
  • Department of Computer Science, Bar-Ilan University, and Georgia Tech, Ramat-Gan, Israel; Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel; Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel; Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel

  • Venue:
  • CPM'05 Proceedings of the 16th annual conference on Combinatorial Pattern Matching
  • Year:
  • 2005

Abstract

Approximate matching is one of the fundamental problems in pattern matching and a ubiquitous problem in real applications. The Hamming distance is a simple and well-studied example of approximate matching, motivated by typing errors or noisy channels. Biological and image processing applications, however, assign different values to mismatches between different symbols. We consider the problem of approximate matching in the L1 metric, the k-L1-distance problem. Given a text $T = t_0, \ldots, t_{n-1}$ and a pattern $P = p_0, \ldots, p_{m-1}$, both strings of natural numbers, and a natural number $k$, we seek all text locations $i$ where the L1 distance between the pattern and the length-$m$ substring of the text starting at $i$ is at most $k$, i.e. $\sum_{j=0}^{m-1} |t_{i+j} - p_j| \leq k$. We provide an algorithm that solves the k-L1-distance problem in time $O(n\sqrt{k\log k})$. The algorithm applies a bounded divide-and-conquer approach and makes novel uses of non-boolean convolutions.
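To make the problem definition concrete, here is a minimal Python sketch of a naive baseline. It is not the paper's $O(n\sqrt{k\log k})$ algorithm, and both function names are hypothetical. `k_l1_matches` checks every alignment directly, in $O(nm)$ time in the worst case, and stops early once the partial sum exceeds $k$. `l1_distances_by_thresholds` illustrates a standard reduction that is assumed here for illustration rather than taken from the paper: for natural numbers $a$ and $b$, $|a - b|$ equals the number of thresholds $c \geq 1$ for which exactly one of $a \geq c$ and $b \geq c$ holds. The L1 distance therefore splits into one Hamming-style mismatch count per threshold. Each count could in principle be computed with boolean convolutions, but the sketch uses plain loops.

```python
from typing import List


def k_l1_matches(text: List[int], pattern: List[int], k: int) -> List[int]:
    """Naive baseline (hypothetical name, not the paper's algorithm):
    return every i with sum_j |text[i+j] - pattern[j]| <= k."""
    n, m = len(text), len(pattern)
    result = []
    for i in range(n - m + 1):
        dist = 0
        for j in range(m):
            dist += abs(text[i + j] - pattern[j])
            if dist > k:  # the partial sum only grows, so i cannot match
                break
        else:  # loop finished without break: total distance <= k
            result.append(i)
    return result


def l1_distances_by_thresholds(text: List[int], pattern: List[int]) -> List[int]:
    """L1 distance at every alignment via the (assumed, illustrative) identity
    |a - b| = #{c >= 1 : [a >= c] != [b >= c]} for natural numbers a, b."""
    n, m = len(text), len(pattern)
    if n < m:
        return []
    top = max(text + pattern, default=0)
    dists = [0] * (n - m + 1)
    for c in range(1, top + 1):
        # Binarize both strings at threshold c.
        tb = [1 if t >= c else 0 for t in text]
        pb = [1 if p >= c else 0 for p in pattern]
        # Hamming-style mismatch count per alignment; a convolution-based
        # method would replace this quadratic inner loop.
        for i in range(n - m + 1):
            dists[i] += sum(tb[i + j] ^ pb[j] for j in range(m))
    return dists
```

For example, `k_l1_matches([1, 3, 2, 5, 4], [3, 2], 1)` returns `[1]`, because the distances at alignments 0 to 3 are 3, 0, 4 and 4. The threshold version makes $O(\max_\Sigma)$ passes, so it only helps when the alphabet is small. This is one reason bounding the work by $k$ rather than by the alphabet size, as the abstract describes, matters.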