A parallel algorithm for fixed-length approximate string-matching with k-mismatches

  • Authors:
  • Maxime Crochemore;Costas S. Iliopoulos;Solon P. Pissis

  • Affiliations:
  • Dept. of Computer Science, King’s College London, London, UK;Dept. of Computer Science, King’s College London, London, UK;Dept. of Computer Science, King’s College London, London, UK

  • Venue:
  • Algorithms and Applications
  • Year:
  • 2010

Abstract

This paper deals with the approximate string-matching problem under the Hamming distance. The approximate string-matching with k-mismatches problem is to find all locations at which a query of length m matches a factor of a text of length n with k or fewer mismatches. Approximate string-matching algorithms have both pleasing theoretical features and direct applications, especially in computational biology. We consider a generalisation of this problem, the fixed-length approximate string-matching with k-mismatches problem: given a text t, a pattern x and an integer ℓ, find all occurrences in t of all factors of x of length ℓ, where an occurrence is a factor of t that differs from the factor of x in k or fewer positions. We present a practical parallel algorithm of comparable simplicity whose time bound is stated in terms of w, the word size of the machine (e.g. 32 or 64 in practice), and p, the number of processors. Thus the algorithm’s performance is independent of k and of the alphabet size |Σ|. The proposed parallel algorithm makes use of the message-passing parallelism model and of word-level parallelism for efficient approximate string-matching.
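The fixed-length problem defined in the abstract can be illustrated with a minimal sequential sketch. It is not taken from the paper and should not be read as the authors' algorithm: it only shows one plausible way to exploit word-level parallelism, under the assumption that each cell (i, j) of a dynamic-programming table keeps an ℓ-bit vector recording the mismatches seen along the last ℓ steps of its diagonal. The function names `fixed_length_k_mismatches` and `brute_force` are hypothetical, and Python's arbitrary-precision integers stand in for the machine words of size w. The message-passing distribution over p processors is not modelled here.

```python
def fixed_length_k_mismatches(t, x, ell, k):
    """Illustrative sketch (assumed design, not the paper's code).

    Returns all pairs (i, j) such that x[i:i+ell] and t[j:j+ell]
    differ in at most k positions. Assumes ell >= 1.
    """
    m, n = len(x), len(t)
    mask = (1 << ell) - 1            # keep only the last ell mismatch bits
    hits = []
    prev = [0] * (n + 1)             # prev[j + 1] is the vector of cell (i - 1, j)
    for i in range(m):
        cur = [0] * (n + 1)
        for j in range(n):
            bit = 1 if x[i] != t[j] else 0
            # Extend the diagonal: shift in the new mismatch bit, drop the oldest.
            cur[j + 1] = ((prev[j] << 1) | bit) & mask
            # A full window of ell comparisons exists only from index ell - 1 on.
            if i >= ell - 1 and j >= ell - 1:
                if bin(cur[j + 1]).count("1") <= k:   # popcount = Hamming distance
                    hits.append((i - ell + 1, j - ell + 1))
        prev = cur
    return hits


def brute_force(t, x, ell, k):
    """Reference check: compare every factor pair character by character."""
    return [
        (i, j)
        for i in range(len(x) - ell + 1)
        for j in range(len(t) - ell + 1)
        if sum(a != b for a, b in zip(x[i:i + ell], t[j:j + ell])) <= k
    ]


if __name__ == "__main__":
    print(fixed_length_k_mismatches("abcab", "abd", 2, 1))
```

In this sketch each cell update costs a shift, an OR, a mask and a population count on an ℓ-bit vector, so the work per cell does not depend on k or on the alphabet, which matches the independence property stated in the abstract. Cells on different diagonals do not depend on each other, which is what would make a split across processors possible.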