Approximate string matching with swap and mismatch

Authors:
Ohad Lipsky;Benny Porat;Elly Porat;B. Riva Shalom;Asaf Tzur
Affiliations:
Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel (all authors)
Venue:
ISAAC'07 Proceedings of the 18th international conference on Algorithms and computation
Year:
2007

Citing 10

Efficient string matching with k mismatches. Theoretical Computer Science
Fast algorithms for approximately counting mismatches. Information Processing Letters
Approximate string matching: a simpler faster algorithm. Proceedings of the ninth annual ACM-SIAM symposium on Discrete algorithms
An Extension of the String-to-String Correction Problem. Journal of the ACM (JACM)
Approximate swapped matching. Information Processing Letters
Overlap matching. Information and Computation
Pattern matching with swaps. FOCS '97 Proceedings of the 38th Annual Symposium on Foundations of Computer Science
On the complexity of the Extended String-to-String Correction Problem. STOC '75 Proceedings of seventh annual ACM symposium on Theory of computing
Faster algorithms for string matching with k mismatches. Journal of Algorithms - Special issue: SODA 2000
Approximate swap and mismatch edit distance. SPIRE'07 Proceedings of the 14th international conference on String processing and information retrieval

Abstract

Finding the similarity between two sequences is a major problem in computer science. It is motivated by many issues in computational biology, as well as in information retrieval and image processing. These fields must take into account possible corruptions of the data caused by genome rearrangements, typing mistakes, and more. Therefore, many applications do not require an exact match between the sequences, but rather an approximate one. We consider mismatches and swaps as natural errors, of which only a small number is allowed. The edit distance problem with swap and mismatch operations was discussed by Amir et al. [3], who solved it in O(n√m log m) time. Since then, the problem of string matching with at most k swap and mismatch errors has remained open. In this paper we present an algorithm that finds all locations where the pattern has at most k mismatch and swap errors in O(n√k log m) time.
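
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm of the paper and does not achieve the stated time bound; it runs in O(nm) time for a text of length n and a pattern of length m. It assumes the usual reading of the two operations: a mismatch replaces one pattern character, a swap exchanges two adjacent, distinct pattern characters, and each pattern position takes part in at most one operation. The function names `swap_mismatch_distance` and `find_matches` are hypothetical and chosen only for illustration.

```python
def swap_mismatch_distance(pattern: str, window: str) -> int:
    """Minimum number of swaps and mismatches turning `pattern` into `window`.

    Assumption: operations are disjoint, i.e. every pattern position is
    touched by at most one swap or one mismatch.
    """
    if len(pattern) != len(window):
        raise ValueError("pattern and window must have equal length")

    prev2, prev1 = 0, 0  # costs for the prefixes of length i-2 and i-1
    for i in range(1, len(pattern) + 1):
        # Option 1: align position i-1 directly, paying 1 on a mismatch.
        cur = prev1 + (pattern[i - 1] != window[i - 1])
        # Option 2: one swap of positions i-2 and i-1 explains both characters.
        if (
            i >= 2
            and pattern[i - 2] != pattern[i - 1]
            and pattern[i - 2] == window[i - 1]
            and pattern[i - 1] == window[i - 2]
        ):
            cur = min(cur, prev2 + 1)
        prev2, prev1 = prev1, cur
    return prev1


def find_matches(text: str, pattern: str, k: int) -> list[int]:
    """All text positions where the pattern occurs with at most k errors."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if swap_mismatch_distance(pattern, text[i:i + m]) <= k
    ]


if __name__ == "__main__":
    print(find_matches("bacxabd", "abc", 1))
```

With k = 1, the pattern "abc" matches the text "bacxabd" at position 0 (one swap of "a" and "b") and at position 4 (one mismatch, "c" against "d"). A sketch like this is mainly useful as a reference implementation against which a faster algorithm can be checked.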