String matching with up to k swaps and mismatches

Authors:
Ohad Lipsky;Benny Porat;Ely Porat;B. Riva Shalom;Asaf Tzur
Affiliations:
Department of Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel;Department of Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel;Department of Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel;Department of Software Engineering, Shenkar College, Ramat-Gan 52526, Israel;Department of Computer Science, Bar-Ilan University, Ramat-Gan 52900, Israel
Venue:
Information and Computation
Year:
2010

Abstract

Finding the similarity between two sequences is a major problem in computer science, motivated by applications in computational biology, information retrieval, and image processing. These fields must account for data corruption caused by genome rearrangements, typing mistakes, and similar errors. Many applications therefore require approximate matching rather than exact equality of the sequences. We consider mismatches and swaps as natural errors that are allowed in a limited number. The edit distance problem with swap and mismatch operations was previously solved in O(n√m log m) time. However, the problem of string matching with at most k swap and mismatch errors remained open. In this paper, we present an algorithm that finds all locations where the pattern matches with at most k mismatch and swap errors, in O(n√k log m) time.
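
To make the problem statement concrete, the following is a minimal sketch of a naive baseline, not the algorithm presented in the paper. It assumes the usual swap-and-mismatch model, in which a swap exchanges two adjacent, distinct pattern characters, a mismatch replaces one character, and each position takes part in at most one operation. The function names swap_mismatch_distance and find_matches are hypothetical, chosen for illustration only. The sketch runs a simple dynamic program at every alignment, so it takes O(nm) time overall, far slower than the bound stated in the abstract.

```python
def swap_mismatch_distance(pattern, window):
    """Minimum number of adjacent swaps and mismatches that turn
    `pattern` into `window` (both of the same length).

    Assumption: each position takes part in at most one operation.
    """
    m = len(pattern)
    if len(window) != m:
        raise ValueError("pattern and window must have equal length")

    INF = float("inf")
    # dp[i] = minimum cost to convert pattern[:i] into window[:i]
    dp = [INF] * (m + 1)
    dp[0] = 0
    for i in range(m):
        # Option 1: keep position i (cost 0) or count it as a mismatch (cost 1).
        cost = 0 if pattern[i] == window[i] else 1
        dp[i + 1] = min(dp[i + 1], dp[i] + cost)
        # Option 2: swap positions i and i+1 (cost 1 for both positions).
        if (
            i + 1 < m
            and pattern[i] != pattern[i + 1]
            and pattern[i] == window[i + 1]
            and pattern[i + 1] == window[i]
        ):
            dp[i + 2] = min(dp[i + 2], dp[i] + 1)
    return dp[m]


def find_matches(text, pattern, k):
    """Return every start index where `pattern` matches `text`
    with at most k swap and mismatch errors (naive scan)."""
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if swap_mismatch_distance(pattern, text[i : i + m]) <= k
    ]
```

For example, with text "abcbacabd", pattern "abc", and k = 1, the sketch reports positions 0 (exact match), 3 (one swap, "bac"), and 6 (one mismatch, "abd").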