Mismatch sampling

Authors:
Raphaël Clifford;Klim Efremenko;Benny Porat;Ely Porat;Amir Rothschild
Affiliations:
Dept. Computer Science, University of Bristol, UK;Dept. Computer Science, Bar-Ilan University, Israel and Dept. Computer Science and Applied Mathematics, Weizmann Institute, Israel and Dept. Computer Science, Tel-Aviv University, Israel;Dept. Computer Science, Bar-Ilan University, Israel;Dept. Computer Science, Bar-Ilan University, Israel;Dept. Computer Science, Tel-Aviv University, Israel
Venue:
Information and Computation
Year:
2012

Abstract

We reconsider the well-known problem of pattern matching under the Hamming distance. Previous approaches have shown how to count the number of mismatches efficiently, especially when a bound is known for the maximum Hamming distance. Our interest is different in that we wish to collect a random sample of mismatches of fixed size at each position in the text. Given a pattern p of length m and a text t of length n, we show how to sample, with high probability, up to c mismatches from every alignment of p and t in O((c + log n)(n + m log m) log m) time. Further, we guarantee that the mismatches are sampled uniformly and can therefore be seen as representative of the types of mismatches that occur.
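To make the problem statement concrete, the sketch below is a naive reference that computes, for every alignment of p against t, a uniform random sample of up to c mismatch offsets. It is not the authors' algorithm: it enumerates every mismatch explicitly and so takes O(nm) time, which is exactly the cost the paper's method avoids. The function names `mismatch_positions` and `sample_mismatches` are hypothetical, and sampling without replacement is an assumption about what "a random sample of mismatches" means here.

```python
import random
from typing import List, Optional


def mismatch_positions(pattern: str, text: str, i: int) -> List[int]:
    """Offsets j in the pattern where pattern[j] != text[i + j] (Hamming mismatches)."""
    return [j for j, ch in enumerate(pattern) if ch != text[i + j]]


def sample_mismatches(pattern: str, text: str, c: int,
                      rng: Optional[random.Random] = None) -> List[List[int]]:
    """For each alignment i (0 <= i <= n - m), return up to c mismatch offsets
    drawn uniformly at random without replacement (assumed semantics).

    Brute-force O(nm) baseline: list all mismatches, then sample from them.
    """
    rng = rng or random.Random()
    m, n = len(pattern), len(text)
    samples = []
    for i in range(n - m + 1):
        mism = mismatch_positions(pattern, text, i)
        k = min(c, len(mism))  # fewer than c mismatches: return all of them
        samples.append(sorted(rng.sample(mism, k)))
    return samples


if __name__ == "__main__":
    # Four alignments of "abc" in "abdabc"; the last one is an exact match.
    print(sample_mismatches("abc", "abdabc", c=2, rng=random.Random(0)))
```

A sub-quadratic method must avoid materialising the full mismatch list at each alignment; this baseline is useful mainly as a correctness oracle for such a method on small inputs.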