Mismatch sampling

  • Authors:
  • Raphaël Clifford;Klim Efremenko;Benny Porat;Ely Porat;Amir Rothschild

  • Affiliations:
  • Dept. Computer Science, University of Bristol, UK;Dept. Computer Science, Bar-Ilan University, Israel and Dept. Computer Science and Applied Mathematics, Weizmann Institute, Israel and Dept. Computer Science, Tel-Aviv University, Israel;Dept. Computer Science, Bar-Ilan University, Israel;Dept. Computer Science, Bar-Ilan University, Israel;Dept. Computer Science, Tel-Aviv University, Israel

  • Venue:
  • Information and Computation
  • Year:
  • 2012

Abstract

We reconsider the well-known problem of pattern matching under the Hamming distance. Previous approaches have shown how to count the number of mismatches efficiently, especially when a bound is known for the maximum Hamming distance. Our interest is different in that we wish to collect a random sample of mismatches of fixed size at each position in the text. Given a pattern p of length m and a text t of length n, we show how to sample with high probability up to c mismatches from every alignment of p and t in O((c + log n)(n + m log m) log m) time. Further, we guarantee that the mismatches are sampled uniformly and can therefore be seen as representative of the types of mismatches that occur.
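To make the output specification concrete, here is a minimal sketch of a naive O(nm)-time baseline in Python. It is not the authors' algorithm and does not achieve the bound stated above. For each alignment i of p against t, it lists every mismatch offset j (where p[j] != t[i + j]) and draws up to c of them uniformly at random without replacement. The function name sample_mismatches and its signature are hypothetical, chosen for illustration only.

```python
import random
from typing import List


def sample_mismatches(p: str, t: str, c: int, rng: random.Random) -> List[List[int]]:
    """Hypothetical naive baseline (not the paper's method).

    For every alignment i (0 <= i <= n - m) of pattern p against text t,
    return up to c mismatch offsets j with p[j] != t[i + j], drawn
    uniformly at random without replacement. If an alignment has at most
    c mismatches, all of them are returned. Runs in O(nm) time.
    """
    m, n = len(p), len(t)
    result: List[List[int]] = []
    for i in range(n - m + 1):
        # Collect every mismatching offset for this alignment.
        mismatches = [j for j in range(m) if p[j] != t[i + j]]
        # Uniform sample of size min(c, Hamming distance).
        k = min(c, len(mismatches))
        result.append(sorted(rng.sample(mismatches, k)))
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    for i, sample in enumerate(sample_mismatches("abc", "abcxbc", 2, rng)):
        print(i, sample)
```

For p = "abc" and t = "abcxbc" with c = 2, alignment 0 has no mismatches, alignments 1 and 2 each have three mismatches (so two are sampled), and alignment 3 has a single mismatch at offset 0. The paper's contribution is producing samples with this uniformity guarantee much faster than the quadratic scan used here.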