Mismatch Sampling

  • Authors:
  • Raphaël Clifford; Klim Efremenko; Benny Porat; Ely Porat; Amir Rothschild

  • Affiliations:
  • Dept. of Computer Science, University of Bristol, Bristol, UK BS8 1UB; Dept. of Computer Science, Bar-Ilan University, Ramat-Gan, Israel 52900 and Dept. of Computer Science and Applied Mathematics, Weizmann Institute, Rehovot, Israel; Dept. of Computer Science, Bar-Ilan University, Ramat-Gan, Israel 52900; Dept. of Computer Science, Bar-Ilan University, Ramat-Gan, Israel 52900; Dept. of Computer Science, Bar-Ilan University, Ramat-Gan, Israel 52900 and Dept. of Computer Science, Tel-Aviv University, Tel-Aviv, Israel

  • Venue:
  • SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
  • Year:
  • 2008

Abstract

We consider the well-known problem of pattern matching under the Hamming distance. Previous approaches have shown how to count the number of mismatches efficiently, especially when a bound is known for the maximum Hamming distance. Our interest is different in that we wish to collect a random sample of mismatches of fixed size at each position in the text. Given a pattern p of length m and a text t of length n, we show how to sample with high probability c mismatches where possible from every alignment of p and t in O((c + log n)(n + m log m) log m) time. Further, we guarantee that the mismatches are sampled uniformly and can therefore be seen as representative of the types of mismatches that occur.
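
To make the problem statement concrete, the following is a minimal sketch of a naive baseline, not the algorithm of the paper. It enumerates the mismatches at each alignment in O(nm) total time and draws up to c of them uniformly at random. The function name sample_mismatches and the rng parameter are hypothetical, introduced only for illustration.

```python
import random


def sample_mismatches(p, t, c, rng=None):
    """Naive baseline (assumption: NOT the paper's method).

    For every alignment i of pattern p against text t, return a list of
    up to c pattern offsets j where p[j] != t[i + j], chosen uniformly
    at random from all mismatches at that alignment.
    """
    rng = rng or random.Random()
    m, n = len(p), len(t)
    samples = []
    for i in range(n - m + 1):
        # Collect every mismatching offset for this alignment: O(m) work.
        mismatches = [j for j in range(m) if p[j] != t[i + j]]
        if len(mismatches) <= c:
            # Fewer than c mismatches exist, so return all of them.
            samples.append(mismatches)
        else:
            # Uniform sample without replacement, sorted for readability.
            samples.append(sorted(rng.sample(mismatches, c)))
    return samples


if __name__ == "__main__":
    # One sample list per alignment of "abc" against "abcabd".
    print(sample_mismatches("abc", "abcabd", c=2, rng=random.Random(0)))
```

This baseline is quadratic in the worst case; the bound stated in the abstract is what makes sampling practical for long texts.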