Exact analysis of Horspool's and Sunday's pattern matching algorithms with probabilistic arithmetic automata

  • Authors:
  • Tobias Marschall;Sven Rahmann

  • Affiliations:
  • Bioinformatics for High-Throughput Technologies, Algorithm Engineering, Computer Science XI, TU Dortmund, Germany;Bioinformatics for High-Throughput Technologies, Algorithm Engineering, Computer Science XI, TU Dortmund, Germany

  • Venue:
  • LATA'10: Proceedings of the 4th International Conference on Language and Automata Theory and Applications
  • Year:
  • 2010

Abstract

We define deterministic arithmetic automata (DAAs) and connect them to a framework called probabilistic arithmetic automata (PAAs) [9]. We use DAAs and PAAs to compute the entire exact probability distribution (in contrast to, e.g., asymptotic expectation and variance) of the number $X^p_\ell$ of text characters accessed by the Horspool or Sunday pattern matching algorithms when matching a fixed pattern p against a random text of length ℓ. The random text model can be quite general, ranging from simple uniform models to higher-order Markov models or hidden Markov models (HMMs). We develop several alternative constructions with different automaton state spaces, leading to different time and space complexities for the computations. To our knowledge, this is the first exact analysis of suffix-based pattern matching algorithms. We present (perhaps surprising) illustrative results on short patterns and moderate text lengths. Our results generalize easily to any search-window-based pattern matching algorithm.
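To make the quantity $X^p_\ell$ concrete, the following is a minimal Python sketch, not the DAA/PAA construction from the paper. It counts the text characters read by a textbook version of Horspool's algorithm and obtains the exact distribution under a uniform text model by brute-force enumeration of all texts, which is feasible only for very short texts. The function names `horspool_cost` and `exact_cost_distribution` are hypothetical, and the counting convention (one access per right-to-left comparison, with the shift lookup not counted again) is an assumption rather than the authors' definition.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def horspool_cost(pattern, text):
    """Count text character accesses of a textbook Horspool search (assumed convention)."""
    m, n = len(pattern), len(text)
    # Shift table: distance from the rightmost occurrence of each character
    # in pattern[:-1] to the pattern end; characters not listed shift by m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    accesses = 0
    pos = 0  # left end of the current search window
    while pos + m <= n:
        # Compare right to left, counting every text character that is read.
        j = m - 1
        while j >= 0:
            accesses += 1
            if text[pos + j] != pattern[j]:
                break
            j -= 1
        # The shift depends on the rightmost window character, which was
        # already read above, so it is not counted a second time.
        pos += shift.get(text[pos + m - 1], m)
    return accesses


def exact_cost_distribution(pattern, alphabet, length):
    """Exact distribution of the cost over uniform i.i.d. texts, by enumeration."""
    counts = Counter(
        horspool_cost(pattern, "".join(chars))
        for chars in product(alphabet, repeat=length)
    )
    total = len(alphabet) ** length
    return {cost: Fraction(k, total) for cost, k in sorted(counts.items())}


if __name__ == "__main__":
    for cost, prob in exact_cost_distribution("ab", "ab", 6).items():
        print(cost, prob)
```

The enumeration takes time exponential in the text length, which is the cost that an automaton-based computation of the kind described in the abstract is meant to avoid. Non-uniform or Markov text models would require weighting each text by its probability instead of counting.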