Sparse approaches for the exact distribution of patterns in long state sequences generated by a Markov source

  • Authors:
  • Gregory Nuel; Jean-Guillaume Dumas

  • Affiliations:
  • MAP5, UMR CNRS 8145, Department of Applied Mathematics, Paris Descartes University, France; Laboratoire Jean Kuntzmann, UMR CNRS 5224, Université Joseph Fourier, Grenoble, France and Claude Shannon Institute, University College Dublin, Ireland

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2013

Abstract

We present two novel approaches for computing the exact distribution of a pattern in a long sequence. Both approaches exploit the sparse structure of the problem and are two-part algorithms. The first approach relies on a partial recursion after a fast computation of the second largest eigenvalue of the transition matrix of a Markov chain embedding. The second approach uses fast Taylor expansions of an exact bivariate rational reconstruction of the distribution. We illustrate the value of both approaches on a toy example and two biological applications: the transcription factors of human chromosome 10 and the PROSITE signatures of functional motifs in proteins. On these examples our methods prove complementary and extend the range of pattern problems for which exact computation is feasible.
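
To make the notion of a Markov chain embedding concrete, here is a minimal Python sketch of the plain full recursion that such approaches aim to speed up. It is not either of the two algorithms described in the abstract: it uses no eigenvalue computation and no rational reconstruction. It assumes a single pattern, overlapping occurrences, and a first-order Markov source. The names `build_automaton` and `pattern_count_distribution` are hypothetical, chosen for illustration. Sparsity is handled here only by storing the reachable states in a dictionary.

```python
from collections import defaultdict


def build_automaton(pattern, alphabet):
    """Pattern-matching DFA (illustrative helper, not from the paper).

    delta[q][a] is the length of the longest prefix of `pattern`
    that is a suffix of pattern[:q] + a.
    """
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            s = pattern[:q] + a
            k = min(m, len(s))
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def pattern_count_distribution(pattern, n, start, trans):
    """Exact distribution of the number of overlapping occurrences of
    `pattern` in X_1..X_n, where X is a first-order Markov chain with
    initial distribution `start` and transition probabilities trans[a][b].
    """
    alphabet = sorted(start)
    delta = build_automaton(pattern, alphabet)
    m = len(pattern)

    # Embedded state: (last letter, automaton state, occurrences so far).
    current = defaultdict(float)
    for a, p in start.items():
        if p > 0:
            q = delta[0][a]
            current[(a, q, int(q == m))] += p

    # Full recursion: one sparse update per remaining position.
    for _ in range(n - 1):
        nxt = defaultdict(float)
        for (a, q, c), p in current.items():
            for b, t in trans[a].items():
                if t > 0:
                    q2 = delta[q][b]
                    nxt[(b, q2, c + int(q2 == m))] += p * t
        current = nxt

    # Marginalize over the letter and automaton state.
    dist = defaultdict(float)
    for (_, _, c), p in current.items():
        dist[c] += p
    return dict(dist)
```

For example, calling `pattern_count_distribution("aa", 3, u, {"a": u, "b": u})` with `u = {"a": 0.5, "b": 0.5}` gives the distribution of the number of occurrences of `aa` in a uniform random sequence of length 3. The cost of this naive recursion grows linearly with the sequence length and with the number of reachable (state, count) pairs, which is the kind of cost that becomes prohibitive for long sequences.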