Large deviation properties for patterns

  • Authors:
  • Jérémie Bourdon; Mireille Régnier

  • Affiliations:
  • LINA, CNRS UMR 6241, Université de Nantes, France and DYLISS-Inria team, Inria Rennes-Bretagne-Atlantique, France; AMIB-Inria team, LIX-Ecole Polytechnique, 91128 Palaiseau, France

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2014

Abstract

Deciding whether a given pattern is over- or under-represented with respect to a given background model is a key question in computational biology. Such a decision is usually made by computing p-values that reflect the "exceptionality" of a pattern in a given sequence or set of sequences. In the simplest cases (short and simple patterns, a simple background model, a small number of sequences), an exact p-value can be computed at tractable cost. Realistic cases are generally too complicated for an exact p-value to be obtained, so approximations are proposed instead (Gaussian, Poisson, and large deviation approximations). These approximations apply under specific conditions: Gaussian approximations are valid in the central domain, while Poisson and large deviation approximations are valid for rare events. In the present paper, we prove a large deviation approximation for the double-strand counting problem, that is, counting the occurrences of a given pattern in a set of sequences that arise from both strands of the genome. In that case, dependencies between a sequence and its reverse complement cannot be neglected. They are captured here, for a Bernoulli model, from general combinatorial properties of the pattern. A large deviation result is also provided for a set of small sequences.
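
The contrast between the central domain and rare events can be illustrated with a deliberately simplified sketch. It is not the method of the paper: it assumes that the occurrence count follows a plain binomial law (n independent positions, each matching with probability p), which ignores pattern self-overlaps and the dependencies between the two strands that the paper addresses. The function names and parameters are hypothetical. The sketch compares an exact tail probability with a Gaussian approximation and with the classical large deviation (Chernoff) upper bound exp(-n·KL(k/n ‖ p)).

```python
import math

def binom_tail(n, p, k):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed in log space.

    Assumption: occurrences are independent trials (no overlaps, one strand).
    """
    total = 0.0
    for j in range(max(k, 0), n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1)
                   - math.lgamma(n - j + 1)
                   + j * math.log(p) + (n - j) * math.log(1 - p))
        total += math.exp(log_pmf)
    return total

def gaussian_tail(n, p, k):
    """Normal approximation of P(X >= k) with a continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    z = (k - 0.5 - mu) / sigma
    return 0.5 * math.erfc(z / math.sqrt(2))

def chernoff_bound(n, p, k):
    """Large deviation upper bound on P(X >= k), meaningful when k/n > p."""
    a = k / n
    if a <= p:
        return 1.0          # the bound is trivial at or below the mean
    if a >= 1.0:
        return p ** n       # limit of exp(-n * KL) as a -> 1
    kl = a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))
    return math.exp(-n * kl)

if __name__ == "__main__":
    n, p = 1000, 4.0 ** -4  # hypothetical: length-4 pattern, uniform DNA letters
    for k in (5, 10, 20, 40):
        print(k, binom_tail(n, p, k), gaussian_tail(n, p, k),
              chernoff_bound(n, p, k))
```

Under these assumptions, the Gaussian value is expected to track the exact tail only near the mean n·p, whereas the large deviation bound remains a valid upper bound far in the tail. Handling overlapping occurrences and both strands requires the combinatorial analysis described in the abstract and is not attempted here.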