Assessing the Statistical Significance of Overrepresented Oligonucleotides

  • Authors:
  • Alain Denise;Mireille Régnier;Mathias Vandenbogaert

  • Venue:
  • WABI '01 Proceedings of the First International Workshop on Algorithms in Bioinformatics
  • Year:
  • 2001

Abstract

Assessing the statistical significance of over-represented exceptional words is becoming an important task in computational biology. We show how large-deviation methodology applies to two problems. First, when some oligomer H occurs more often than expected, i.e. may be overrepresented, large deviations allow for a very efficient computation of the so-called p-value. The second problem we address is the possible change in the oligomer distribution induced by the over-representation of some pattern. Discarding this noise allows for the detection of weaker signals. Related algorithmic and complexity issues are discussed and compared to previous results. The approach is illustrated with three typical examples of applications on biological data.
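
To give a rough sense of how a large-deviation estimate yields a p-value cheaply, the sketch below uses the classical Chernoff bound for a binomial count. This is not the authors' method: it assumes, purely for illustration, that each of n positions matches the word independently with probability p, which ignores word self-overlap and any Markov dependence in the sequence. The function names (kl_bernoulli, chernoff_pvalue_bound, exact_binomial_tail) are hypothetical and chosen for this example only.

```python
import math


def kl_bernoulli(a, p):
    """Kullback-Leibler divergence D(a || p) between Bernoulli(a) and Bernoulli(p), in nats."""
    def term(x, y):
        # Convention: 0 * log(0 / y) = 0.
        return 0.0 if x == 0.0 else x * math.log(x / y)
    return term(a, p) + term(1.0 - a, 1.0 - p)


def chernoff_pvalue_bound(n, k, p):
    """Upper bound on P(X >= k) for X ~ Binomial(n, p).

    Assumption (simplification): occurrences at the n positions are
    independent, so overlaps between word occurrences are ignored.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    a = k / n  # observed frequency
    if a <= p:
        # Not above expectation: the bound is trivial.
        return 1.0
    return math.exp(-n * kl_bernoulli(a, p))


def exact_binomial_tail(n, k, p):
    """Exact P(X >= k) by direct summation, for comparison on small inputs."""
    return sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    # Hypothetical numbers: a 4-mer under a uniform base model, 1000 positions,
    # observed 20 times (about 3.9 occurrences expected).
    n, k, p = 1000, 20, 4.0 ** -4
    print("Chernoff bound:", chernoff_pvalue_bound(n, k, p))
    print("Exact tail:    ", exact_binomial_tail(n, k, p))
```

The bound costs a constant number of arithmetic operations, whereas the exact tail sums up to n terms; that gap is the kind of saving the large-deviation approach is after, although the paper's setting is more general than this independent-position toy model.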