Metrics for comparing regulatory sequences on the basis of pattern counts

  • Authors: Jacques Van Helden

  • Affiliations: SCMBB, Université Libre de Bruxelles, Campus Plaine CP 263, Boulevard du Triomphe, B-1050 Bruxelles, Belgium

  • Venue: Bioinformatics
  • Year: 2004

Abstract

Motivation: Upstream sequences contain short motifs that mediate transcriptional regulation by specifically binding different transcription factors. The presence of common motifs in the regulatory regions of two genes can be taken as a clue to potential co-regulation. A pattern count-based (dis)similarity metric between sequences could thus be used to classify genes according to their putative regulatory properties.

Results: We present several metrics that rely on probability theory and that compare sequences on the basis of pattern counts. We compare these metrics with several classical dissimilarity and similarity metrics, and illustrate their behaviour with a biological example.

Supplementary information: The data, results, and R routines used in this paper are freely available at http://rsat.ulb.ac.be/rsat/published_data/pattern_count_metrics_2003/
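
As a rough illustration of what comparing sequences "on the basis of pattern counts" can involve, the sketch below counts overlapping k-letter words in two sequences and computes two classical measures: a Euclidean distance between the count vectors and the number of shared occurrences. It does not reproduce the probability-based metrics of the paper, which the abstract does not specify. The function names, the word length `k=2`, and the toy sequences are assumptions made for this example only.

```python
from collections import Counter
from math import sqrt


def count_patterns(seq, k=2):
    """Count overlapping occurrences of every k-letter word in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def euclidean_distance(counts_a, counts_b):
    """Classical dissimilarity: Euclidean distance between count vectors."""
    patterns = set(counts_a) | set(counts_b)
    # A Counter returns 0 for patterns it has never seen.
    return sqrt(sum((counts_a[p] - counts_b[p]) ** 2 for p in patterns))


def shared_occurrences(counts_a, counts_b):
    """Classical similarity: sum over patterns of the smaller of the two counts."""
    # Counter intersection keeps, for each pattern, the minimum count.
    return sum((counts_a & counts_b).values())


# Hypothetical toy sequences, not taken from the paper's data set.
upstream_a = count_patterns("ACGTAC")  # AC:2, CG:1, GT:1, TA:1
upstream_b = count_patterns("ACAC")    # AC:2, CA:1

print(euclidean_distance(upstream_a, upstream_b))  # 2.0
print(shared_occurrences(upstream_a, upstream_b))  # 2
```

Both measures treat every pattern alike regardless of how often it is expected to occur by chance, which is one reason a probability-based metric may behave differently on real upstream sequences.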