Metrics for comparing regulatory sequences on the basis of pattern counts

Authors:
Jacques Van Helden
Affiliations:
SCMBB, Université Libre de Bruxelles, Campus Plaine CP 263, Boulevard du Triomphe, B-1050 Bruxelles, Belgium
Venue:
Bioinformatics
Year:
2004

Abstract

Motivation: Upstream sequences contain short motifs that mediate transcriptional regulation by specifically binding transcription factors. The presence of common motifs in the regulatory regions of two genes may be considered a clue to their potential co-regulation. A (dis)similarity metric between sequences based on pattern counts could thus be used to classify genes according to their putative regulatory properties.

Results: We present several metrics, grounded in probability theory, for comparing sequences on the basis of pattern counts. We compare these metrics with several classical dissimilarity and similarity metrics, and illustrate their behaviour with a biological example.

Supplementary information: The data, results, and R routines used in this paper are freely available at http://rsat.ulb.ac.be/rsat/published_data/pattern_count_metrics_2003/
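To make the idea of comparing sequences by pattern counts concrete, the following Python sketch counts overlapping oligonucleotides (k-mers) in each sequence and compares the resulting count vectors with three generic, classical measures: Euclidean distance (a dissimilarity), cosine similarity, and a Jaccard index on the set of patterns present in both sequences. These textbook metrics were chosen only for illustration. They are not the probability-based metrics proposed in the paper, and every function name here (count_kmers, count_vector, euclidean_distance, cosine_similarity, jaccard_shared_patterns) is hypothetical. The sketch assumes DNA over the alphabet ACGT and counts a single strand only, with no grouping of reverse complements.

```python
from collections import Counter
from itertools import product
import math


def count_kmers(seq: str, k: int) -> Counter:
    """Count overlapping k-mers (oligonucleotides) on one strand of a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def count_vector(seq: str, k: int, alphabet: str = "ACGT") -> list:
    """Return counts for all len(alphabet)**k possible words, in a fixed order.

    Words containing characters outside `alphabet` (e.g. N) are simply not
    represented in the vector.
    """
    counts = count_kmers(seq, k)
    return [counts.get("".join(w), 0) for w in product(alphabet, repeat=k)]


def euclidean_distance(x: list, y: list) -> float:
    """Classical dissimilarity: Euclidean distance between count vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x: list, y: list) -> float:
    """Classical similarity: cosine of the angle between count vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # an empty sequence shares nothing with anything
    return dot / (norm_x * norm_y)


def jaccard_shared_patterns(x: list, y: list) -> float:
    """Fraction of patterns present in either sequence that occur in both."""
    present_x = {i for i, a in enumerate(x) if a > 0}
    present_y = {i for i, b in enumerate(y) if b > 0}
    union = present_x | present_y
    if not union:
        return 0.0
    return len(present_x & present_y) / len(union)


if __name__ == "__main__":
    # Two toy "upstream" sequences; real analyses would use longer regions
    # and longer words (e.g. k = 6).
    s1, s2 = "ACGTACGTGATA", "TTACGTGATAAA"
    v1, v2 = count_vector(s1, 3), count_vector(s2, 3)
    print("Euclidean distance:", euclidean_distance(v1, v2))
    print("Cosine similarity: ", cosine_similarity(v1, v2))
    print("Jaccard (shared):  ", jaccard_shared_patterns(v1, v2))
```

Raw count vectors make these classical metrics sensitive to sequence length and to base composition. Correcting for such effects is the motivation the abstract gives for metrics built on probability theory.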