Motivation: Upstream sequences contain short motifs, which mediate transcriptional regulation by specifically binding different transcription factors. The presence of common motifs in the regulatory regions of two genes might be considered as a clue for a potential co-regulation. A pattern count-based (dis)similarity metric between sequences could thus be used to classify genes according to their putative regulatory properties.

Results: We present here several metrics which rely on probability theory, and which aim at comparing sequences on the basis of pattern counts. We compare these metrics to several classical dissimilarity and similarity metrics, and illustrate their behaviour with a biological example.

Supplementary information: The data, results, and R routines used in this paper are freely available at http://rsat.ulb.ac.be/rsat/published_data/pattern_count_metrics_2003/
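
To make the idea of a pattern count-based dissimilarity concrete, the following is a minimal Python sketch, not the authors' R routines and not one of the probability-based metrics introduced in the paper. It counts overlapping words of length k in two sequences and compares the resulting count vectors with a Euclidean distance, used here as one example of a classical dissimilarity. The function names, the word length k = 2, and the two toy sequences are assumptions made purely for illustration.

```python
from collections import Counter
from math import sqrt


def count_patterns(seq, k):
    """Count overlapping occurrences of every k-letter word in seq.

    Hypothetical helper for illustration; only one strand is counted.
    """
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def euclidean_dissimilarity(counts_a, counts_b):
    """Euclidean distance between two pattern-count vectors.

    Patterns absent from one sequence contribute a count of 0
    (Counter returns 0 for missing keys).
    """
    patterns = set(counts_a) | set(counts_b)
    return sqrt(sum((counts_a[p] - counts_b[p]) ** 2 for p in patterns))


# Toy upstream sequences (made up for this example).
seq_a = "ACGTACGT"
seq_b = "ACGTTTTT"

counts_a = count_patterns(seq_a, 2)
counts_b = count_patterns(seq_b, 2)
distance = euclidean_dissimilarity(counts_a, counts_b)
print(f"Euclidean dissimilarity on dinucleotide counts: {distance:.3f}")
```

A distance of this kind treats every pattern alike, whatever its expected frequency; the metrics described in the Results are instead built on probability theory, and this sketch does not attempt to reproduce them.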