Motivation: An important step in unravelling the transcriptional regulatory network of an organism is to identify, for each transcription factor, all of its DNA binding sites. Several approaches are commonly used in searching for a transcription factor's binding sites, including consensus sequences and position-specific scoring matrices. In addition, methods that compute the average number of nucleotide matches between a putative site and all known sites can be employed. Such basic approaches can all be naturally extended by incorporating pairwise nucleotide dependencies and per-position information content. In this paper, we evaluate the effectiveness of these basic approaches and their extensions in finding binding sites for a transcription factor of interest without erroneously identifying other genomic sequences.

Results: In cross-validation testing on a dataset of Escherichia coli transcription factors and their binding sites, we show that there are statistically significant differences in how well various methods identify transcription factor binding sites. The use of per-position information content improves the performance of all basic approaches. Furthermore, including local pairwise nucleotide dependencies within binding site models results in statistically significant performance improvements for approaches based on nucleotide matches. Based on our analysis, the best results when searching for DNA binding sites of a particular transcription factor are obtained by methods that incorporate both information content and local pairwise correlations.

Availability: The software is available at http://compbio.cs.princeton.edu/bindsites
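
To make the basic approaches named in the abstract concrete, the following is a minimal Python sketch of three of them: a position-specific scoring matrix (PSSM) log-odds score, per-position information content, and the average number of nucleotide matches between a candidate site and the known sites, optionally weighted per position. It is an illustration only and not the software distributed by the authors. All function names, the pseudocount of 1, the uniform background of 0.25, and the toy sites are assumptions introduced here, and pairwise nucleotide dependencies are not modelled.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def position_frequencies(sites, pseudocount=1.0):
    """Per-position nucleotide frequencies from equal-length known sites.

    The pseudocount (an assumption of this sketch) avoids zero frequencies.
    """
    length = len(sites[0])
    total = len(sites) + pseudocount * len(ALPHABET)
    freqs = []
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        freqs.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return freqs


def information_content(freqs, background=0.25):
    """Relative entropy (in bits) of each position against a uniform background."""
    return [
        sum(p * math.log2(p / background) for p in column.values())
        for column in freqs
    ]


def pssm_score(candidate, freqs, background=0.25):
    """Sum of per-position log-odds scores for a candidate site."""
    return sum(
        math.log2(freqs[i][base] / background)
        for i, base in enumerate(candidate)
    )


def average_match_score(candidate, sites, weights=None):
    """Average (optionally position-weighted) number of matches to known sites."""
    if weights is None:
        weights = [1.0] * len(candidate)
    total = 0.0
    for site in sites:
        total += sum(w for w, a, b in zip(weights, candidate, site) if a == b)
    return total / len(sites)


# Toy data, invented for illustration only.
known_sites = ["TTGACA", "TTGACA", "TTTACA", "TAGACA"]
freqs = position_frequencies(known_sites)
ic = information_content(freqs)

plain = average_match_score("TTGACA", known_sites)
weighted = average_match_score("TTGACA", known_sites, weights=ic)
log_odds = pssm_score("TTGACA", freqs)
```

Passing the information content as `weights` is one simple way to let well-conserved positions count more than variable ones; the paper's exact weighting scheme may differ.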