Comparative analysis of methods for representing and searching for transcription factor binding sites

Authors:
Robert Osada; Elena Zaslavsky; Mona Singh
Affiliations:
Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA (all authors)
Venue:
Bioinformatics
Year:
2004

Abstract

Motivation: An important step in unravelling the transcriptional regulatory network of an organism is to identify, for each transcription factor, all of its DNA binding sites. Several approaches are commonly used in searching for a transcription factor's binding sites, including consensus sequences and position-specific scoring matrices. In addition, methods that compute the average number of nucleotide matches between a putative site and all known sites can be employed. Such basic approaches can all be naturally extended by incorporating pairwise nucleotide dependencies and per-position information content. In this paper, we evaluate the effectiveness of these basic approaches and their extensions in finding binding sites for a transcription factor of interest without erroneously identifying other genomic sequences.

Results: In cross-validation testing on a dataset of Escherichia coli transcription factors and their binding sites, we show that there are statistically significant differences in how well various methods identify transcription factor binding sites. The use of per-position information content improves the performance of all basic approaches. Furthermore, including local pairwise nucleotide dependencies within binding site models results in statistically significant performance improvements for approaches based on nucleotide matches. Based on our analysis, the best results when searching for DNA binding sites of a particular transcription factor are obtained by methods that incorporate both information content and local pairwise correlations.

Availability: The software is available at http://compbio.cs.princeton.edu/bindsites
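
As a rough illustration of two of the basic approaches named in the abstract (position-specific scoring matrices and average nucleotide matches), together with per-position information content, a minimal Python sketch follows. It is not the authors' software: the function names, the pseudocount of 1, the uniform background distribution, and the option of using information content as a per-position weight are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def column_frequencies(sites, pseudocount=1.0):
    """Per-position nucleotide frequencies from aligned, equal-length sites.

    The pseudocount (an assumption here) avoids zero frequencies.
    """
    total = len(sites) + pseudocount * len(ALPHABET)
    freqs = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        freqs.append({b: (counts[b] + pseudocount) / total for b in ALPHABET})
    return freqs


def information_content(freqs, background=None):
    """Relative entropy (in bits) of each column against the background."""
    background = background or {b: 0.25 for b in ALPHABET}  # assumed uniform
    return [
        sum(p * math.log2(p / background[b]) for b, p in col.items())
        for col in freqs
    ]


def pssm_score(candidate, freqs, background=None, weights=None):
    """Log-odds PSSM score, optionally weighted per position."""
    background = background or {b: 0.25 for b in ALPHABET}
    weights = weights or [1.0] * len(freqs)
    return sum(
        w * math.log2(col[c] / background[c])
        for w, col, c in zip(weights, freqs, candidate)
    )


def average_match_score(candidate, sites, weights=None):
    """Average (optionally weighted) number of matches to the known sites."""
    weights = weights or [1.0] * len(candidate)
    total = 0.0
    for site in sites:
        total += sum(w for w, a, b in zip(weights, candidate, site) if a == b)
    return total / len(sites)
```

Passing the output of `information_content` as `weights` to either scoring function is one simple way to let conserved positions count more; the weighting actually evaluated in the paper may differ. Pairwise nucleotide dependencies are not modelled in this sketch.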