Motivation: Transcription factors (TFs) bind directly to short segments of the genome, often within hundreds to thousands of base pairs upstream of gene transcription start sites, to regulate gene expression. Determining TF binding sites experimentally is expensive and time-consuming. Many motif-finding programs have been developed, but no program is clearly superior in all situations. Practitioners often find it difficult to judge which of the motifs predicted by these algorithms are more likely to be biologically relevant.

Results: We derive a comprehensive scoring function based on a full Bayesian model that can handle unknown site abundance, unknown motif width and two-block motifs with variable-length gaps. We propose an algorithm, BioOptimizer, that optimizes this scoring function to reduce noise in the motif signal found by any motif-finding program. BioOptimizer can be used in conjunction with several existing programs, and in both simulation studies and applications to sets of co-regulated genes in bacteria it is more accurate than any of these motif-finding programs used alone. In addition, this scoring function formulation enables us to objectively compare different predicted motifs and select the optimal ones, effectively combining the strengths of existing programs.

Availability: BioOptimizer is available for download at www.fas.harvard.edu/~junliu/BioOptimizer/
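To make the idea of "scoring a predicted motif and then refining it" concrete, here is a minimal sketch in Python. It is not the BioOptimizer scoring function, which the abstract describes only at a high level. As a stand-in it uses a simple log-likelihood ratio of the aligned sites against a background model, with a per-site penalty playing the role of a crude prior on site abundance. The names `background_freqs`, `motif_score`, `refine`, and the parameters `pseudocount` and `site_penalty` are all hypothetical, and the sketch assumes a fixed motif width with no gaps.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def background_freqs(sequences, pseudocount=1.0):
    """Estimate single-base background frequencies from all sequences."""
    counts = Counter({base: pseudocount for base in ALPHABET})
    for seq in sequences:
        counts.update(base for base in seq if base in ALPHABET)
    total = sum(counts.values())
    return {base: counts[base] / total for base in ALPHABET}


def motif_score(sequences, sites, width, background, pseudocount=0.5):
    """Log-likelihood ratio of the aligned sites: column model vs background.

    `sites` is a list of (sequence_index, start) pairs. This is an assumed,
    simplified score, not the full Bayesian score from the paper.
    """
    if not sites:
        return 0.0
    columns = [Counter() for _ in range(width)]
    for seq_idx, start in sites:
        for j, base in enumerate(sequences[seq_idx][start:start + width]):
            columns[j][base] += 1
    n = len(sites)
    score = 0.0
    for col in columns:
        for base in ALPHABET:
            freq = (col[base] + pseudocount) / (n + 4 * pseudocount)
            score += col[base] * math.log(freq / background[base])
    return score


def refine(sequences, sites, width, background, site_penalty=2.0):
    """Greedily add or drop single sites while the penalized score improves.

    `site_penalty` is a hypothetical stand-in for a prior on site abundance.
    Overlapping sites are not excluded in this sketch.
    """
    def objective(site_set):
        raw = motif_score(sequences, sorted(site_set), width, background)
        return raw - site_penalty * len(site_set)

    current = set(sites)
    best = objective(current)
    candidates = [(i, p) for i, seq in enumerate(sequences)
                  for p in range(len(seq) - width + 1)]
    improved = True
    while improved:
        improved = False
        for cand in candidates:
            trial = current ^ {cand}  # toggle membership of one site
            trial_score = objective(trial)
            if trial_score > best + 1e-12:
                current, best, improved = trial, trial_score, True
    return sorted(current), best
```

A typical use would pass the sites reported by an existing motif finder as the starting `sites`, then compare the penalized scores of the refined outputs from several programs and keep the highest-scoring one. Because every accepted move strictly increases the score, the loop terminates, but like any greedy search it may stop at a local optimum.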