Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization
Machine Learning - Special issue on applications in molecular biology
Combinatorial Approaches to Finding Subtle Signals in DNA Sequences
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology
The identification of regulatory elements as over-represented motifs in the promoters of potentially co-regulated genes is an important and challenging problem in computational biology. Although many motif detection programs have been developed, they remain immature in practical use. In particular, the choice of tunable parameters is often critical to success. Knowledge of which parameter settings are most appropriate for various types of target motifs is therefore invaluable, but unfortunately it has been scarce. In this paper, we report a parameter landscape analysis of two widely used programs, the Gibbs Sampler (GS) and MEME. Our results show that GS is relatively sensitive to changes in some parameter values, while MEME is more stable. We present recommended parameter settings for GS, optimized for four different motif lengths. Running GS four times, once with each of these settings, should therefore significantly decrease the risk of overlooking subtle motifs.
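
As a rough illustration of the "run once per setting" workflow described above, the following sketch loops over four settings and keeps every result. It is not the Gibbs Sampler or MEME: `toy_finder` is a hypothetical stand-in that simply reports the k-mer found in the most sequences, and the widths in `SETTINGS` are placeholders, not the values recommended in the paper.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

# Hypothetical settings, one per motif width. These widths are placeholders
# chosen for the toy data below, not the paper's recommended values.
SETTINGS: List[Dict[str, int]] = [{"width": w} for w in (3, 4, 5, 6)]


def toy_finder(seqs: List[str], width: int) -> Tuple[str, int]:
    """Stand-in for a real motif finder (assumption, not GS or MEME).

    Returns the k-mer of the given width that occurs in the most
    sequences, together with the number of sequences containing it.
    """
    counts: Counter = Counter()
    for s in seqs:
        # A set counts each k-mer at most once per sequence.
        counts.update({s[i:i + width] for i in range(len(s) - width + 1)})
    # Ties are broken alphabetically so the result is deterministic.
    return max(counts.items(), key=lambda kv: (kv[1], kv[0]))


def run_all(
    seqs: List[str],
    finder: Callable[..., Tuple[str, int]] = toy_finder,
    settings: List[Dict[str, int]] = SETTINGS,
) -> Dict[int, Tuple[str, int]]:
    """Run the finder once per setting and keep every result, keyed by width."""
    return {s["width"]: finder(seqs, **s) for s in settings}


if __name__ == "__main__":
    promoters = ["ACGTTGCA", "TTGCAAAC", "GGTTGCAT"]
    for width, (kmer, support) in run_all(promoters).items():
        print(width, kmer, support)
```

Keeping the results of all runs, rather than only the top-scoring one, reflects the point of the abstract: a motif missed under one setting may be recovered under another. A real pipeline would replace `toy_finder` with a call to the actual program and its own parameter names.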