A Monte Carlo EM Algorithm for De Novo Motif Discovery in Biomolecular Sequences

Authors:
Chengpeng Bi
Affiliations:
Children's Mercy Hospitals and Clinics and University of Missouri, Kansas City
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2009

Abstract

Motif discovery methods play pivotal roles in deciphering genetic regulatory codes (i.e., motifs) in genomes and in locating conserved domains in protein sequences. The Expectation Maximization (EM) algorithm is one of the most popular methods for de novo motif discovery. Building on the position weight matrix (PWM) updating technique, this paper presents a Monte Carlo version of the EM motif-finding algorithm. It performs stochastic sampling in local alignment space to overcome the main drawback of conventional EM, which is becoming trapped in a local optimum. The new algorithm is named the Monte Carlo EM Motif Discovery Algorithm (MCEMDA). MCEMDA starts from an initial model and then iteratively performs Monte Carlo simulation and parameter updates until convergence. A log-likelihood profiling technique, together with a top-k strategy, is introduced to cope with the phase-shift and multimodality issues of the motif discovery problem. A novel grouping motif alignment (GMA) algorithm is designed to select motifs by clustering a population of candidate local alignments, and it is successfully applied to subtle motif discovery. MCEMDA compares favorably with other popular PWM-based and word-enumerative motif algorithms when tested on simulated (l, d)-motif cases and on documented prokaryotic and eukaryotic DNA motif sequences. Finally, MCEMDA is applied to detect large blocks of conserved domains in protein benchmarks, where it shows excellent capability compared with other multiple sequence alignment methods.
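To make the Monte Carlo EM loop concrete, here is a minimal Python sketch. It is not MCEMDA itself, and it rests on several assumptions not stated in the abstract: a DNA alphabet, a uniform background model, exactly one motif occurrence per sequence, a fixed motif width, a pseudocount of 0.5, and a fixed number of iterations in place of a convergence test. In each iteration, the Monte Carlo E-step draws m site positions per sequence from the current posterior over start positions. The M-step then re-estimates the PWM from the averaged sampled alignments. All function names are hypothetical. `top_k_runs` simply keeps the k best of several random restarts ranked by log-likelihood, a simplified stand-in for the paper's log-likelihood profiling and top-k strategy.

```python
import math
import random

ALPHABET = "ACGT"
IDX = {c: i for i, c in enumerate(ALPHABET)}
BG = [0.25] * 4  # assumption: uniform background letter frequencies


def pwm_from_samples(seqs, samples, width, pseudo=0.5):
    """M-step: PWM (width x 4) from sampled start positions, with pseudocounts."""
    counts = [[pseudo] * 4 for _ in range(width)]
    for seq, starts in zip(seqs, samples):
        w = 1.0 / len(starts)  # average the m Monte Carlo draws for this sequence
        for s in starts:
            for j in range(width):
                counts[j][IDX[seq[s + j]]] += w
    return [[c / sum(row) for c in row] for row in counts]


def log_odds(pwm, seq, start):
    """Log-likelihood ratio (motif vs. background) of the site beginning at start."""
    return sum(math.log(pwm[j][IDX[seq[start + j]]] / BG[IDX[seq[start + j]]])
               for j in range(len(pwm)))


def sample_sites(pwm, seq, m, rng):
    """Monte Carlo E-step: draw m start positions from the posterior over sites."""
    scores = [log_odds(pwm, seq, s) for s in range(len(seq) - len(pwm) + 1)]
    top = max(scores)
    weights = [math.exp(x - top) for x in scores]  # unnormalized, overflow-safe
    return rng.choices(range(len(scores)), weights=weights, k=m)


def mcem_motif(seqs, width, m=20, iters=50, init=None, seed=0):
    """Alternate sampling and PWM updates for a fixed number of iterations."""
    rng = random.Random(seed)
    if init is None:  # random initial alignment
        init = [rng.randrange(len(s) - width + 1) for s in seqs]
    pwm = pwm_from_samples(seqs, [[s] for s in init], width)
    for _ in range(iters):
        samples = [sample_sites(pwm, s, m, rng) for s in seqs]
        pwm = pwm_from_samples(seqs, samples, width)
    return pwm


def log_likelihood(pwm, seqs):
    """Marginal log-likelihood ratio, uniform prior over each sequence's site start."""
    total = 0.0
    for seq in seqs:
        scores = [log_odds(pwm, seq, s) for s in range(len(seq) - len(pwm) + 1)]
        top = max(scores)
        total += top + math.log(sum(math.exp(x - top) for x in scores) / len(scores))
    return total


def top_k_runs(seqs, width, k=3, restarts=10, **kw):
    """Run several random restarts and keep the k PWMs with highest log-likelihood."""
    runs = [mcem_motif(seqs, width, seed=r, **kw) for r in range(restarts)]
    return sorted(runs, key=lambda p: log_likelihood(p, seqs), reverse=True)[:k]


def consensus(pwm):
    """Most probable letter in each PWM column."""
    return "".join(ALPHABET[max(range(4), key=row.__getitem__)] for row in pwm)
```

Each restart may lock onto a phase-shifted copy of the true motif, such as a consensus offset by one or two positions. Printing `consensus(p)` for each PWM returned by `top_k_runs` is one simple way to observe the phase-shift and multimodality issues that the abstract says MCEMDA is designed to handle.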