Within a set of related DNA sequences drawn from co-regulated genes, there are short, functional, yet hidden subsequence patterns that recur across the sequences. The task of sequence pattern discovery, also known as motif discovery, is to uncover these unseen subsequences ab initio and then build a motif model for them. A plethora of motif-finding algorithms has been designed to tackle this problem. This paper compares a set of optimization techniques by consolidating them under the same maximum-likelihood (ML) framework. The framework unifies a suite of motif-finding algorithms by having them maximize the same function, which enables a systematic comparison of different optimization schemes and provides practical guidance on using these techniques. As a foundation, the ML framework is built for two categories of iterative optimization techniques, deterministic and stochastic, both capable of exploring the sequence alignment space. The deterministic algorithms maximize the likelihood function by iterative greedy local search. The stochastic algorithms iteratively draw motif location samples using Monte Carlo simulation while keeping track of the solutions with locally maximal likelihood. A total of five ML-based sequence pattern-finding algorithms are developed, evaluated, and compared using simulated and real biological sequences. Results show that the deterministic algorithms are more time-efficient than their stochastic counterparts, but they do not perform as well as the stochastic algorithms.
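To make the distinction between the two categories concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes one motif occurrence per sequence, a fixed motif width, a uniform background, and a log-likelihood-ratio objective with pseudocounts. All names (`profile`, `site_scores`, `log_likelihood`, `search`) are hypothetical. Both modes optimize the same function: the deterministic mode greedily moves each site to its best-scoring position, while the stochastic mode samples each site in proportion to its likelihood and keeps the best alignment seen so far.

```python
import math
import random

ALPHABET = "ACGT"

def profile(seqs, starts, width, skip=None, pseudo=1.0):
    """Column-wise base probabilities from the aligned sites, with pseudocounts."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
    for i, (s, a) in enumerate(zip(seqs, starts)):
        if i == skip:  # optionally leave one sequence out of the estimate
            continue
        for j in range(width):
            counts[j][s[a + j]] += 1
    return [{b: c[b] / sum(c.values()) for b in ALPHABET} for c in counts]

def site_scores(seq, prof, width, bg=0.25):
    """Log-likelihood ratio (motif vs. uniform background) of every window."""
    return [sum(math.log(prof[j][seq[a + j]] / bg) for j in range(width))
            for a in range(len(seq) - width + 1)]

def log_likelihood(seqs, starts, width):
    """Shared objective (an assumption of this sketch): summed site scores."""
    prof = profile(seqs, starts, width)
    return sum(site_scores(s, prof, width)[a] for s, a in zip(seqs, starts))

def search(seqs, width, stochastic, iters=50, seed=0):
    """Iteratively update one motif site per sequence; return the best alignment."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    best, best_ll = list(starts), log_likelihood(seqs, starts, width)
    for _ in range(iters):
        for i, s in enumerate(seqs):
            prof = profile(seqs, starts, width, skip=i)
            scores = site_scores(s, prof, width)
            if stochastic:
                # Monte Carlo step: sample a position in proportion to its likelihood.
                m = max(scores)
                weights = [math.exp(x - m) for x in scores]
                starts[i] = rng.choices(range(len(scores)), weights=weights)[0]
            else:
                # Greedy step: move to the highest-scoring position.
                starts[i] = max(range(len(scores)), key=scores.__getitem__)
        ll = log_likelihood(seqs, starts, width)
        if ll > best_ll:  # keep track of the best solution visited
            best, best_ll = list(starts), ll
    return best, best_ll
```

Calling `search(seqs, 6, stochastic=False)` and `search(seqs, 6, stochastic=True)` on the same list of DNA strings returns, for each mode, the best start positions found and their score, so the two optimization schemes can be compared on a common objective.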