Multiple sequence local alignment, often used for de novo discovery of biological motifs hidden in a set of DNA or protein sequences, remains a challenge in bioinformatics and computational biology. Many algorithms and software packages have been developed to address the problem. Expectation maximization (EM), one of the popular local alignment methods, is often used to solve the motif-finding problem. However, EM depends heavily on its initialization and is easily trapped in local optima. This paper presents the Genetic-enabled EM Motif-Finding Algorithm (GEMFA) in an effort to mitigate the difficulties confronting EM-based motif discovery algorithms. The new algorithm integrates a simple genetic algorithm (GA) with a local searcher to explore the local alignment space; that is, it combines deterministic local alignment methods with a simple GA to perform de novo motif discovery effectively. It first initializes a population of multiple local alignments, each of which is encoded on a chromosome that represents a potential solution. GEMFA then performs a heuristic search over the whole alignment space, using as its fitness function the minimum description length (MDL), which is generalized from maximum log-likelihood. The genetic algorithm gradually moves this population towards the best alignment, from which the motif model is derived. Analyses of simulated and real biological sequences showed that GEMFA significantly improved on deterministic local alignment methods, especially in the alignment of subtle motifs, and that it also outperformed the other algorithms tested.
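The overall loop described above (a population of alignments encoded as chromosomes, a fitness function, and a deterministic local searcher applied to each candidate) can be illustrated with a minimal Python sketch. It is not the GEMFA implementation. Several parts are assumptions made only for illustration: the chromosome is a list of one motif start position per sequence, the fitness is a plain log-likelihood ratio against a uniform background rather than the MDL score, the local searcher is a greedy position-by-position refinement standing in for EM, and the names `score`, `refine`, and `gemfa_sketch` as well as all parameter values are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"  # assumption: uppercase DNA sequences only


def score(seqs, starts, w, pseudo=0.5):
    """Log-likelihood ratio of the aligned w-mers against a uniform background.
    Illustrative stand-in for the MDL fitness named in the abstract."""
    total = 0.0
    denom = len(seqs) + pseudo * len(ALPHABET)
    for j in range(w):
        counts = {a: 0 for a in ALPHABET}
        for s, p in zip(seqs, starts):
            counts[s[p + j]] += 1
        for c in counts.values():
            if c:
                total += c * math.log((c + pseudo) / denom / 0.25)
    return total


def refine(seqs, starts, w):
    """Deterministic local search (a stand-in for EM): for each sequence in
    turn, move its start to the position that maximizes the score while the
    other starts stay fixed. The score never decreases."""
    starts = list(starts)
    for i, seq in enumerate(seqs):
        starts[i] = max(
            range(len(seq) - w + 1),
            key=lambda p: score(seqs, starts[:i] + [p] + starts[i + 1:], w),
        )
    return starts


def gemfa_sketch(seqs, w, pop_size=20, generations=30, mut_rate=0.1, seed=0):
    """Simple GA over alignments; each chromosome holds one start per sequence.
    Requires pop_size >= 4 so that two distinct parents can be sampled."""
    rng = random.Random(seed)

    def random_start(s):
        return rng.randrange(len(s) - w + 1)

    def fitness(chrom):
        return score(seqs, chrom, w)

    # Initial population: random alignments, each improved by the local searcher.
    pop = [refine(seqs, [random_start(s) for s in seqs], w) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; parents survive
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            child = [
                random_start(s) if rng.random() < mut_rate else p  # point mutation
                for s, p in zip(seqs, child)
            ]
            children.append(refine(seqs, child, w))
        pop = parents + children
    return max(pop, key=fitness)
```

Calling `gemfa_sketch(seqs, w)` on a list of DNA strings returns one start position per sequence, and a motif model can then be estimated from the aligned w-mers. Keeping the parents in each generation means the best fitness in the population cannot decrease, which is one simple way to combine the global exploration of a GA with a local searcher that is sensitive to its starting point.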