Comparison of optimization techniques for sequence pattern discovery by maximum-likelihood

Authors:
Chengpeng Bi
Affiliations:
Bioinformatics and Intelligent Computing Lab, Division of Clinical Pharmacology, Children's Mercy Hospitals, Schools of Medicine, Computing and Engineering, University of Missouri, Kansas City, MO ...
Venue:
Pattern Recognition Letters
Year:
2010

Abstract

DNA sequences observed from a set of co-regulated genes contain short, functional, yet hidden subsequence patterns that recur across the genomic sequences. The task of sequence pattern discovery, also known as motif discovery, is to uncover these unseen subsequences ab initio and then build a motif model for them. A plethora of motif algorithms has been designed to tackle this problem. This paper compares a set of optimization techniques by consolidating them under the same maximum-likelihood (ML) framework. The framework unifies a suite of motif-finding algorithms by having them maximize the same function, which enables a systematic comparison of different optimization schemes and provides practical guidance on using these techniques. As a foundation, the ML framework is built for two categories of iterative optimization techniques (i.e. deterministic and stochastic) capable of exploring the sequence alignment space. The deterministic algorithms maximize the likelihood function by iteratively performing greedy local search. The stochastic algorithms iteratively draw motif location samples using Monte Carlo simulation while keeping track of the solutions with locally maximal likelihood. A total of five ML-based sequence pattern-finding algorithms are developed, evaluated and compared using simulated and real biological sequences. Results show that the deterministic algorithms are more time-efficient than their stochastic counterparts, but do not perform as well as the stochastic algorithms.
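
The two categories of optimizer described in the abstract can be illustrated with a minimal Python sketch. It is not a reconstruction of any of the paper's five algorithms. It assumes exactly one motif site per sequence, a uniform background model, a fixed motif width `w`, and a log-likelihood ratio as the shared objective; all function names (`build_pwm`, `site_score`, `alignment_score`, `greedy_search`, `gibbs_search`) and the pseudocount are hypothetical choices made for illustration.

```python
import math
import random

ALPHABET = "ACGT"
BACKGROUND = 0.25  # assumption: uniform background base frequency


def build_pwm(seqs, positions, w, skip=None, pseudo=1.0):
    """Column-wise base frequencies of the aligned w-mers, with pseudocounts."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(w)]
    for i, (seq, pos) in enumerate(zip(seqs, positions)):
        if i == skip:  # optionally leave one sequence out
            continue
        for j in range(w):
            counts[j][seq[pos + j]] += 1
    pwm = []
    for col in counts:
        total = sum(col.values())
        pwm.append({b: c / total for b, c in col.items()})
    return pwm


def site_score(pwm, seq, pos):
    """Log-likelihood ratio of one w-mer: motif model versus background."""
    return sum(math.log(col[seq[pos + j]] / BACKGROUND)
               for j, col in enumerate(pwm))


def alignment_score(seqs, positions, w):
    """Shared objective: summed log-likelihood ratio of all chosen sites."""
    pwm = build_pwm(seqs, positions, w)
    return sum(site_score(pwm, s, p) for s, p in zip(seqs, positions))


def greedy_search(seqs, w, positions, max_sweeps=50):
    """Deterministic: move each site to its best-scoring position until stable."""
    positions = list(positions)
    for _ in range(max_sweeps):
        changed = False
        for i, seq in enumerate(seqs):
            pwm = build_pwm(seqs, positions, w, skip=i)
            candidates = list(range(len(seq) - w + 1))
            best = max(candidates, key=lambda p: site_score(pwm, seq, p))
            if best != positions[i]:
                positions[i], changed = best, True
        if not changed:
            break
    return positions


def gibbs_search(seqs, w, positions, n_sweeps=200, seed=0):
    """Stochastic: sample each site from its conditional distribution and
    remember the best-scoring alignment visited so far."""
    rng = random.Random(seed)
    positions = list(positions)
    best, best_score = list(positions), alignment_score(seqs, positions, w)
    for _ in range(n_sweeps):
        for i, seq in enumerate(seqs):
            pwm = build_pwm(seqs, positions, w, skip=i)
            candidates = list(range(len(seq) - w + 1))
            weights = [math.exp(site_score(pwm, seq, p)) for p in candidates]
            positions[i] = rng.choices(candidates, weights=weights)[0]
        score = alignment_score(seqs, positions, w)
        if score > best_score:
            best, best_score = list(positions), score
    return best, best_score
```

Both functions take the same starting alignment (a list of one start position per sequence) and are scored by the same `alignment_score`, which mirrors the idea of comparing optimizers under a single objective. The greedy version stops at the first alignment it cannot improve by moving a single site, whereas the sampler keeps moving and only records the best alignment it has seen, which is one way to picture the time-versus-quality trade-off the abstract reports.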