Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization

Authors:
Timothy L. Bailey; Charles Elkan
Affiliations:
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093-0114. tbailey@cs.ucsd.edu; Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093-0114. elkan@cs.ucsd.edu
Venue:
Machine Learning - Special issue on applications in molecular biology
Year:
1995

Abstract

The MEME algorithm extends the expectation maximization (EM) algorithm for identifying motifs in unaligned biopolymer sequences. The aim of MEME is to discover new motifs in a set of biopolymer sequences where little or nothing is known in advance about any motifs that may be present. MEME's innovations expand the range of problems that can be solved using EM and increase the chance of finding good solutions. First, subsequences that actually occur in the biopolymer sequences are used as starting points for the EM algorithm, to increase the probability of finding globally optimal motifs. Second, the assumption that each sequence contains exactly one occurrence of the shared motif is removed. This allows a motif to occur multiple times in any sequence and permits the algorithm to ignore sequences in which the shared motif does not occur, increasing its resistance to noisy data. Third, a method for probabilistically erasing shared motifs after they are found is incorporated, so that several distinct motifs can be found in the same set of sequences, both when different motifs appear in different sequences and when a single sequence contains multiple motifs. Experiments show that MEME can discover both the CRP and LexA binding sites from a set of sequences that contain one or both sites, and that MEME can discover both the −10 and −35 promoter regions in a set of E. coli sequences.
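
The first two ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation, only a simplified two-component mixture fitted by EM under stated assumptions: a DNA alphabet, a fixed motif width, a background model estimated once from letter frequencies and then held fixed, and a single starting point taken from a subsequence that actually occurs in the data. The function names (`windows`, `pwm_from_seed`, `em_motif`, `consensus`) and the parameters `match`, `lam`, `iters`, and `pseudo` are hypothetical and chosen for illustration. Every width-w window is treated as either a motif occurrence or background, so a sequence may contribute several occurrences or none.

```python
import math

ALPHABET = "ACGT"  # assumption: DNA input

def windows(seqs, w):
    """All width-w subsequences that actually occur in the input."""
    return [s[i:i + w] for s in seqs for i in range(len(s) - w + 1)]

def pwm_from_seed(seed, match=0.7):
    """Initial motif model built from a real subsequence: the seed letter
    gets probability `match`, the other letters share the remainder."""
    rest = (1.0 - match) / (len(ALPHABET) - 1)
    return [{a: (match if a == c else rest) for a in ALPHABET} for c in seed]

def em_motif(seqs, seed, lam=0.1, iters=50, pseudo=0.01):
    """Fit a motif/background mixture over all windows with EM.

    lam is the prior probability that a window is a motif occurrence.
    Returns the motif model, the final lam, and (window, posterior) pairs.
    """
    w = len(seed)
    subs = windows(seqs, w)

    # Background: overall letter frequencies, kept fixed in this sketch.
    counts = {a: pseudo for a in ALPHABET}
    for s in seqs:
        for c in s:
            counts[c] += 1
    total = sum(counts.values())
    bg = {a: counts[a] / total for a in ALPHABET}

    pwm = pwm_from_seed(seed)
    z = []
    for _ in range(iters):
        # E-step: posterior probability that each window is a motif occurrence.
        z = []
        for x in subs:
            p_motif = lam * math.prod(pwm[j][c] for j, c in enumerate(x))
            p_back = (1.0 - lam) * math.prod(bg[c] for c in x)
            z.append(p_motif / (p_motif + p_back))

        # M-step: re-estimate the mixing weight and the motif columns
        # from posterior-weighted letter counts (plus a small pseudocount).
        lam = sum(z) / len(z)
        cols = [{a: pseudo for a in ALPHABET} for _ in range(w)]
        for x, zi in zip(subs, z):
            for j, c in enumerate(x):
                cols[j][c] += zi
        pwm = [{a: col[a] / sum(col.values()) for a in ALPHABET} for col in cols]

    return pwm, lam, list(zip(subs, z))

def consensus(pwm):
    """Most probable letter in each motif column."""
    return "".join(max(col, key=col.get) for col in pwm)

if __name__ == "__main__":
    # Toy data: four sequences carry a planted site, one carries none.
    seqs = [
        "ACCATGACGTTTAG",
        "GGTGACGTCACAAT",
        "CTTAGCTGACGTAC",
        "AATCCGGATTACAG",
        "TGACGTGGCATCCA",
    ]
    pwm, lam, scored = em_motif(seqs, seed="TGACGT")
    print(consensus(pwm), round(lam, 3))
```

A fuller treatment along the lines of the abstract would run EM from many occurring subsequences and keep the best-scoring model, and would down-weight (probabilistically erase) the positions explained by a found motif before searching for the next one. Both steps are omitted here to keep the sketch short.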