MoDEL: an efficient strategy for ungapped local multiple alignment

Authors:
David Hernandez;Robin Gras;Ron Appel
Affiliations:
Proteome Informatics Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH 1211 Geneva 4, Switzerland;Proteome Informatics Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH 1211 Geneva 4, Switzerland;Proteome Informatics Group, Swiss Institute of Bioinformatics, CMU, 1 rue Michel-Servet, CH 1211 Geneva 4, Switzerland
Venue:
Computational Biology and Chemistry
Year:
2004



Abstract

We introduce a method for ungapped local multiple alignment (ULMA) of a given set of amino acid or nucleotide sequences. The method explores two search spaces with a linked optimization strategy. The first search space, M, consists of all possible words of a given length W over the residue alphabet; an evolutionary algorithm searches this space globally. The second search space, P, consists of all possible ULMAs in the sequence set, each ULMA being represented by a position vector that defines exactly one subsequence of length W per sequence; this space is sampled by hill-climbing processes. The two searches are coupled by projecting high-scoring results from the global evolutionary search of M onto P. The hill-climbing processes then refine these results by local search, using the relative entropy between the ULMA and background residue frequencies as the objective function. We demonstrate some advantages of this strategy by analyzing difficult natural amino acid sequences and artificial datasets. A web interface is available at http://idefix.univ-rennes1.fr:8080/PatternDiscovery/
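
The coupling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`relative_entropy`, `project_word`, `hill_climb`), the pseudocount, the mismatch-based projection of a word onto a position vector, and the single-position move set are all assumptions made for illustration, and the evolutionary search of M is omitted (a candidate word is simply supplied by hand).

```python
import math
from collections import Counter


def relative_entropy(seqs, positions, width, background, pseudocount=1.0):
    """Score a ULMA as the summed per-column relative entropy (in bits)
    between column residue frequencies and background frequencies.
    The pseudocount is an assumption; it keeps every frequency nonzero."""
    alphabet = sorted(background)
    denom = len(seqs) + pseudocount * len(alphabet)
    total = 0.0
    for col in range(width):
        counts = Counter(s[p + col] for s, p in zip(seqs, positions))
        for a in alphabet:
            f = (counts[a] + pseudocount) / denom
            total += f * math.log2(f / background[a])
    return total


def project_word(seqs, word):
    """Project a word from M onto P: in each sequence, pick the window
    with the fewest mismatches to the word (hypothetical projection rule)."""
    w = len(word)

    def mismatches(s, p):
        return sum(a != b for a, b in zip(s[p:p + w], word))

    return [min(range(len(s) - w + 1), key=lambda p: mismatches(s, p))
            for s in seqs]


def hill_climb(seqs, positions, width, background):
    """Local search in P: move one sequence's position at a time and keep
    any move that strictly increases the relative entropy."""
    positions = list(positions)
    best = relative_entropy(seqs, positions, width, background)
    improved = True
    while improved:
        improved = False
        for i, s in enumerate(seqs):
            for p in range(len(s) - width + 1):
                trial = positions[:i] + [p] + positions[i + 1:]
                score = relative_entropy(seqs, trial, width, background)
                if score > best + 1e-12:
                    positions, best, improved = trial, score, True
    return positions, best


if __name__ == "__main__":
    # Toy nucleotide data with a planted word; uniform background assumed.
    seqs = ["TTACGTTT", "ACGTTTTT", "TTTTACGT"]
    background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    start = project_word(seqs, "ACGT")
    print(hill_climb(seqs, start, 4, background))
```

In this sketch the word plays the role of a high-scoring individual from the global search, and the hill climber only refines the position vector that the word projects to. A fuller version would evolve a population of words and launch one local search per promising word.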