Comparison of simple encoding schemes in GA's for the motif finding problem: preliminary results

Authors:
Giovanna Martínez-Arellano;Carlos A. Brizuela
Affiliations:
Computer Sciences Department, CICESE Research Center, Ensenada, B.C., México;Computer Sciences Department, CICESE Research Center, Ensenada, B.C., México
Venue:
BSB'07 Proceedings of the 2nd Brazilian conference on Advances in bioinformatics and computational biology
Year:
2007

References:

Unsupervised Learning of Multiple Motifs in Biopolymers Using Expectation Maximization
Machine Learning - Special issue on applications in molecular biology

Algorithms on strings, trees, and sequences: computer science and computational biology

Genetic Algorithms in Search, Optimization and Machine Learning

Combinatorial Approaches to Finding Subtle Signals in DNA Sequences
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology

A Statistical Method for Finding Transcription Factor Binding Sites
Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology

Spelling Approximate Repeated or Common Motifs Using a Suffix Tree
LATIN '98 Proceedings of the Third Latin American Symposium on Theoretical Informatics

FMGA: Finding Motifs by Genetic Algorithm
BIBE '04 Proceedings of the 4th IEEE Symposium on Bioinformatics and Bioengineering

MDGA: motif discovery using a genetic algorithm
GECCO '05 Proceedings of the 7th annual conference on Genetic and evolutionary computation

Abstract

The DNA motif finding problem is of great relevance in molecular biology. Weak signals that mark transcription factor binding sites involved in gene regulation are considered challenging to find. These signals (motifs) consist of a short string of unknown length that can be located anywhere in the gene promoter region. The problem therefore consists of discovering short, conserved sites in genomic DNA without knowing, a priori, either the length or the chemical composition of the site. This turns the original problem into a combinatorial one, where computational tools can be applied to find the solution. Pevzner and Sze [7] studied a precise combinatorial formulation of this problem, called the planted motif problem, which is of particular interest because it is a challenging model for commonly used motif-finding algorithms [15]. In this work, we analyze two different encoding schemes for genetic algorithms to solve the planted motif finding problem. One representation encodes the starting position of the motif occurrence in each sequence, and the other encodes a candidate motif. We test the performance of both algorithms on a set of planted motif instances. Preliminary experimental results suggest that the algorithm encoding the candidate motif outperforms the more standard position-based scheme.
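
To make the difference between the two encodings concrete, the following is a minimal Python sketch of how a chromosome might be evaluated under each scheme. The abstract does not state the fitness function used by the authors, so the total Hamming distance used here is an assumption, and the names `hamming`, `score_positions`, and `score_motif` are hypothetical. The motif length `l` is treated as known, as in the planted motif setting.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def score_positions(seqs, positions, l):
    """Position-based encoding (illustrative).

    The chromosome is one start index per sequence. The windows it selects
    define a column-wise consensus, and the score is the total Hamming
    distance from that consensus to the windows (lower is better).
    """
    windows = [s[p:p + l] for s, p in zip(seqs, positions)]
    consensus = "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*windows)
    )
    return sum(hamming(consensus, w) for w in windows)


def score_motif(seqs, motif):
    """Motif-based encoding (illustrative).

    The chromosome is the candidate motif itself. For each sequence the
    best-matching window is located, and the score is the sum of those
    minimum Hamming distances (lower is better).
    """
    l = len(motif)
    return sum(
        min(hamming(motif, s[i:i + l]) for i in range(len(s) - l + 1))
        for s in seqs
    )
```

For example, with the toy sequences `TTACGTTT`, `GACGAGGG`, and `CCCCTCGT` and `l = 4`, both `score_positions(seqs, [2, 1, 4], 4)` and `score_motif(seqs, "ACGT")` return 2. The practical difference in this sketch is where the search happens: the position-based chromosome must get every start index right, while the motif-based chromosome leaves the choice of positions to the evaluation step.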