Comparison of simple encoding schemes in GA's for the motif finding problem: preliminary results

  • Authors:
  • Giovanna Martínez-Arellano; Carlos A. Brizuela

  • Affiliations:
  • Computer Sciences Department, CICESE Research Center, Ensenada, B.C., México; Computer Sciences Department, CICESE Research Center, Ensenada, B.C., México

  • Venue:
  • BSB'07 Proceedings of the 2nd Brazilian conference on Advances in bioinformatics and computational biology
  • Year:
  • 2007

Abstract

The DNA motif finding problem is of great relevance in molecular biology. Weak signals that mark transcription factor binding sites involved in gene regulation are considered challenging to find. These signals (motifs) consist of a short string of unknown length that can be located anywhere in the gene promoter region. The problem therefore consists of discovering short, conserved sites in genomic DNA without knowing a priori either the length or the chemical composition of the site. This turns the original problem into a combinatorial one, to which computational tools can be applied. Pevzner and Sze [7] studied a precise combinatorial formulation of this problem, called the planted motif problem, which is of particular interest because it is a challenging model for commonly used motif-finding algorithms [15]. In this work, we analyze two different encoding schemes for genetic algorithms to solve the planted motif finding problem. One representation encodes the starting position of the motif occurrence in each sequence, and the other encodes a candidate motif. We test the performance of both algorithms on a set of planted motif instances. Preliminary experimental results are promising and show superior performance of the algorithm encoding the candidate motif over the more standard position-based scheme.
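To make the two encodings concrete, here is a minimal Python sketch under stated assumptions. It generates a planted (l, d) instance in the sense usually attributed to Pevzner and Sze: each of t random sequences of length n receives one copy of a random length-l motif with exactly d positions mutated. It then scores a chromosome under each encoding. The function names and both fitness functions (a consensus score for the position encoding, a summed best-match Hamming distance for the motif encoding) are hypothetical illustrative choices, not necessarily those used by the authors, and the GA operators themselves are omitted.

```python
import random

ALPHABET = "ACGT"

def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def mutate_exactly(motif, d, rng):
    """Copy of motif with exactly d distinct positions changed to a different base."""
    chars = list(motif)
    for i in rng.sample(range(len(chars)), d):
        chars[i] = rng.choice([b for b in ALPHABET if b != chars[i]])
    return "".join(chars)

def planted_instance(t, n, l, d, seed=0):
    """Hypothetical generator: t random sequences of length n, each with one
    planted variant of a random length-l motif at Hamming distance exactly d."""
    rng = random.Random(seed)
    motif = "".join(rng.choice(ALPHABET) for _ in range(l))
    seqs, starts = [], []
    for _ in range(t):
        s = [rng.choice(ALPHABET) for _ in range(n)]
        pos = rng.randrange(n - l + 1)
        s[pos:pos + l] = mutate_exactly(motif, d, rng)  # overwrite l bases
        seqs.append("".join(s))
        starts.append(pos)
    return motif, seqs, starts

def position_fitness(starts, seqs, l):
    """Position encoding: the chromosome holds one start index per sequence.
    Assumed fitness: consensus score, i.e. the sum over alignment columns of
    the count of the most frequent base. Higher is better."""
    sites = [s[p:p + l] for s, p in zip(seqs, starts)]
    return sum(max(col.count(b) for b in ALPHABET) for col in zip(*sites))

def motif_fitness(candidate, seqs):
    """Motif encoding: the chromosome is a candidate l-mer.
    Assumed fitness: sum over sequences of the smallest Hamming distance
    between the candidate and any l-mer in that sequence. Lower is better."""
    l = len(candidate)
    return sum(
        min(hamming(candidate, s[i:i + l]) for i in range(len(s) - l + 1))
        for s in seqs
    )
```

A position chromosome has t genes, each taking one of n − l + 1 values, while a motif chromosome has l genes over a four-letter alphabet, so the two search spaces have sizes (n − l + 1)^t and 4^l respectively. For example, `planted_instance(t=10, n=100, l=15, d=4)` yields sequences in which the true starts give a consensus score of at least 150 − 40 = 110 and the true motif has a summed best-match distance of at most 10 × 4 = 40.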