GAPK: genetic algorithms with prior knowledge for motif discovery in DNA sequences

  • Authors:
  • Dianhui Wang; Xi Li

  • Affiliations:
  • Department of Computer Science and Computer Engineering, La Trobe University, Melbourne, Victoria, Australia;Department of Computer Science and Computer Engineering, La Trobe Univ., Melbourne, Victoria, Australia and Department of Primary Industries, Bioscience Research Division, Victorian AgriBioscience ...

  • Venue:
  • CEC'09 Proceedings of the Eleventh conference on Congress on Evolutionary Computation
  • Year:
  • 2009

Abstract

Discovering transcription factor binding sites (TFBSs), or DNA motifs, in the promoter regions of genes plays a key role in understanding the regulation of gene expression. Over the past decade, computational approaches to motif search, including evolutionary computation techniques, have demonstrated good potential, and some results reported in the literature are quite promising. Recent progress in evolutionary motif mining is documented in GAME and GALF-P: GAME employs a Bayesian-based scoring function, while GALF-P aims to improve algorithm performance through local filtering and adaptive post-processing. To improve discovery performance in terms of recall, precision and algorithm reliability, this paper presents an alternative genetic algorithm, termed GAPK, for the motif discovery problem. In the proposed GAPK framework, prior knowledge about motifs in a given dataset is used to initialize the population. Our technical contributions include a matrix representation for k-mers, a mismatch-based filtering method for search space reduction, a model mismatch score (MMS) as the fitness function, new genetic operations, and a model refinement process. Benchmark datasets associated with eight transcription factors are used in our experiments. Comparative studies were carried out against well-known tools including GAME, GALF-P, MEME, MDScan and AlignACE. The results show that our method outperforms the other techniques in terms of F-measure.
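
The abstract names a matrix representation for k-mers, mismatch-based filtering and an F-measure evaluation without defining them. The Python sketch below illustrates one plausible reading of each idea. It is not the GAPK implementation: the one-hot 4 x k encoding, the use of a single seed k-mer to stand in for prior knowledge, the Hamming-distance filter, and all function and parameter names are assumptions made for illustration. The model mismatch score (MMS) is not reproduced because the abstract gives no formula for it. The F-measure shown is the standard harmonic mean of precision and recall.

```python
from typing import List

ALPHABET = "ACGT"  # row order of the matrix; an assumption for this sketch


def kmer_to_matrix(kmer: str) -> List[List[int]]:
    """One-hot encode a k-mer as a 4 x k binary matrix (rows A, C, G, T)."""
    kmer = kmer.upper()
    return [[1 if base == row else 0 for base in kmer] for row in ALPHABET]


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length k-mers differ."""
    if len(a) != len(b):
        raise ValueError("k-mers must have the same length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def filter_kmers(sequence: str, seed: str, max_mismatches: int) -> List[str]:
    """Keep the k-mers of `sequence` within `max_mismatches` of `seed`.

    `seed` is a hypothetical stand-in for prior knowledge about the motif.
    """
    k = len(seed)
    windows = (sequence[i:i + k] for i in range(len(sequence) - k + 1))
    return [w for w in windows if mismatches(w, seed) <= max_mismatches]


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(kmer_to_matrix("ACGT"))
    print(filter_kmers("ACGTACGA", "ACGT", max_mismatches=1))
    print(f_measure(0.5, 0.5))
```

Filtering against a seed shrinks the candidate set before any evolutionary search begins, which is the general purpose of a search space reduction step. How GAPK actually selects and scores candidates is described in the paper itself.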