Memetic algorithms for de novo motif-finding in biomedical sequences

  • Authors:
  • Chengpeng Bi

  • Affiliations:
  • Bioinformatics and Intelligent Computing Lab, Division of Clinical Pharmacology, Children's Mercy Hospitals and Clinics, Kansas City, MO 64108, USA and School of Medicine, School of Computing and ...

  • Venue:
  • Artificial Intelligence in Medicine
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

Objectives: The objectives of this study are to design and implement a new memetic algorithm for de novo motif discovery, which is then applied to detect important signals hidden in various biomedical molecular sequences. Methods and materials: In this paper, memetic algorithms are developed and tested in de novo motif-finding problems. Several strategies in the algorithm design are employed that are to not only efficiently explore the multiple sequence local alignment space, but also effectively uncover the molecular signals. As a result, there are a number of key features in the implementation of the memetic motif-finding algorithm (MaMotif), including a chromosome replacement operator, a chromosome alteration-aware local search operator, a truncated local search strategy, and a stochastic operation of local search imposed on individual learning. To test the new algorithm, we compare MaMotif with a few of other similar algorithms using simulated and experimental data including genomic DNA, primary microRNA sequences (let-7 family), and transmembrane protein sequences. Results: The new memetic motif-finding algorithm is successfully implemented in C++, and exhaustively tested with various simulated and real biological sequences. In the simulation, it shows that MaMotif is the most time-efficient algorithm compared with others, that is, it runs 2 times faster than the expectation maximization (EM) method and 16 times faster than the genetic algorithm-based EM hybrid. In both simulated and experimental testing, results show that the new algorithm is compared favorably or superior to other algorithms. Notably, MaMotif is able to successfully discover the transcription factors' binding sites in the chromatin immunoprecipitation followed by massively parallel sequencing (ChIP-Seq) data, correctly uncover the RNA splicing signals in gene expression, and precisely find the highly conserved helix motif in the transmembrane protein sequences, as well as rightly detect the palindromic segments in the primary microRNA sequences. Conclusions: The memetic motif-finding algorithm is effectively designed and implemented, and its applications demonstrate it is not only time-efficient, but also exhibits excellent performance while compared with other popular algorithms.