Deterministic local alignment methods improved by a simple genetic algorithm

  • Authors: Chengpeng Bi
  • Affiliations: Bioinformatics and Intelligent Computing Lab, Division of Clinical Pharmacology, Children's Mercy Hospitals, Schools of Medicine, Computing and Engineering, University of Missouri, Kansas City, MO ...
  • Venue: Neurocomputing
  • Year: 2010

Abstract

Multiple sequence local alignment, often used for de novo discovery of biological motifs hidden in a set of DNA or protein sequences, remains a challenge in bioinformatics and computational biology. Many algorithms and software packages have been developed to address the problem. Expectation maximization (EM), one of the most popular local alignment methods, is often used to solve the motif-finding problem. However, EM depends heavily on its initialization and is easily trapped in local optima. This paper presents the Genetic-enabled EM Motif-Finding Algorithm (GEMFA), which aims to mitigate the difficulties confronting EM-based motif discovery algorithms. The new algorithm integrates a simple genetic algorithm (GA) with a local searcher to explore the local alignment space; that is, it combines deterministic local alignment methods with a simple GA to perform de novo motif discovery effectively. It first initializes a population of multiple local alignments, each of which is encoded on a chromosome that represents a potential solution. GEMFA then performs a heuristic search over the whole alignment space using minimum distance length (MDL) as the fitness function, which is generalized from maximum log-likelihood. The genetic algorithm gradually moves this population towards the best alignment, from which the motif model is derived. Analyses of simulated and real biological sequences showed that GEMFA significantly improved on deterministic local alignment methods, especially in aligning subtle motifs, and that it also outperformed the other algorithms tested.
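
The general idea of pairing a simple GA with a deterministic local searcher can be illustrated with a minimal Python sketch. This is not the GEMFA implementation: the abstract does not give the MDL formula, the genetic operators, or the EM details, so everything here is an assumption made for illustration. In particular, the sketch assumes one motif site per sequence, encodes a chromosome as a list of site start positions, uses a plain log-likelihood ratio against a uniform background as a stand-in fitness, and replaces EM with a simple hard-assignment refinement. All function names (`profile`, `fitness`, `local_search`, `evolve`) and parameters are hypothetical.

```python
import math
import random

ALPHABET = "ACGT"

def profile(seqs, starts, width, pseudo=0.5):
    """Column-wise base frequencies of the aligned sites (with pseudocounts)."""
    counts = [{b: pseudo for b in ALPHABET} for _ in range(width)]
    for seq, s in zip(seqs, starts):
        for j in range(width):
            counts[j][seq[s + j]] += 1
    total = len(seqs) + pseudo * len(ALPHABET)
    return [{b: col[b] / total for b in ALPHABET} for col in counts]

def fitness(seqs, starts, width):
    """Stand-in fitness (assumption, not the paper's MDL): log-likelihood
    ratio of the aligned sites against a uniform background."""
    prof = profile(seqs, starts, width)
    return sum(math.log(prof[j][seq[s + j]] / 0.25)
               for seq, s in zip(seqs, starts) for j in range(width))

def local_search(seqs, starts, width, rounds=5):
    """Deterministic refinement (a hard-assignment stand-in for EM):
    rebuild the profile, then move each site to its best-scoring position."""
    starts = list(starts)
    for _ in range(rounds):
        prof = profile(seqs, starts, width)
        new = [max(range(len(seq) - width + 1),
                   key=lambda p: sum(math.log(prof[j][seq[p + j]])
                                     for j in range(width)))
               for seq in seqs]
        if new == starts:  # converged
            break
        starts = new
    return starts

def evolve(seqs, width, pop_size=20, generations=30, mut_rate=0.1, seed=0):
    """Simple GA over alignments; each chromosome is a list of start positions."""
    rng = random.Random(seed)
    limits = [len(s) - width for s in seqs]  # largest valid start per sequence

    def score(chrom):
        return fitness(seqs, chrom, width)

    pop = [[rng.randint(0, m) for m in limits] for _ in range(pop_size)]
    best = max(pop, key=score)
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            # Binary tournament selection of two parents.
            a = max(rng.sample(pop, 2), key=score)
            b = max(rng.sample(pop, 2), key=score)
            # Uniform crossover followed by random-reset mutation.
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [rng.randint(0, m) if rng.random() < mut_rate else g
                     for g, m in zip(child, limits)]
            children.append(local_search(seqs, child, width))
        # Elitism: the best alignment seen so far always survives.
        pop = sorted(children + [best], key=score, reverse=True)[:pop_size]
        best = pop[0]
    return best, score(best)
```

Calling `evolve(seqs, width)` on a list of equal-alphabet DNA strings returns the best start positions found and their score; a motif model could then be read off with `profile(seqs, best, width)`. Refining every child with the local searcher is one possible way to combine the two components, chosen here for brevity; the paper may interleave them differently.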