Gapped extension for local multiple alignment of interspersed DNA repeats

  • Authors:
  • Todd J. Treangen; Aaron E. Darling; Mark A. Ragan; Xavier Messeguer

  • Affiliations:
  • Dept. of Computer Science, Polytechnic University of Catalonia, Barcelona, Spain; ARC Centre of Excellence in Bioinformatics and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia; ARC Centre of Excellence in Bioinformatics and Institute for Molecular Bioscience, The University of Queensland, Brisbane, Australia; Dept. of Computer Science, Polytechnic University of Catalonia, Barcelona, Spain

  • Venue:
  • ISBRA'08 Proceedings of the 4th international conference on Bioinformatics research and applications
  • Year:
  • 2008

Abstract

The identification of homologous DNA is a fundamental building block of comparative genomic and molecular evolution studies. To date, pairwise local sequence alignment methods have been the prevailing technique to identify homologous nucleotides. However, existing methods that identify and align all homologous nucleotides in one or more genomes have suffered from poor scalability and limited accuracy. We propose a novel method that couples a gapped extension heuristic with a previously described efficient filtration method for local multiple alignment. During gapped extension, we use the MUSCLE implementation of progressive multiple alignment with iterative refinement. The resulting gapped extensions potentially contain alignments of unrelated sequence. We detect and remove such undesirable alignments using a hidden Markov model to predict the posterior probability of homology. The HMM emission frequencies for nucleotide substitutions can be derived from any strand/species-symmetric nucleotide substitution matrix, and we have developed a method to adapt an arbitrary substitution matrix (e.g., HOXD) to organisms with different G+C content. We evaluate the performance of our method and previous approaches on a hybrid dataset of real genomic DNA with simulated interspersed repeats. Our method outperforms existing methods in terms of sensitivity, positive predictive value, and localizing boundaries of homology. The described methods have been implemented in the free, open-source procrastAligner software, available from: http://alggen.lsi.upc.es/recerca/align/procrastination.
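
Example (illustrative sketch)

The idea of using a hidden Markov model to predict the posterior probability of homology can be illustrated with a minimal two-state model. The sketch below is not the procrastAligner implementation: it assumes a pairwise alignment, reduces each column to one of three symbols (match, mismatch, gap) rather than using emission frequencies derived from a substitution matrix such as HOXD, and every probability in START, TRANS, and EMIT is a hypothetical value chosen only for illustration. The function names column_symbols and posterior_homology are likewise invented for this example.

```python
# Two hidden states: H = homologous, U = unrelated.
STATES = ("H", "U")

# Hypothetical parameters for illustration only (not taken from the paper).
START = {"H": 0.5, "U": 0.5}
TRANS = {
    "H": {"H": 0.95, "U": 0.05},
    "U": {"H": 0.05, "U": 0.95},
}
EMIT = {
    "H": {"match": 0.80, "mismatch": 0.15, "gap": 0.05},
    "U": {"match": 0.25, "mismatch": 0.60, "gap": 0.15},
}


def column_symbols(seq_a, seq_b):
    """Classify each column of a pairwise alignment as match, mismatch, or gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    symbols = []
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            symbols.append("gap")
        elif a == b:
            symbols.append("match")
        else:
            symbols.append("mismatch")
    return symbols


def posterior_homology(symbols):
    """Return P(state = H | all columns) for each column (scaled forward-backward)."""
    n = len(symbols)
    if n == 0:
        return []

    # Forward pass, rescaling each column so the values sum to 1.
    fwd, scales, prev = [], [], None
    for i, sym in enumerate(symbols):
        cur = {}
        for s in STATES:
            prior = START[s] if i == 0 else sum(prev[r] * TRANS[r][s] for r in STATES)
            cur[s] = prior * EMIT[s][sym]
        scale = sum(cur.values())
        cur = {s: v / scale for s, v in cur.items()}
        fwd.append(cur)
        scales.append(scale)
        prev = cur

    # Backward pass, reusing the forward scaling factors.
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in STATES}
    for i in range(n - 2, -1, -1):
        sym = symbols[i + 1]
        bwd[i] = {
            s: sum(TRANS[s][r] * EMIT[r][sym] * bwd[i + 1][r] for r in STATES) / scales[i + 1]
            for s in STATES
        }

    # Combine both passes and normalize per column.
    posterior = []
    for i in range(n):
        h = fwd[i]["H"] * bwd[i]["H"]
        u = fwd[i]["U"] * bwd[i]["U"]
        posterior.append(h / (h + u))
    return posterior


# Toy alignment: 12 identical columns followed by 12 mismatching columns.
row_a = "ACGTACGTACGT" + "AAAAAAAAAAAA"
row_b = "ACGTACGTACGT" + "CGTCGTCGTCGT"
posterior = posterior_homology(column_symbols(row_a, row_b))
print(" ".join(f"{p:.2f}" for p in posterior))
```

In this toy case the posterior is high across the identical columns and drops across the mismatching ones. Columns whose posterior falls below a chosen threshold could then be trimmed from an extension, which is one simple way to localize the boundary of homology under these assumptions.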