Towards automatic detecting of overlapping genes - clustered BLAST analysis of viral genomes

  • Authors:
  • Klaus Neuhaus;Daniela Oelke;David Fürst;Siegfried Scherer;Daniel A. Keim

  • Affiliations:
  • Chair of Microbial Ecology, Technische Universität München, Freising, Germany;Chair of Data Analysis and Visualization, Universität Konstanz, Konstanz, Germany;Chair of Data Management and Data Exploration, Rheinisch-Westfälische Technische Hochschule Aachen, Aachen, Germany;Chair of Microbial Ecology, Technische Universität München, Freising, Germany;Chair of Data Analysis and Visualization, Universität Konstanz, Konstanz, Germany

  • Venue:
  • EvoBIO'10 Proceedings of the 8th European conference on Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics
  • Year:
  • 2010

Abstract

Overlapping genes (encoded on the same DNA locus but in different reading frames) are thought to be rare and were therefore largely neglected in the past. In a test set of 800 viruses, we found more than 350 potential overlapping open reading frames of 500 bp that generate BLAST hits, indicating a possible biological function. Interestingly, five overlaps of more than 2000 bp were found, the largest of which may even contain triple overlaps. To handle the vast number of BLAST searches required to test all detected open reading frames, we compared two clustering strategies (BLASTCLUST and k-means) and queried the database with only one representative per cluster. Our results show that, for both clustering methods, this approach achieves a significant speed-up while retaining high-quality results (99% precision compared to single queries). Future wet lab experiments are needed to show whether the detected overlapping reading frames are biologically functional.
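
To make the notion of overlapping open reading frames concrete, the following is a minimal sketch, not the authors' pipeline. It assumes a simplified ORF definition (an ATG start codon followed in-frame by the first stop codon), scans only the three forward-strand frames, and reports pairs of ORFs in different frames that share at least a chosen number of base pairs. The function names `find_orfs` and `overlapping_pairs` and the parameters `min_len` and `min_overlap` are hypothetical and chosen for illustration only.

```python
from itertools import combinations

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_len=30):
    """Return (start, end, frame) tuples for ATG..stop ORFs.

    Assumption: only the three forward-strand frames are scanned, and
    an ORF runs from the first ATG to the next in-frame stop codon.
    Coordinates are 0-based, with `end` exclusive (stop codon included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    orfs.append((start, end, frame))
                start = None
    return orfs


def overlapping_pairs(orfs, min_overlap=1):
    """Return (orf_a, orf_b, overlap_bp) for ORFs in different frames."""
    pairs = []
    for a, b in combinations(orfs, 2):
        if a[2] == b[2]:
            continue  # same frame: not an overlapping-gene candidate
        overlap = min(a[1], b[1]) - max(a[0], b[0])
        if overlap >= min_overlap:
            pairs.append((a, b, overlap))
    return pairs


if __name__ == "__main__":
    # Toy sequence with one ORF in frame 0 and one in frame 1.
    toy = "ATGAATGCCCCCTAAGTGA"
    orfs = find_orfs(toy, min_len=9)
    for a, b, bp in overlapping_pairs(orfs):
        print(f"frame {a[2]} {a[:2]} overlaps frame {b[2]} {b[:2]} by {bp} bp")
```

In a setting like the one described in the abstract, `min_overlap` would be set to a much larger value (for example several hundred bp), the reverse strand would also need to be scanned, and each candidate would then be passed to a BLAST search; those steps are outside the scope of this sketch.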