Using Fourier phase analysis on genomic sequences to identify retroviruses

  • Authors: Wendy Ashlock; Suprakash Datta
  • Affiliations: York University, Toronto, Ontario, Canada (both authors)
  • Venue: Proceedings of the First ACM International Conference on Bioinformatics and Computational Biology
  • Year: 2010

Abstract

Retroviruses are of great importance because of their associations with disease and their significance for understanding the evolution of species. In this paper we study the problem of classifying unknown DNA sequence fragments as retroviruses, genes, or non-coding DNA. We use a novel set of features derived from the Fourier transform at frequency 1/3; these features capture the amount of randomness in sequences from the three classes and how each class uses the three possible reading frames. The features can be computed efficiently and are used to train a random forest. We show that the three classes can be distinguished with high ( 90%) accuracy.
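As a rough illustration of what "the Fourier transform at frequency 1/3" means for a DNA sequence, here is a minimal Python sketch. It assumes the common indicator-sequence encoding (one 0/1 sequence u_b[n] per nucleotide b) and computes U_b(1/3) = Σ_n u_b[n] e^(-2πin/3), then reports its magnitude and phase for each base. The function names `period3_components` and `phase_features` and the feature layout are hypothetical; this is not the authors' feature set, which the abstract describes only as being based on randomness and reading-frame use.

```python
import cmath
import math

BASES = "ACGT"


def period3_components(seq: str) -> dict:
    """DFT coefficient at frequency 1/3 for each base's indicator sequence.

    For base b, u_b[n] = 1 if seq[n] == b else 0, and
    U_b = sum_n u_b[n] * exp(-2*pi*i*n/3).
    """
    seq = seq.upper()
    w = cmath.exp(-2j * math.pi / 3)  # primitive cube root of unity
    roots = [1, w, w * w]             # exp(-2*pi*i*n/3) depends only on n mod 3
    coeffs = {b: 0j for b in BASES}
    for n, base in enumerate(seq):
        if base in coeffs:            # ambiguous symbols such as N are skipped
            coeffs[base] += roots[n % 3]
    return coeffs


def phase_features(seq: str) -> list:
    """Hypothetical feature vector: per-base normalised magnitude and phase.

    Layout: [|U_A|/N, arg U_A, |U_C|/N, arg U_C, |U_G|/N, arg U_G, |U_T|/N, arg U_T].
    """
    coeffs = period3_components(seq)
    length = max(len(seq), 1)
    feats = []
    for b in BASES:
        c = coeffs[b]
        feats.append(abs(c) / length)  # strength of the period-3 signal for base b
        feats.append(cmath.phase(c))   # phase in (-pi, pi]: which codon position b favours
    return feats


if __name__ == "__main__":
    print(phase_features("ATG" * 10))
```

Because e^(-2πin/3) depends only on n mod 3, shifting the reading frame by one position rotates each base's phase by 2π/3, whereas a sequence with no period-3 structure (for example "ACGT" repeated) yields coefficients near zero. In the paper, features of this general kind are passed to a random forest classifier; that step is omitted here.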