Signature recognition methods for identifying influenza sequences

  • Authors:
  • Jitimon Keinduangjun;Punpiti Piamsa-nga;Yong Poovorawan

  • Affiliations:
  • Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand;Department of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand;Department of Pediatrics, Faculty of Medicine, Chulalongkorn University, Bangkok, Thailand

  • Venue:
  • AIME'05 Proceedings of the 10th conference on Artificial Intelligence in Medicine
  • Year:
  • 2005

Abstract

Accuracy is one of the most important issues in identifying biological sequences; however, because of the exponential growth and great diversity of biological data, the need to compute within a reasonable time often compromises accuracy. We propose novel approaches for identifying DNA sequences accurately and in less time by discovering sequence patterns, called signatures, that carry enough distinctive information for sequence identification. The approaches find the best combination of n-gram patterns and six statistical scoring algorithms that are regularly used in Information Retrieval research, and then use the resulting signatures to build a similarity scoring model for identifying the DNA. We develop two approaches to signature discovery. The first uses only statistical information extracted directly from the sequences. The second also uses prior knowledge of the DNA in the signature discovery process. From our experiments on influenza virus, we found that: 1) our technique identifies the influenza virus with an accuracy of up to 99.69% when 11-grams are used and prior knowledge is applied; 2) signatures that are too short or too long produce lower efficiency; and 3) most scoring algorithms perform well for identification, except the “Rocchio algorithm”, whose results are approximately 9% lower than the others. Moreover, this technique can be applied to identifying other organisms.
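
The general idea of n-gram signatures can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name its six scoring algorithms apart from Rocchio, so the score used here (the difference in document frequency between target and non-target sequences) is an assumed stand-in, and the function names `ngrams`, `discover_signatures`, and `similarity` are hypothetical.

```python
from collections import Counter


def ngrams(seq, n):
    """Return all overlapping n-grams of a sequence string."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def discover_signatures(targets, others, n, top_k):
    """Pick the top_k n-grams that best separate targets from others.

    Assumption: the score is a simple document-frequency difference,
    used only for illustration; it is not one of the paper's algorithms.
    Both input lists must be non-empty.
    """
    # Count how many sequences in each group contain each n-gram.
    df_target = Counter(g for s in targets for g in set(ngrams(s, n)))
    df_other = Counter(g for s in others for g in set(ngrams(s, n)))
    score = {
        g: df_target[g] / len(targets) - df_other[g] / len(others)
        for g in df_target
    }
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(score, key=lambda g: (-score[g], g))
    return ranked[:top_k]


def similarity(query, signatures, n):
    """Fraction of signatures that occur in the query sequence."""
    if not signatures:
        return 0.0
    present = set(ngrams(query, n))
    return sum(1 for g in signatures if g in present) / len(signatures)


# Toy data, invented for this example (not real influenza sequences).
targets = ["ACGTACGT", "ACGTTTGA"]
others = ["GGGGCCCC", "CCCCGGGG"]
signatures = discover_signatures(targets, others, n=3, top_k=2)
```

A query would then be assigned to the target group when `similarity(query, signatures, n)` exceeds some threshold; the choice of threshold, like the choice of `n`, would have to be tuned on real data.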