A Time Series Approach for Identification of Exons and Introns

  • Authors:
  • Ravi Gupta;Ankush Mittal;Kuldip Singh;Prateek Bajpai;Suraj Prakash

  • Venue:
  • ICIT '07 Proceedings of the 10th International Conference on Information Technology
  • Year:
  • 2007

Abstract

The classification of an organism's gene sequence into coding and non-coding regions is a challenging task in DNA sequence analysis. Classification algorithms operate on the basic assumption that every protein-coding region has distinct sequence features or properties that distinguish it from the surrounding regions, such as non-coding and intergenic regions. In this study, we present a novel and generic approach for the analysis of DNA sequences. A wavelet-based time series approach is proposed for extracting statistical information from DNA sequences. The extracted information contains the variance of the amino/keto, purine/pyrimidine, and weak/strong hydrogen bond distributions in a DNA sequence. This variance information is used to construct a feature vector, and a pattern recognition framework is applied to classify exons and introns. An optimized support vector machine (SVM) classifier based on these novel features is constructed for accurate classification of DNA sequences. Experiments were performed on an exon and intron dataset of Homo sapiens, and a 10-fold cross-validation accuracy of 87.5% was achieved. Further tests were conducted on an unseen dataset of exons and introns of Homo sapiens, and an accuracy of 88.95% was reported.
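The feature extraction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the wavelet nor the number of decomposition levels, so the Haar wavelet, the +1/-1 encoding, the default of three levels, and all function names below are assumptions chosen for illustration. Only the three nucleotide groupings (amino A/C vs. keto G/T, purine A/G vs. pyrimidine C/T, weak A/T vs. strong C/G) follow standard usage.

```python
import math
from statistics import pvariance

# Standard binary nucleotide groupings, encoded as +1 / -1 (the encoding is an assumption).
GROUPINGS = {
    "amino_keto": {"A": 1, "C": 1, "G": -1, "T": -1},
    "purine_pyrimidine": {"A": 1, "G": 1, "C": -1, "T": -1},
    "weak_strong": {"A": 1, "T": 1, "C": -1, "G": -1},
}


def haar_step(signal):
    """One level of the orthonormal Haar transform: returns (approximation, detail)."""
    scale = 1 / math.sqrt(2)
    starts = range(0, len(signal) - 1, 2)  # a trailing unpaired sample is dropped
    approx = [(signal[i] + signal[i + 1]) * scale for i in starts]
    detail = [(signal[i] - signal[i + 1]) * scale for i in starts]
    return approx, detail


def wavelet_variances(signal, levels):
    """Population variance of the Haar detail coefficients at each level."""
    variances = []
    current = list(signal)
    for _ in range(levels):
        if len(current) < 2:
            variances.append(0.0)  # signal too short to decompose further
            continue
        current, detail = haar_step(current)
        variances.append(pvariance(detail))
    return variances


def feature_vector(sequence, levels=3):
    """Concatenate per-level variances for the three binary encodings of a DNA string."""
    features = []
    for mapping in GROUPINGS.values():
        # Symbols outside A, C, G, T (e.g. N) are skipped.
        signal = [mapping[base] for base in sequence.upper() if base in mapping]
        features.extend(wavelet_variances(signal, levels))
    return features
```

A call such as `feature_vector("ACGTTGCAACGT", levels=2)` yields six values, two per grouping. Vectors of this kind could then be passed to an SVM implementation, for example scikit-learn's `sklearn.svm.SVC`, to separate exons from introns; the kernel and parameter tuning used in the paper are not stated in the abstract.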