An integrated statistical comparative analysis between variant genetic datasets of Mus musculus
International Journal of Computational Intelligence in Bioinformatics and Systems Biology
Wavelet Analysis in Current Cancer Genome Research: A Survey
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
The classification of an organism's gene sequence into coding and non-coding regions is a challenging task in DNA sequence analysis. Classification algorithms operate on the basic assumption that every protein-coding region has distinct sequence features or properties that distinguish it from the surrounding regions, such as non-coding and intergenic regions. In this study, we present a novel and generic approach for the analysis of DNA sequences. A wavelet-based time series approach is proposed for extracting statistical information from DNA sequences. The extracted information contains the variance of the amino/keto, purine/pyrimidine and weak/strong hydrogen bond distributions in a DNA sequence. This variance information is then used to construct a feature vector, and a pattern recognition framework is applied to classify exons and introns. An optimized support vector machine (SVM) classifier based on these novel features is constructed for accurate classification of DNA sequences. Experiments were performed on a dataset of Homo sapiens exons and introns, and a 10-fold cross-validation accuracy of 87.5% was achieved. Further tests were conducted on an unseen dataset of Homo sapiens exons and introns, and an accuracy of 88.95% was reported.
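The feature-extraction step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the wavelet, the decomposition depth, or the exact variance statistic, so the following are all hypothetical choices made for this sketch: the Haar wavelet, the +1/-1 encodings, three decomposition levels, mapping ambiguous bases to 0, and the helper names (`haar_step`, `wavelet_variances`, `feature_vector`). The three binary groupings themselves are standard: purines A/G vs pyrimidines C/T, amino A/C vs keto G/T, and weak A/T (two hydrogen bonds) vs strong C/G (three hydrogen bonds).

```python
import math

# Assumed +1/-1 encodings for the three nucleotide groupings named in the abstract.
MAPPINGS = {
    "amino_keto": {"A": 1, "C": 1, "G": -1, "T": -1},         # amino A,C vs keto G,T
    "purine_pyrimidine": {"A": 1, "G": 1, "C": -1, "T": -1},  # purines A,G vs pyrimidines C,T
    "weak_strong": {"A": 1, "T": 1, "C": -1, "G": -1},        # weak A,T vs strong C,G
}


def haar_step(signal):
    """One level of the orthonormal Haar DWT; a trailing odd sample is dropped."""
    n = len(signal) // 2 * 2
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]
    return approx, detail


def _variance(values):
    """Population variance; 0.0 when fewer than two values are available."""
    if len(values) < 2:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def wavelet_variances(signal, levels):
    """Variance of the Haar detail coefficients at each decomposition level."""
    variances = []
    approx = [float(x) for x in signal]
    for _ in range(levels):
        if len(approx) < 2:
            variances.append(0.0)  # sequence too short for this level (assumption)
            continue
        approx, detail = haar_step(approx)
        variances.append(_variance(detail))
    return variances


def feature_vector(seq, levels=3):
    """Concatenate per-level wavelet variances for all three encodings (hypothetical helper)."""
    seq = seq.upper()
    features = []
    for table in MAPPINGS.values():
        # Ambiguous bases such as N are mapped to 0 (assumption).
        signal = [table.get(base, 0) for base in seq]
        features.extend(wavelet_variances(signal, levels))
    return features
```

With three levels, each sequence yields a 9-dimensional vector (3 encodings × 3 levels). Labelled exon and intron vectors could then be passed to an SVM such as scikit-learn's `sklearn.svm.SVC` and scored with `sklearn.model_selection.cross_val_score(..., cv=10)` to mirror the 10-fold cross-validation setup. The kernel and hyperparameters of the paper's "optimized" SVM are not given in the abstract.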