Classification of unknown genomes has wide application in areas such as evolutionary studies, biodiversity research and forensic studies, all of which have recently come to be viewed from a 'genomic' perspective. Only a few attempts at unknown genome identification appear in the literature, and the reported accuracies do not exceed 85%. Most works report classification into the major kingdoms only and do not venture further into their sub-classes. A novel technique combining Chaos Game Representation (CGR) and machine learning is proposed, the former for feature extraction and the latter for subsequent sequence classification. Eight subcategories of eukaryotic mitochondrial genomes from NCBI are used for the study. The sequences are first mapped into their Chaos Game Representation. Genomic features are then extracted by computing the Frequency Chaos Game Representation (FCGR) matrix. An order-3 FCGR matrix is considered here, which is an 8 × 8 matrix of 64 elements. These 64 elements act as the feature descriptor for classification. The classification methods used are a Difference Boosting Naïve Bayesian (DBNB) method, an Artificial Neural Network (ANN) method and a Support Vector Machine (SVM) method. The accuracy of each individual method is reported. Although the SVM-CGR combination gives the highest average accuracy, the other methods give better accuracies for some categories. Hence a voting classifier combining all three methods is implemented. Accuracies of 100% were obtained for Vertebrata and Porifera, whereas Acoelomata, Cnidaria and Fungi were classified with accuracies above 90%. The accuracies obtained for Protostomia, Plant and Pseudocoelomata were 90%, 82% and 77%, respectively.
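The feature extraction and voting steps described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The corner assignment (A bottom-left, C top-left, G top-right, T bottom-right), the normalisation to relative frequencies, the skipping of 3-mers that contain ambiguous bases, and the tie-breaking rule in the vote are all assumptions, and the function names `fcgr`, `fcgr_features` and `majority_vote` are hypothetical.

```python
from collections import Counter

# Assumed CGR corner layout; the abstract does not specify one.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(sequence, order=3):
    """Count k-mers into a 2^order x 2^order FCGR grid (8 x 8 for order 3)."""
    size = 2 ** order
    counts = [[0] * size for _ in range(size)]
    seq = sequence.upper()
    for i in range(len(seq) - order + 1):
        kmer = seq[i:i + order]
        # Assumption: k-mers with ambiguous bases (e.g. N) are skipped.
        if any(base not in CORNERS for base in kmer):
            continue
        x = y = 0
        # In the chaos game each new base halves the distance to its corner,
        # so the last base of the k-mer selects the coarsest quadrant
        # (most significant bit) and the first base the finest one.
        for j, base in enumerate(kmer):
            cx, cy = CORNERS[base]
            x += cx << j
            y += cy << j
        counts[y][x] += 1
    return counts


def fcgr_features(sequence, order=3):
    """Flatten the FCGR grid into a relative-frequency feature vector."""
    flat = [c for row in fcgr(sequence, order) for c in row]
    total = sum(flat)
    return [c / total for c in flat] if total else [0.0] * len(flat)


def majority_vote(predictions):
    """Return the most common label among the classifiers' predictions.

    Assumption: on a tie, the label seen first wins, so the caller should
    list the most trusted classifier's prediction first.
    """
    return Counter(predictions).most_common(1)[0][0]
```

A 64-element vector from `fcgr_features(seq)` could then be passed to each of the three classifiers, with `majority_vote([svm_label, ann_label, dbnb_label])` combining their outputs. The abstract does not state how the voting is weighted or how ties are resolved, so a plain majority is only one plausible reading.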