Gene Classification Using Codon Usage and Support Vector Machines

Authors:
Jianmin Ma, Minh N. Nguyen, Jagath C. Rajapakse
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2009

Abstract

A novel approach to gene classification is proposed that uses codon usage bias as the input feature vector for classification by support vector machines (SVMs). Each DNA sequence is first converted to a 59-dimensional feature vector in which each element corresponds to the relative synonymous usage frequency of one codon. Because the classifier input is independent of sequence length and variance, the approach is useful when the sequences to be classified have different lengths, a condition under which homology-based methods tend to fail. The method is demonstrated on 1,841 Human Leukocyte Antigen (HLA) sequences, which are classified into two major classes, HLA-I and HLA-II; each major class is further subdivided into subgroups of HLA-I and HLA-II molecules. Using codon usage frequencies, a binary SVM achieved an accuracy of 99.3% for HLA major-class classification, and multiclass SVMs achieved accuracies of 99.73% and 98.38% for subclass classification of HLA-I and HLA-II molecules, respectively. The results show that gene classification based on codon usage bias is consistent with the molecular structures and biological functions of HLA molecules.
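The feature-extraction step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It assumes the standard genetic code (NCBI translation table 1) and reads "relative synonymous usage frequency" as the fraction of an amino acid's codons that are a given codon. The 59 dimensions then come from the 64 codons minus the three stop codons and the single codons for Met (ATG) and Trp (TGG). The names `codon_usage_vector`, `FAMILIES`, and `FEATURE_CODONS`, and the choice of 0.0 for amino acids absent from a sequence, are hypothetical.

```python
from itertools import product

# Assumption: standard genetic code (NCBI translation table 1).
# Codons are enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG (base order T, C, A, G).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous families: amino acid -> its codons. Stop codons ('*') are dropped,
# and so are Met (ATG) and Trp (TGG), which have a single codon each.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)
FAMILIES = {aa: codons for aa, codons in FAMILIES.items() if len(codons) > 1}

# Fixed feature order: families sorted by one-letter amino acid code.
FEATURE_CODONS = [codon for aa in sorted(FAMILIES) for codon in FAMILIES[aa]]
assert len(FEATURE_CODONS) == 59


def codon_usage_vector(seq):
    """Hypothetical helper: map a coding sequence to 59 within-family codon frequencies."""
    seq = seq.upper().replace("U", "T")
    counts = dict.fromkeys(CODON_TABLE, 0)
    # Read in frame from position 0; a trailing partial codon is ignored.
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skip codons containing ambiguous bases such as N
            counts[codon] += 1
    totals = {aa: sum(counts[c] for c in codons) for aa, codons in FAMILIES.items()}
    # Fraction of the amino acid's codons that are this codon; 0.0 when the
    # amino acid never occurs (an assumption; other conventions exist).
    return [
        counts[c] / totals[CODON_TABLE[c]] if totals[CODON_TABLE[c]] else 0.0
        for c in FEATURE_CODONS
    ]


orf = "ATGGCTGCAGCCAAAAAGTAA"
v1 = codon_usage_vector(orf)
v3 = codon_usage_vector(orf * 3)  # same codon mix, three times the length
print(len(v1), v1 == v3)  # → 59 True
```

The usage lines illustrate the length-independence claim: repeating a sequence changes every count but no frequency. Multiplying each value by its family size gives the conventional RSCU statistic instead. The resulting vectors could then be passed to any binary or multiclass SVM implementation. The kernel and parameters the paper used are not stated in this abstract.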