Margin-based ensemble classifier for protein fold recognition

Authors:
Tao Yang; Vojislav Kecman; Longbing Cao; Chengqi Zhang; Joshua Zhexue Huang
Affiliations:
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Boulevard, Xili Nanshan, Shenzhen 518055, China; Department of Computer Science, Virginia Commonwealth University, 401 West Main, Richmond, VA, USA; Faculty of Engineering and Information Technology, University of Technology Sydney, 15 Broadway, Ultimo, NSW 2007, Australia; Faculty of Engineering and Information Technology, University of Technology Sydney, 15 Broadway, Ultimo, NSW 2007, Australia; Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Boulevard, Xili Nanshan, Shenzhen 518055, China
Venue:
Expert Systems with Applications: An International Journal
Year:
2011

Abstract

Recognition of protein folding patterns is an important step in predicting protein structure and function. The traditional sequence similarity-based approach fails to yield convincing predictions when proteins have low sequence identity, whereas the taxonometric approach is a reliable alternative. From a pattern recognition perspective, protein fold recognition involves a large number of classes with only a small number of training samples, and multiple heterogeneous feature groups derived from different propensities of amino acids. This calls for a classification method that can handle such data complexity while achieving prediction accuracy high enough for practical applications. To this end, this paper proposes a novel ensemble classifier, called MarFold, which combines three margin-based classifiers for protein fold recognition. The effectiveness of the method is demonstrated on the benchmark D-B dataset with 27 classes. The overall prediction accuracy obtained by MarFold is 71.7%, which surpasses existing fold recognition methods by 3.1-15.7%. Moreover, one component classifier of MarFold, called ALH (adaptive local hyperplane), obtains a prediction accuracy of 65.5%, which is 4.7-9.5% higher than the accuracies of published methods that use a single classifier. Additionally, the feature set of pairwise frequency information about amino acids, which MarFold adopts, is found to be important for discriminating folding patterns. These results suggest that MarFold and its operation engine ALH might become useful tools for protein fold recognition, as well as for other bioinformatics tasks. The MarFold method and the datasets can be obtained from: http://www-staff.it.uts.edu.au/~lbcao/publication/MarFold.7z
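
The abstract does not say how the three component classifiers are combined, so the following is only a minimal sketch of one common option, plain majority voting, and should not be read as the MarFold combination rule. The function names (`majority_vote`, `ensemble_predict`, `accuracy`), the tie-breaking policy, and the toy labels are all illustrative assumptions.

```python
from collections import Counter


def majority_vote(votes, fallback_index=0):
    """Return the label chosen by most classifiers for one sample.

    votes: one predicted label per component classifier.
    If several labels share the top count, fall back to the vote of the
    classifier at fallback_index (an assumed tie-breaking policy).
    """
    counts = Counter(votes)
    top_count = max(counts.values())
    tied = [label for label, count in counts.items() if count == top_count]
    if len(tied) > 1:
        return votes[fallback_index]
    return tied[0]


def ensemble_predict(per_classifier_preds, fallback_index=0):
    """Combine equal-length prediction lists, one list per classifier."""
    return [
        majority_vote(list(votes), fallback_index)
        for votes in zip(*per_classifier_preds)
    ]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy fold labels (hypothetical data, not taken from the D-B dataset).
y_true = [1, 2, 3, 4]
clf_a = [1, 2, 9, 4]  # wrong on the third sample
clf_b = [1, 5, 3, 4]  # wrong on the second sample
clf_c = [2, 2, 3, 9]  # wrong on the first and fourth samples

combined = ensemble_predict([clf_a, clf_b, clf_c])
print(combined, accuracy(y_true, combined))
```

In this toy case each classifier errs on different samples, so the vote corrects every individual mistake. That is the general motivation for ensembles; how much is gained in practice depends on how diverse the component classifiers are.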