Automated Protein Classification Using Consensus Decision

Authors:
Tolga Can;Orhan Camoglu;Ambuj K. Singh;Yuan-Fang Wang
Affiliations:
University of California at Santa Barbara;University of California at Santa Barbara;University of California at Santa Barbara;University of California at Santa Barbara
Venue:
CSB '04 Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference
Year:
2004



Abstract

We propose a novel technique for automatically generating the SCOP classification of a protein structure with high accuracy. High accuracy is achieved by combining the decisions of multiple methods through the consensus of a committee (or ensemble) classifier. Our technique is rooted in machine learning results showing that, by judiciously employing component classifiers, one can construct an ensemble classifier that outperforms its components. We use two sequence-comparison tools and three structure-comparison tools as component classifiers. Given a protein structure, we first use the ensemble's joint hypothesis to determine whether the protein belongs to an existing category (family, superfamily, fold) in the SCOP hierarchy. For the proteins predicted to be members of existing categories, we compute their family-, superfamily-, and fold-level classifications using the consensus classifier. We show that this significantly improves classification accuracy compared to the individual component classifiers. In particular, we achieve error rates that are 3 to 12 times lower than those of the individual classifiers at the family level, 1.5 to 4.5 times lower at the superfamily level, and 1.1 to 2.4 times lower at the fold level.
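The abstract does not spell out the rule used to combine the five component decisions. As a minimal sketch, assuming a simple weighted majority vote (not necessarily the authors' combination rule), the consensus step might look like the code below. Everything in it is hypothetical and chosen for illustration: the function names `weighted_vote` and `consensus_classify`, the classifier names `seq_a`, `seq_b`, `struct_a`, `struct_b` and `struct_c`, the per-classifier weights (which could, for example, reflect each tool's accuracy on a training set), and the example SCOP labels. A vote of `None` stands for "no match in any existing category at this level", so the same vote also decides whether the protein is predicted to belong to an existing family, superfamily, or fold.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

# SCOP levels handled by the consensus, from most to least specific.
LEVELS = ("family", "superfamily", "fold")


def weighted_vote(
    votes: Dict[str, Optional[str]], weights: Dict[str, float]
) -> Tuple[Optional[str], float]:
    """Return the label with the largest summed weight and its share of all weight.

    A vote of None means "not a member of any existing category at this level".
    Classifiers missing from `weights` get weight 1.0; weights should be positive.
    """
    totals: Dict[Optional[str], float] = defaultdict(float)
    for classifier, label in votes.items():
        totals[label] += weights.get(classifier, 1.0)
    if not totals:
        return None, 0.0  # no votes at all: no decision
    winner = max(totals, key=lambda label: totals[label])
    return winner, totals[winner] / sum(totals.values())


def consensus_classify(
    predictions: Dict[str, Dict[str, Optional[str]]], weights: Dict[str, float]
) -> Dict[str, Optional[str]]:
    """Combine per-classifier SCOP predictions into one label per level.

    predictions[classifier][level] is that classifier's label, or None if it
    predicts the query lies outside all existing categories at that level.
    A None in the result means the ensemble predicts a new category there.
    """
    result: Dict[str, Optional[str]] = {}
    for level in LEVELS:
        votes = {name: pred.get(level) for name, pred in predictions.items()}
        label, _share = weighted_vote(votes, weights)
        result[level] = label
    return result


# Hypothetical component classifiers: two sequence-based, three structure-based.
weights = {"seq_a": 0.6, "seq_b": 0.7, "struct_a": 0.9, "struct_b": 0.8, "struct_c": 0.85}
predictions = {
    "seq_a":    {"family": None,      "superfamily": "b.1.1", "fold": "b.1"},
    "seq_b":    {"family": "b.1.1.1", "superfamily": "b.1.1", "fold": "b.1"},
    "struct_a": {"family": "b.1.1.1", "superfamily": "b.1.1", "fold": "b.1"},
    "struct_b": {"family": "b.1.1.2", "superfamily": "b.1.1", "fold": "b.1"},
    "struct_c": {"family": "b.1.1.1", "superfamily": "b.1.2", "fold": "b.1"},
}
print(consensus_classify(predictions, weights))
```

In this made-up example, the components disagree at the family and superfamily levels, but the weighted vote settles on family `b.1.1.1` and superfamily `b.1.1`, because those labels carry most of the total weight. A learned combination, such as a boosted ensemble, would replace the fixed weights with ones fitted to training data; the simple vote above only illustrates how component decisions can be pooled.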