Supervised machine learning algorithms for protein structure classification

Authors:
Pooja Jain;Jonathan M. Garibaldi;Jonathan D. Hirst
Affiliations:
School of Chemistry, The University of Nottingham, University Park, Nottingham, NG7 2RD, UK;School of Computer Science and IT, The University of Nottingham, Jubilee Campus, Nottingham, NG8 1BB, UK;School of Chemistry, The University of Nottingham, University Park, Nottingham, NG7 2RD, UK
Venue:
Computational Biology and Chemistry
Year:
2009

Abstract

We explore the automation of protein structural classification using supervised machine learning methods on a set of 11,360 pairs of protein domains (up to 35% sequence identity) consisting of three secondary structure elements. Fifteen algorithms from five categories of supervised algorithms are evaluated for their ability to learn, for a pair of protein domains, the deepest common structural level within the SCOP hierarchy, given a one-dimensional representation of the domain structures. This representation encapsulates evolutionary information in terms of sequence identity, together with structural information characterising the secondary structure elements and lengths of the respective domains. The evaluation is performed in two steps: first selecting the best-performing base learners, and subsequently evaluating boosted and bagged meta learners. The boosted random forest, a collection of decision trees, is found to be the most accurate, with a cross-validated accuracy of 97.0% and F-measures of 0.97, 0.85, 0.93 and 0.98 for classification of proteins to the Class, Fold, Superfamily and Family levels of the SCOP hierarchy, respectively. The meta learning regime, especially boosting, improves performance by classifying instances from the less populated classes more accurately.
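The target label and the per-level F-measure mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes each domain is identified by a SCOP concise classification string of the form class.fold.superfamily.family (for example `a.1.1.2`); the function names `deepest_common_level` and `f_measure` are hypothetical, and returning `None` for pairs that share no level is an assumption, since the abstract does not say how such pairs are handled.

```python
# Illustrative sketch only; names and the handling of unrelated pairs are assumptions.
SCOP_LEVELS = ("Class", "Fold", "Superfamily", "Family")


def deepest_common_level(sccs_a, sccs_b):
    """Return the deepest SCOP level shared by two domains, or None if none is shared.

    Each argument is a dotted string such as "a.1.1.2" (class.fold.superfamily.family).
    """
    deepest = None
    for level, part_a, part_b in zip(SCOP_LEVELS, sccs_a.split("."), sccs_b.split(".")):
        if part_a != part_b:
            break  # the hierarchy diverges here; deeper levels cannot be shared
        deepest = level
    return deepest


def f_measure(true_labels, predicted_labels, label):
    """Harmonic mean of precision and recall for one label (0.0 if no true positives)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up identifiers and predictions, for illustration only.
    print(deepest_common_level("a.1.1.1", "a.1.1.2"))
    true = ["Fold", "Fold", "Family", "Family"]
    pred = ["Fold", "Family", "Family", "Family"]
    for lvl in ("Fold", "Family"):
        print(lvl, round(f_measure(true, pred, lvl), 3))
```

Reporting one F-measure per level, rather than overall accuracy alone, makes it visible when a sparsely populated level (here Fold, with the lowest reported value of 0.85) is classified less reliably than the others.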