We explore the automation of protein structural classification using supervised machine learning methods on a set of 11,360 pairs of protein domains (up to 35% sequence identity) consisting of three secondary structure elements. Fifteen algorithms from five categories of supervised algorithms are evaluated for their ability to learn, for a pair of protein domains, the deepest common structural level within the SCOP hierarchy, given a one-dimensional representation of the domain structures. This representation encapsulates evolutionary information, in terms of sequence identity, and structural information characterising the secondary structure elements and lengths of the respective domains. The evaluation is performed in two steps: first selecting the best-performing base learners, then evaluating boosted and bagged meta-learners built on them. The boosted random forest, a collection of decision trees, is found to be the most accurate, with a cross-validated accuracy of 97.0% and F-measures of 0.97, 0.85, 0.93 and 0.98 for classification of proteins to the Class, Fold, Superfamily and Family levels of the SCOP hierarchy, respectively. The meta-learning regime, especially boosting, improved performance by classifying instances from less populated classes more accurately.
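The learning target described above, the deepest SCOP level shared by a pair of domains, can be illustrated with a minimal sketch. It assumes each domain is labelled with a SCOP concise classification string (sccs) of the form class.fold.superfamily.family, such as "a.1.1.2". The function name `deepest_common_level` and the `LEVELS` tuple are hypothetical, and this is not necessarily how the authors derived their labels.

```python
from typing import Optional

# SCOP levels from shallowest to deepest, matching the four dot-separated
# fields of an sccs string such as "a.1.1.2" (assumed input format).
LEVELS = ("Class", "Fold", "Superfamily", "Family")


def deepest_common_level(sccs_a: str, sccs_b: str) -> Optional[str]:
    """Return the deepest SCOP level shared by two domains, or None."""
    fields_a = sccs_a.split(".")
    fields_b = sccs_b.split(".")
    if len(fields_a) != len(LEVELS) or len(fields_b) != len(LEVELS):
        raise ValueError("expected sccs strings with four fields, e.g. 'a.1.1.2'")

    deepest = None
    for level, field_a, field_b in zip(LEVELS, fields_a, fields_b):
        if field_a != field_b:
            break  # the hierarchy diverges here; deeper fields are not compared
        deepest = level
    return deepest


if __name__ == "__main__":
    # Illustrative sccs strings only, not taken from the paper's data set.
    print(deepest_common_level("a.1.1.2", "a.1.2.1"))
```

A pair that differs already at the first field returns `None` in this sketch; how such pairs are handled, if they occur at all, is not stated in the abstract.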