A Study of Hierarchical and Flat Classification of Proteins

Authors:
Arthur Zimek; Fabian Buchwald; Eibe Frank; Stefan Kramer
Affiliations:
Ludwig-Maximilians-Universitaet Muenchen and Forschungseinheit fuer Datenbanksysteme, Muenchen; Technische Universitaet Muenchen, Muenchen; University of Waikato, Hamilton; Technische Universitaet Muenchen, Muenchen
Venue:
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Year:
2010

Abstract

Automatic classification of proteins using machine learning is an important problem that has received significant attention in the literature. One feature of this problem is that expert-defined hierarchies of protein classes exist and can potentially be exploited to improve classification performance. In this article, we investigate empirically whether this is the case for two such hierarchies. We compare multiclass classification techniques that exploit the information in those class hierarchies with techniques that do not, using logistic regression, decision trees, bagged decision trees, and support vector machines as the underlying base learners. In particular, we compare hierarchical and flat variants of ensembles of nested dichotomies, a method that has been shown to deliver strong classification performance in multiclass settings. We present experimental results for synthetic, fold recognition, enzyme classification, and remote homology detection data. Our results show that exploiting the class hierarchy improves performance on the synthetic data but not on the protein classification problems. Based on these results, we recommend that strong flat multiclass methods be used as a baseline to establish the benefit of exploiting class hierarchies in this area.
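The contrast between hierarchical and flat ensembles of nested dichotomies can be illustrated with a small Python sketch. It is not the authors' implementation, only a minimal illustration under stated assumptions. A nested dichotomy recursively splits the set of classes into two subsets, with a binary classifier at each internal node. A class's probability is the product of the binary probabilities along its root-to-leaf path, and the ensemble averages these distributions over several randomly built dichotomies. The flat variant splits the class set at random. The hierarchical variant shown here is an assumption about how a hierarchy can constrain the splits: it only divides groups of sibling subtrees, so each subtree of the class hierarchy stays together until it has been separated from its siblings. The helper names (`flat_dichotomy`, `hierarchical_dichotomy`, `noisy_oracle`) and the nested-tuple encoding of the hierarchy are hypothetical. `p_left` is a placeholder for a trained binary base learner such as logistic regression.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional, Union

# Hypothetical encoding: a hierarchy is a class label (leaf) or a tuple of sub-hierarchies.
Hierarchy = Union[str, tuple]

@dataclass
class Node:
    """One node of a nested dichotomy: a set of classes split into two subsets."""
    classes: FrozenSet[str]
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def leaves(tree: Hierarchy) -> FrozenSet[str]:
    """Collect all class labels below a hierarchy node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def flat_dichotomy(classes: List[str], rng: random.Random) -> Node:
    """Flat variant: recursively split the class set at random, ignoring any hierarchy."""
    node = Node(frozenset(classes))
    if len(classes) > 1:
        shuffled = sorted(classes)
        rng.shuffle(shuffled)
        cut = rng.randint(1, len(shuffled) - 1)  # both halves are non-empty
        node.left = flat_dichotomy(shuffled[:cut], rng)
        node.right = flat_dichotomy(shuffled[cut:], rng)
    return node

def hierarchical_dichotomy(tree: Hierarchy, rng: random.Random) -> Node:
    """Hierarchy-constrained variant (assumed scheme): only groups of sibling subtrees
    are split, so a subtree is kept whole until it is separated from its siblings."""
    if isinstance(tree, str):
        return Node(frozenset([tree]))
    children = list(tree)
    if len(children) == 1:  # a single subtree left: descend into it
        return hierarchical_dichotomy(children[0], rng)
    rng.shuffle(children)
    cut = rng.randint(1, len(children) - 1)
    return Node(leaves(tree),
                hierarchical_dichotomy(tuple(children[:cut]), rng),
                hierarchical_dichotomy(tuple(children[cut:]), rng))

# Placeholder for the binary base learner trained at `node`: it returns an estimate
# of P(class of x is in node.left | class of x is in node).
BinaryModel = Callable[[Node, object], float]

def class_probs(node: Node, x: object, p_left: BinaryModel) -> Dict[str, float]:
    """Multiply the binary probabilities along each root-to-leaf path."""
    if node.left is None:
        return {next(iter(node.classes)): 1.0}
    p = p_left(node, x)
    out = {c: p * q for c, q in class_probs(node.left, x, p_left).items()}
    out.update({c: (1 - p) * q for c, q in class_probs(node.right, x, p_left).items()})
    return out

def ensemble_probs(trees: List[Node], x: object, p_left: BinaryModel) -> Dict[str, float]:
    """Average the class distributions of all nested dichotomies in the ensemble."""
    avg: Dict[str, float] = {}
    for tree in trees:
        for c, q in class_probs(tree, x, p_left).items():
            avg[c] = avg.get(c, 0.0) + q / len(trees)
    return avg

# Toy usage with a hypothetical "noisy oracle" base learner: x is the true label,
# and every binary model routes it to the correct branch with probability 0.9.
def noisy_oracle(node: Node, x: object) -> float:
    return 0.9 if x in node.left.classes else 0.1

hierarchy = (("a1", "a2"), ("b1", "b2", "b3"))
rng = random.Random(0)
flat = [flat_dichotomy(sorted(leaves(hierarchy)), rng) for _ in range(10)]
hier = [hierarchical_dichotomy(hierarchy, rng) for _ in range(10)]
probs = ensemble_probs(hier, "b2", noisy_oracle)
print(max(probs, key=probs.get))  # prints "b2": the true label gets the highest probability
```

In every hierarchical dichotomy built from this toy hierarchy, the root split is forced to be {a1, a2} versus {b1, b2, b3}, whereas the flat dichotomies may separate, say, a1 from a2 at the root. Replacing `noisy_oracle` with real per-node classifiers, each trained on the examples whose classes fall under that node, is what a full implementation would need; the seed and the ensemble size of 10 are arbitrary choices for the illustration.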