Fast Target Set Reduction for Large-Scale Protein Function Prediction: A Multi-class Multi-label Machine Learning Approach

Authors:
Thomas Lingner; Peter Meinicke
Affiliations:
Department of Bioinformatics, Institute for Microbiology and Genetics, University of Göttingen, Göttingen, Germany 37077; Department of Bioinformatics, Institute for Microbiology and Genetics, University of Göttingen, Göttingen, Germany 37077
Venue:
WABI '08 Proceedings of the 8th International Workshop on Algorithms in Bioinformatics
Year:
2008

Abstract

Large-scale sequencing projects have produced a vast number of protein sequences, which have to be assigned to functional categories. Currently, profile hidden Markov models and kernel-based machine learning methods provide the most accurate results for protein classification. However, classifying new sequences with these approaches is computationally expensive. We present an approach for fast scoring of protein sequences by means of a feature-based protein sequence representation and multi-class multi-label machine learning techniques. Using the Pfam database, we show that our method is computationally efficient and well suited for pre-filtering large sequence sets.
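
The general idea of target set reduction can be illustrated with a minimal sketch. It is not the authors' implementation: the k-mer count features, the hand-written toy weights, and the names `kmer_features`, `score_families`, and `reduce_targets` are all assumptions made for illustration, and the abstract does not specify which features or learning algorithm the paper uses. The sketch only shows how a sequence can be mapped to a feature vector, scored against every family with one linear function per family, and reduced to a short candidate list that a slower, more accurate method could then examine.

```python
from collections import Counter


def kmer_features(seq, k=2):
    """Map a protein sequence to a sparse dict of normalized k-mer counts.

    Assumption: k-mer counts stand in for whatever feature-based
    representation the paper actually uses.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values()) or 1  # avoid division by zero for short input
    return {kmer: c / total for kmer, c in counts.items()}


def score_families(features, weights):
    """Compute one linear score per family (sparse dot product)."""
    return {
        family: sum(w.get(kmer, 0.0) * value for kmer, value in features.items())
        for family, w in weights.items()
    }


def reduce_targets(seq, weights, top_n=2, k=2):
    """Return the top_n highest-scoring families as the reduced target set."""
    scores = score_families(kmer_features(seq, k), weights)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


# Hypothetical per-family weight vectors; in practice these would be learned
# from labeled training sequences, one weight vector per family.
toy_weights = {
    "family_A": {"AK": 1.0, "KK": 0.5},
    "family_B": {"GG": 1.0},
    "family_C": {"WW": 1.0, "AK": 0.2},
}

candidates = reduce_targets("AKAKKGA", toy_weights, top_n=2)
print(candidates)
```

Because a sequence may belong to several families, the sketch keeps the `top_n` best-scoring families rather than a single winner. Only these candidates would need to be passed on to an expensive downstream classifier.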