A Kernel Framework for Protein Residue Annotation

Authors:
Huzefa Rangwala; Christopher Kauffman; George Karypis
Affiliations:
George Mason University, Fairfax, VA 22030, USA; University of Minnesota, Minneapolis, MN 55414, USA; University of Minnesota, Minneapolis, MN 55414, USA
Venue:
PAKDD '09: Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
Year:
2009

Abstract

Over the last decade, several methods have been developed for predicting the structural and functional properties of individual protein residues from sequence and sequence-derived information. Most of these methods are based on support vector machines (SVMs) because SVMs yield accurate and generalizable prediction models. We developed a general-purpose protein residue annotation toolkit (ProSAT) that allows biologists to formulate residue-wise prediction problems. ProSAT casts each annotation problem as an SVM classification or regression problem. For every residue, ProSAT captures local information (any sequence-derived information) from a window around the residue to create fixed-length feature vectors. ProSAT implements accurate and fast kernel functions and also introduces a flexible window-based encoding scheme that better captures the signals for certain prediction problems. In this work we evaluate the performance of ProSAT on the disorder prediction and contact order estimation problems, studying the effect of the different kernels introduced here. ProSAT performs better than, or at least comparably to, state-of-the-art prediction systems. In particular, ProSAT has proven to be the best-performing transmembrane-helix predictor on an independent blind benchmark.
Availability: http://bio.dtc.umn.edu/prosat
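The idea of turning the local context of each residue into a fixed-length vector, and then comparing such vectors with a kernel, can be illustrated with a minimal sketch. It assumes each protein is given as a per-residue feature matrix (for example, one row of profile scores per residue), uses zero padding past the sequence termini, and swaps in a standard Gaussian (RBF) kernel. The names `window_features`, `rbf_kernel`, `gram_matrix`, the half-width `w`, and `gamma` are hypothetical illustrations, not ProSAT's actual encoding, kernels, or API.

```python
import math
from typing import List, Sequence

# Hypothetical input: a per-residue feature matrix, one row of d numbers
# per residue (e.g. profile scores). Not ProSAT's actual data format.
Profile = Sequence[Sequence[float]]


def window_features(profile: Profile, i: int, w: int) -> List[float]:
    """Encode residue i as a fixed-length vector by concatenating the
    feature rows of residues i-w .. i+w. Positions outside the sequence
    (near either terminus) are zero-padded, so every residue yields a
    vector of length (2*w + 1) * d."""
    d = len(profile[0])
    vec: List[float] = []
    for j in range(i - w, i + w + 1):
        if 0 <= j < len(profile):
            vec.extend(float(x) for x in profile[j])
        else:
            vec.extend([0.0] * d)  # padding beyond the N- or C-terminus
    return vec


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 0.1) -> float:
    """Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) between two window vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(vectors: Sequence[Sequence[float]], gamma: float = 0.1) -> List[List[float]]:
    """Pairwise kernel values between all window vectors."""
    return [[rbf_kernel(u, v, gamma) for v in vectors] for u in vectors]


if __name__ == "__main__":
    # Toy protein: 3 residues, 2 features per residue, window half-width 1.
    profile = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    windows = [window_features(profile, i, w=1) for i in range(len(profile))]
    print(windows[0])  # [0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
    K = gram_matrix(windows)
    print(K[0][0])  # 1.0
```

Because every residue maps to a vector of the same length, the resulting Gram matrix can be handed to any SVM solver that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")` for classification or `SVR(kernel="precomputed")` for regression).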