Probabilistic prediction of protein phosphorylation sites using classification relevance units machines

Authors:
Mark Menor;Kyungim Baek;Guylaine Poisson
Affiliations:
University of Hawai'i at Mānoa, Honolulu, HI;University of Hawai'i at Mānoa, Honolulu, HI;University of Hawai'i at Mānoa, Honolulu, HI
Venue:
ACM SIGAPP Applied Computing Review
Year:
2012

Abstract

Phosphorylation is an important post-translational modification of proteins that is essential to the regulation of many cellular processes. Although most known phosphorylation sites in protein sequences have been identified experimentally, in vivo and in vitro discovery of these sites is expensive, time-consuming, and laborious. The development of computational methods for predicting protein phosphorylation sites has therefore drawn considerable attention. In this work, we present a kernel-based probabilistic Classification Relevance Units Machine (CRUM) for in silico phosphorylation site prediction. Compared with the popular Support Vector Machine (SVM), CRUM shows comparable predictive performance while providing a more parsimonious model. This is desirable because a smaller model reduces prediction run-time, which matters when predicting on large-scale data. Furthermore, the CRUM training algorithm has lower run-time and memory complexity and a simpler parameter selection scheme than the Relevance Vector Machine (RVM) learning algorithm. To further investigate the viability of CRUM for phosphorylation site prediction, we construct multiple CRUM predictors using different combinations of three phosphorylation site features: BLOSUM encoding, disorder, and amino acid composition. The predictors are evaluated through cross-validation, and the results show that CRUM with the BLOSUM feature is among the best-performing CRUM predictors in both cross-validation and benchmark experiments. A comparative study with existing prediction tools in an independent benchmark experiment suggests possible directions for further improving the predictive performance of CRUM predictors.
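
The abstract names amino acid composition as one of the three site features but does not say how it is computed. As a rough illustration only, the sketch below computes the fraction of each of the 20 standard residues in a window centred on a candidate site. The window half-width, the truncation at sequence ends, and the function names `window_around` and `amino_acid_composition` are assumptions made for this example, not details taken from the paper.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def window_around(sequence, site, half_width):
    """Return residues within half_width of a 0-based site index.

    The window is simply truncated at the sequence ends (an assumption;
    padding with a gap symbol is another common choice).
    """
    start = max(0, site - half_width)
    end = min(len(sequence), site + half_width + 1)
    return sequence[start:end]


def amino_acid_composition(window):
    """Return a 20-element list with the fraction of each standard residue."""
    counts = Counter(window)
    # Count only standard residues so the fractions sum to 1.
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical example: a serine at index 3 of a made-up sequence.
window = window_around("MKTSSPRLA", site=3, half_width=2)
features = amino_acid_composition(window)
```

A vector like `features` could be concatenated with other encodings, such as BLOSUM scores for each window position, before being passed to a kernel classifier.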