A training algorithm for optimal margin classifiers
COLT '92 Proceedings of the fifth annual workshop on Computational learning theory
Machine Learning
X-means: Extending K-means with Efficient Estimation of the Number of Clusters
ICML '00 Proceedings of the Seventeenth International Conference on Machine Learning
Sparse bayesian learning and the relevance vector machine
The Journal of Machine Learning Research
Kernel Methods for Pattern Analysis
Kernel Methods for Pattern Analysis
Prediction of phosphorylation sites using SVMs
Bioinformatics
How Many Clusters? An Information-Theoretic Perspective
Neural Computation
Pattern Recognition and Machine Learning (Information Science and Statistics)
Pattern Recognition and Machine Learning (Information Science and Statistics)
Bioinformatics
Analysis of protein phosphorylation site predictors with an independent dataset
International Journal of Bioinformatics Research and Applications
Sparse Kernel Learning and the Relevance Units Machine
PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
LIBSVM: A library for support vector machines
ACM Transactions on Intelligent Systems and Technology (TIST)
Computational prediction of eukaryotic phosphorylation sites
Bioinformatics
Rademacher penalties and structural risk minimization
IEEE Transactions on Information Theory
Hi-index | 0.00 |
Phosphorylation is an important post-translational modification of proteins that is essential to the regulation of many cellular processes. Although most of the phosphorylation sites discovered in protein sequences have been identified experimentally, the in vivo and in vitro discovery of the sites is an expensive, time-consuming and laborious task. Therefore, the development of computational methods for prediction of protein phosphorylation sites has drawn considerable attention. In this work, we present a kernel-based probabilistic Classification Relevance Units Machine (CRUM) for in silico phosphorylation site prediction. In comparison with the popular Support Vector Machine (SVM) CRUM shows comparable predictive performance and yet provides a more parsimonious model. This is desirable since it leads to a reduction in prediction run-time, which is important in predictions on large-scale data. Furthermore, the CRUM training algorithm has lower run-time and memory complexity and has a simpler parameter selection scheme than the Relevance Vector Machine (RVM) learning algorithm. To further investigate the viability of using CRUM in phosphorylation site prediction, we construct multiple CRUM predictors using different combinations of three phosphorylation site features -- BLOSUM encoding, disorder, and amino acid composition. The predictors are evaluated through cross-validation and the results show that CRUM with BLOSUM feature is among the best performing CRUM predictors in both cross-validation and benchmark experiments. A comparative study with existing prediction tools in an independent benchmark experiment suggests possible direction for further improving the predictive performance of CRUM predictors.