Ensembled support vector machines for human papillomavirus risk type prediction from protein secondary structures

Authors:
Sun Kim;Jeongmi Kim;Byoung-Tak Zhang
Affiliations:
School of Computer Science and Engineering, Seoul National University, Seoul 151-744, Republic of Korea;ISU ABXIS CO., LTD, Seoul 120-752, Republic of Korea;School of Computer Science and Engineering, Seoul National University, Seoul 151-744, Republic of Korea
Venue:
Computers in Biology and Medicine
Year:
2009

Abstract

Infection by the human papillomavirus (HPV) is regarded as the major risk factor in the development of cervical cancer. Detection of high-risk HPV is important for understanding its oncogenic mechanisms and for developing novel clinical tools for its diagnosis, treatment, and prevention. Several methods are available to predict the risk types of HPV protein sequences. Nevertheless, no tool achieves universally good performance across all domains, including HPV, nor do existing tools provide confidence levels for their decisions. Here, we describe ensembled support vector machines (SVMs) for classifying HPV risk types, which assign a given protein to the high-, possibly high-, or low-risk type according to the confidence level of the prediction. Our approach uses protein secondary structures to obtain the differential contribution of subsequences to the risk type, and the SVM classifiers are combined with a simple but efficient string kernel to handle HPV protein sequences. In the experiments, we compare our approach with previous methods in terms of accuracy and F1-score, and present predictions for unknown HPV types; the results are promising.
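
The two ingredients named in the abstract, a string kernel over protein sequences and a confidence-based three-way risk label, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the kernel, so a normalized k-spectrum kernel is used here as a stand-in, and the function names, the vote encoding (+1 for high risk, -1 for low risk), and the cut-off values are all hypothetical. The SVM training step and the use of secondary-structure information are omitted.

```python
from collections import Counter
from math import sqrt
from typing import Sequence


def spectrum(seq: str, k: int = 3) -> Counter:
    """Count every length-k substring (k-mer) of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> float:
    """Normalized k-spectrum kernel (a stand-in for the paper's string kernel).

    Returns the cosine similarity of the two k-mer count vectors,
    or 0.0 if either sequence is shorter than k.
    """
    sa, sb = spectrum(a, k), spectrum(b, k)
    dot = sum(count * sb[kmer] for kmer, count in sa.items())
    norm = sqrt(sum(c * c for c in sa.values()) * sum(c * c for c in sb.values()))
    return dot / norm if norm else 0.0


def risk_label(votes: Sequence[int], high_cut: float = 0.8, low_cut: float = 0.2) -> str:
    """Map ensemble votes (+1 = high risk, -1 = low risk) to a three-way label.

    The fraction of high-risk votes serves as a crude confidence level.
    The cut-offs are hypothetical, not values from the paper.
    """
    if not votes:
        raise ValueError("need at least one vote")
    frac_high = sum(1 for v in votes if v > 0) / len(votes)
    if frac_high >= high_cut:
        return "high"
    if frac_high <= low_cut:
        return "low"
    return "possibly high"
```

In this sketch each ensemble member would be an SVM trained on a kernel matrix built from `spectrum_kernel`, and its signed prediction would contribute one vote to `risk_label`. Agreement among members then decides between a confident label and the intermediate "possibly high" class.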