Ensembled support vector machines for human papillomavirus risk type prediction from protein secondary structures

  • Authors:
  • Sun Kim;Jeongmi Kim;Byoung-Tak Zhang

  • Affiliations:
  • School of Computer Science and Engineering, Seoul National University, Seoul 151-744, Republic of Korea;ISU ABXIS CO., LTD, Seoul 120-752, Republic of Korea;School of Computer Science and Engineering, Seoul National University, Seoul 151-744, Republic of Korea

  • Venue:
  • Computers in Biology and Medicine
  • Year:
  • 2009

Abstract

Infection by the human papillomavirus (HPV) is regarded as the major risk factor in the development of cervical cancer. Detection of high-risk HPV is important for understanding its oncogenic mechanisms and for developing novel clinical tools for its diagnosis, treatment, and prevention. Several methods are available to predict the risk types of HPV protein sequences. Nevertheless, no tool achieves universally good performance across all domains, including HPV, nor do existing tools provide confidence levels for their decisions. Here, we describe ensembled support vector machines (SVMs) for classifying HPV risk types, which assign a given protein to the high-, possibly high-, or low-risk type according to the confidence level of the decision. Our approach uses protein secondary structures to obtain the differential contribution of subsequences to the risk type, and the SVM classifiers are combined with a simple but efficient string kernel to handle HPV protein sequences. In the experiments, we compare our approach with previous methods in terms of accuracy and F1-score, and present predictions for unknown HPV types, with promising results.
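
The two ingredients named in the abstract, a string kernel over protein sequences and a three-way decision driven by ensemble confidence, can be illustrated with a minimal Python sketch. The abstract does not specify the kernel or the decision rule, so everything here is an assumption: a plain k-spectrum kernel stands in for the paper's string kernel, the fraction of ensemble members voting high-risk stands in for the confidence level, and the function names and the thresholds 0.7 and 0.3 are hypothetical. Weighting by secondary structure is not modeled.

```python
from collections import Counter


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """k-spectrum kernel: dot product of the k-mer count vectors of a and b.

    Assumption: a stand-in for the paper's unspecified string kernel.
    """
    counts_a = Counter(a[i:i + k] for i in range(len(a) - k + 1))
    counts_b = Counter(b[i:i + k] for i in range(len(b) - k + 1))
    # Only k-mers present in both sequences contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    return sum(counts_a[m] * counts_b[m] for m in shared)


def risk_label(votes, high: float = 0.7, low: float = 0.3) -> str:
    """Map ensemble votes (+1 = high-risk, -1 = low-risk) to three labels.

    Assumption: confidence is the fraction of high-risk votes, and the
    thresholds `high` and `low` are illustrative, not taken from the paper.
    """
    fraction_high = sum(1 for v in votes if v > 0) / len(votes)
    if fraction_high >= high:
        return "high"
    if fraction_high <= low:
        return "low"
    return "possibly high"


if __name__ == "__main__":
    # Toy sequences, not real HPV proteins.
    print(spectrum_kernel("MHQKRTAM", "MHQKRSAM", k=3))
    print(risk_label([1, 1, 1, -1]))
```

In a full pipeline the kernel values would fill a precomputed Gram matrix passed to each SVM in the ensemble, and each classifier's sign would supply one vote.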