Protein sequence-based risk classification for human papillomaviruses

  • Authors:
  • Je-Gun Joung;Sok June O;Byoung-Tak Zhang

  • Affiliations:
  • Graduate Program in Bioinformatics, Seoul National University, Seoul 151-742, Republic of Korea and Center for Bioinformation Technology (CBIT), Seoul National University, Seoul 151-742, Republic ...;Department of Pharmacology and Pharmacogenomics Research Center, Inje University College of Medicine, Busan 614-735, Republic of Korea;Graduate Program in Bioinformatics, Seoul National University, Seoul 151-742, Republic of Korea and Center for Bioinformation Technology (CBIT), Seoul National University, Seoul 151-742, Republic ...

  • Venue:
  • Computers in Biology and Medicine
  • Year:
  • 2006

Abstract

Human papillomaviruses (HPVs) are small DNA tumor viruses that infect epithelial tissues and induce hyperproliferative lesions. Infection by high-risk genital HPVs is associated with the development of anogenital cancers. Classifying HPV risk types is important for understanding the mechanisms of infection and for developing novel instruments for medical examination, such as DNA microarrays. Sequence-based classification methods are useful for classifying risk types by considering residues in conserved positions. In this paper, we present a machine learning approach to the classification of HPV risk types using protein sequences. Our approach is based on the hidden Markov model and the kernel method: the former searches for informative subsequence positions, and the latter provides efficient computation for classifying protein sequences. In the experiments, the classifier correctly predicted four unknown HPV types. An additional result shows that kernel-based classifiers trained on the more informative subsequences outperform classifiers trained on the whole sequence or on random subsequences.
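
The kernel step can be illustrated with a minimal sketch. The abstract does not say which kernel or classifier the authors used, so the k-spectrum kernel and the mean-similarity rule below are stand-ins chosen for brevity, not the paper's method. The window boundaries that a hidden Markov model would supply are replaced by a hypothetical fixed window, and the sequences are invented toy strings, not real HPV proteins.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=3):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """Normalized k-spectrum kernel: cosine similarity of k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(n * cb[kmer] for kmer, n in ca.items())
    norm = sqrt(sum(n * n for n in ca.values()) * sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0


def window(seq, start, end):
    """Cut out a subsequence; in the paper the positions come from an HMM,
    here (start, end) is a hypothetical, hand-picked window."""
    return seq[start:end]


def classify(query, labelled, k=3):
    """Return the label whose training sequences are, on average,
    most similar to the query under the spectrum kernel."""
    scores = {}
    for label in {lab for _, lab in labelled}:
        members = [s for s, lab in labelled if lab == label]
        scores[label] = sum(spectrum_kernel(query, s, k) for s in members) / len(members)
    return max(scores, key=scores.get)


# Invented toy sequences (assumption: not real HPV data).
TOY = [
    ("MHGDTPTLHEYM", "high"),
    ("MHGPKATLQDIV", "high"),
    ("MSSRRLLAACCW", "low"),
    ("MSARRLVAACFW", "low"),
]

if __name__ == "__main__":
    start, end = 0, 12  # hypothetical informative window
    train = [(window(s, start, end), lab) for s, lab in TOY]
    print(classify(window("MHGDTPTLQDIV", start, end), train))
```

In practice a kernel matrix of this kind would more likely be passed to a support vector machine; the mean-similarity rule is used here only to keep the sketch dependency-free.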