Prediction of Protein Secondary Structure with two-stage multi-class SVMs

  • Authors:
  • Minh N. Nguyen;Jagath C. Rajapakse

  • Affiliations:
  • BioInformatics Research Centre, School of Computer Engineering, Nanyang Technological University, Singapore; Biological Engineering Division, Massachusetts Institute of Technology, USA / Singapore-MIT Alliance, N2-B2C-15, 50 Nanyang Avenue, Singapore 639798

  • Venue:
  • International Journal of Data Mining and Bioinformatics
  • Year:
  • 2007

Abstract

Bioinformatics techniques for Protein Secondary Structure (PSS) prediction mostly depend on the information available in amino acid sequences. In this paper, we propose a two-stage Multi-class Support Vector Machine (MSVM) approach, where a second MSVM predictor is introduced at the output of the first-stage MSVM to capture the contextual relationship among secondary structure elements in order to minimise the generalisation error in the prediction. By using position-specific scoring matrices generated by PSI-BLAST, the two-stage MSVM approach achieves Q3 accuracies of 78.0% and 76.3% on the RS126 dataset of 126 non-homologous globular proteins and the CB396 dataset of 396 non-homologous proteins, respectively, which are better than the scores reported on both datasets to date. By using MSVM, the present prediction scheme achieves significant improvements of 2–6% and 3–15% in Q3 and Sov accuracies, respectively, on the two datasets. On larger blind-test datasets from PSIPRED, CASP4 and EVA, the two-stage MSVM approach achieves Q3 accuracies from 77.0% to 79.5%.
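
As a rough illustration of two ideas in the abstract, the Python sketch below computes a Q3 score (the percentage of residues assigned the correct one of three states) and builds inputs for a second-stage predictor from first-stage outputs. The abstract does not describe how the second-stage inputs are formed, so the sliding window, the zero padding at the sequence ends, the H/E/C state labels and the function names `q3_accuracy` and `second_stage_inputs` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List, Sequence

# Assumed three-state alphabet: helix, strand, coil.
STATES = ("H", "E", "C")


def q3_accuracy(true_ss: str, pred_ss: str) -> float:
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("sequences must have equal length")
    if not true_ss:
        raise ValueError("sequences must not be empty")
    correct = sum(t == p for t, p in zip(true_ss, pred_ss))
    return 100.0 * correct / len(true_ss)


def second_stage_inputs(
    stage1_scores: Sequence[Sequence[float]], half_window: int
) -> List[List[float]]:
    """Build one feature row per residue from first-stage per-class scores.

    Assumption: each row concatenates the scores of the residue and its
    neighbours within +/- half_window positions; positions that fall
    outside the sequence are padded with zeros.
    """
    pad = [0.0] * len(STATES)
    n = len(stage1_scores)
    features: List[List[float]] = []
    for i in range(n):
        row: List[float] = []
        for j in range(i - half_window, i + half_window + 1):
            row.extend(stage1_scores[j] if 0 <= j < n else pad)
        features.append(row)
    return features


if __name__ == "__main__":
    # Three of four residues match, so Q3 is 75%.
    print(q3_accuracy("HHEC", "HHEE"))
    # Two residues, window of 3 positions, 3 scores each: rows of length 9.
    rows = second_stage_inputs([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], half_window=1)
    print(len(rows), len(rows[0]))
```

In such a setup, the rows returned by `second_stage_inputs` would be fed to a second multi-class classifier, so that each residue's final label can depend on the first-stage predictions of its neighbours.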