Weave amino acid sequences for protein secondary structure prediction

  • Authors:
  • Xiaochun Yang; Bin Wang

  • Affiliations:
  • Brigham Young University, Provo, Utah; Northeastern University, Shenyang, P.R. China

  • Venue:
  • DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
  • Year:
  • 2003

Abstract

Predicting the secondary structure of a known protein sequence helps in understanding its three-dimensional (tertiary) structure, i.e., its fold. In this paper, we present an approach for predicting protein secondary structure. Unlike existing prediction methods, our approach uses an encoding scheme that weaves physicochemical information into the encoded vectors, together with a prediction framework that combines context information with secondary structure segments. We train a Support Vector Machine (SVM) on the CB513 and RS126 data sets, which are collections of protein secondary structure sequences, using sevenfold cross-validation to uncover the structural differences among protein secondary structures. We then apply a sliding-window technique to test a set of protein sequences based on the group classification learned from the training set. Our approach achieves 77.8% segment overlap accuracy (SOV) and 75.2% three-state overall per-residue accuracy (Q3), outperforming other prediction methods.
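
The abstract does not specify the encoding, the window width, or the SVM configuration, so the following is only a minimal sketch of the general idea under stated assumptions. It encodes each residue as a one-hot vector plus a single physicochemical value (the Kyte-Doolittle hydropathy scale, chosen here purely as a stand-in), builds one feature row per residue from a sliding window, and computes Q3 as the percentage of residues whose predicted state (H, E, or C) matches the observed one. The names `encode_residue`, `window_features`, and `q3`, and the default width of 13, are hypothetical and not taken from the paper.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy values; an assumed stand-in for the
# physicochemical information the paper weaves into its vectors.
HYDROPATHY = {
    "A": 1.8, "C": 2.5, "D": -3.5, "E": -3.5, "F": 2.8,
    "G": -0.4, "H": -3.2, "I": 4.5, "K": -3.9, "L": 3.8,
    "M": 1.9, "N": -3.5, "P": -1.6, "Q": -3.5, "R": -4.5,
    "S": -0.8, "T": -0.7, "V": 4.2, "W": -0.9, "Y": -1.3,
}


def encode_residue(aa):
    """Return 20 one-hot slots plus one hydropathy slot.

    Padding or unknown symbols map to an all-zero vector.
    """
    vec = [0.0] * (len(AMINO_ACIDS) + 1)
    if aa in HYDROPATHY:
        vec[AMINO_ACIDS.index(aa)] = 1.0
        vec[-1] = HYDROPATHY[aa]
    return vec


def window_features(seq, width=13):
    """Build one feature row per residue from a centered window.

    The sequence is padded with '-' so that every residue, including
    those at the ends, sits at the center of a full-width window.
    """
    if width % 2 == 0:
        raise ValueError("width must be odd so the window has a center")
    half = width // 2
    padded = "-" * half + seq + "-" * half
    rows = []
    for i in range(len(seq)):
        row = []
        for aa in padded[i:i + width]:
            row.extend(encode_residue(aa))
        rows.append(row)
    return rows


def q3(predicted, observed):
    """Percentage of residues whose predicted state equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

Each row produced by `window_features` could serve as one training example for a per-residue classifier such as an SVM, with the secondary structure state of the center residue as its label. SOV, the other measure quoted above, scores overlap between predicted and observed segments rather than individual residues and is not covered by this sketch.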