Predicting protein secondary structure using a mixed-modal SVM method in a compound pyramid model

  • Authors:
  • Bingru Yang; Qu Wu; Zhou Ying; Haifeng Sui

  • Affiliations:
  • School of Information Engineering, University of Science and Technology Beijing, Beijing, China (all authors)

  • Venue:
  • Knowledge-Based Systems
  • Year:
  • 2011

Abstract

Accurate protein secondary structure prediction plays an important role in direct tertiary structure modeling, and can also significantly improve sequence analysis and sequence-structure threading for structure and function determination. Improving the accuracy of secondary structure prediction is therefore essential for future developments throughout the field of protein research. In this article, we propose a mixed-modal support vector machine (SVM) method for predicting protein secondary structure. Using the evolutionary information contained in the physicochemical properties of each amino acid and in a position-specific scoring matrix generated by a PSI-BLAST multiple sequence alignment as input for a mixed-modal SVM, secondary structure can be predicted with significantly increased accuracy. Using a Knowledge Discovery Theory based on the Inner Cognitive Mechanism (KDTICM) method, we also propose a compound pyramid model composed of three layers of intelligent interface that integrate, among other components, a mixed-modal SVM (MMS) module, a modified Knowledge Discovery in Databases (KDD*) process, and a mixed-modal back-propagation neural network (MMBP) module. Testing against data sets of non-redundant protein sequences returned values for the Q3 accuracy measure ranging from 84.0% to 85.6%, while values for the SOV99 segment overlap measure ranged from 79.8% to 80.6%. When compared against currently available secondary structure prediction methods on a blind test dataset from the CASP8 meeting, our new approach shows superior accuracy. Availability: http://www.kdd.ustb.edu.cn/protein_Web/.
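
To make the Q3 figures above concrete, the following is a minimal sketch of how a three-state per-residue accuracy is commonly computed: the percentage of residues whose predicted state (helix H, strand E, or coil C) matches the observed state. It is not the authors' evaluation code; the function name `q3_accuracy` and the two example strings are hypothetical and chosen only for illustration. The SOV99 segment overlap measure is more involved and is not covered here.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of residues whose three-state label matches.

    Both arguments are strings over the alphabet H (helix), E (strand),
    C (coil), one character per residue. Illustrative helper only.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    # Count positions where the predicted state equals the observed state.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Made-up 15-residue example: one helix residue is mispredicted as coil.
observed = "CCHHHHHCCEEEECC"
predicted = "CCHHHHCCCEEEECC"
print(f"Q3 = {q3_accuracy(predicted, observed):.1f}%")
```

A per-residue score of this kind does not reward getting whole segments right, which is one reason abstracts such as this one report a segment overlap measure alongside it.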