A compact hybrid feature vector for an accurate secondary structure prediction

  • Authors:
  • Rohayanti Hassan; Razib M. Othman; Puteh Saad; Shahreen Kasim

  • Affiliations:
  • Laboratory of Computational Intelligence and Biotechnology, Infocomm Research Alliance, Universiti Teknologi Malaysia, 81310 UTM Skudai, Malaysia;Laboratory of Computational Intelligence and Biotechnology, Infocomm Research Alliance, Universiti Teknologi Malaysia, 81310 UTM Skudai, Malaysia;Department of Software Engineering, Faculty of Computer Science and Information Systems, Universiti Teknologi Malaysia, 81310 UTM Skudai, Malaysia;Department of Information System, Faculty of Information Technology and Multimedia, Universiti Tun Hussein Onn Malaysia, 86400 Batu Pahat, Malaysia

  • Venue:
  • Information Sciences: an International Journal
  • Year:
  • 2011

Abstract

The amino acid propensity score is one of the earliest successful methods used in protein secondary structure prediction. However, the score performs poorly on small datasets and on low-identity protein sequences. Current in silico methods can predict secondary structure from local folds, that is, from local protein structure. Biologically, the evolution of secondary structure produces local protein structures of varying lengths. To predict secondary structure precisely, we propose a derivative feature vector, DPS, that exploits the optimal length of the local protein structure. DPS unifies the amino acid propensity score and the dihedral angle score. The new feature vector is further normalized to level the edges. Prediction is performed by support vector machines (SVMs) over the DPS feature vectors, with class labels generated by a secondary structure assignment method (SSAM) and a secondary structure prediction method (SSPM). All experiments are carried out on the RS126 sequences. The results highlight the overall accuracy of the proposed method relative to other state-of-the-art methods. Its performance was acceptable, specifically when dealing with small numbers of sequences and low-identity sequences.
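
As a rough illustration of the propensity half of such a feature vector, the sketch below computes a Chou-Fasman-style propensity table from aligned sequence and structure strings, builds a windowed per-residue vector, and rescales it. It is not the authors' DPS: the dihedral angle score and the SVM step are omitted, and the function names, the fixed window width, the toy data, and the min-max rescaling are all assumptions made for the example, since the abstract does not specify them.

```python
from collections import Counter


def propensity_table(sequences, structures):
    """Chou-Fasman-style propensity: P(a, s) = (n(a, s) / n(a)) / (n(s) / N)."""
    pair_counts = Counter()
    residue_counts = Counter()
    state_counts = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must align")
        for aa, state in zip(seq, ss):
            pair_counts[(aa, state)] += 1
            residue_counts[aa] += 1
            state_counts[state] += 1
    total = sum(residue_counts.values())
    table = {}
    for aa in residue_counts:
        for state in state_counts:
            frac_in_state = pair_counts[(aa, state)] / residue_counts[aa]
            background = state_counts[state] / total
            table[(aa, state)] = frac_in_state / background
    return table


def window_features(seq, table, states="HEC", half_width=1):
    """One row per residue: propensities of every window position for each
    state. Positions outside the sequence (and unseen residues) score 0.0."""
    features = []
    for i in range(len(seq)):
        row = []
        for j in range(i - half_width, i + half_width + 1):
            for state in states:
                if 0 <= j < len(seq):
                    row.append(table.get((seq[j], state), 0.0))
                else:
                    row.append(0.0)
        features.append(row)
    return features


def min_max_normalize(rows):
    """Rescale each column to [0, 1]; constant columns map to 0.0.
    (Assumed stand-in for the normalization mentioned in the abstract.)"""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Toy data (hypothetical, not RS126): H = helix, C = coil.
toy_sequences = ["AAGG", "AGAG"]
toy_structures = ["HHCC", "HCHC"]
toy_table = propensity_table(toy_sequences, toy_structures)
toy_features = min_max_normalize(
    window_features("AG", toy_table, states="HC", half_width=1)
)
```

In a fuller pipeline, rows like `toy_features` would be paired with per-residue class labels and passed to an SVM classifier; the fixed window here stands in for the optimal local-structure length that the abstract refers to.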