From HP lattice models to real proteins: coordination number prediction using learning classifier systems

  • Authors:
  • Michael Stout; Jaume Bacardit; Jonathan D. Hirst; Natalio Krasnogor; Jacek Blazewicz

  • Affiliations:
  • Automated Scheduling, Optimization and Planning research group, School of Computer Science and IT, University of Nottingham, Nottingham, UK; Automated Scheduling, Optimization and Planning research group, School of Computer Science and IT, University of Nottingham, Nottingham, UK; School of Chemistry, University of Nottingham, Nottingham, UK; Automated Scheduling, Optimization and Planning research group, School of Computer Science and IT, University of Nottingham, Nottingham, UK; Institute of Computing Science, Poznan University of Technology, Poznan, Poland

  • Venue:
  • EuroGP'06 Proceedings of the 2006 international conference on Applications of Evolutionary Computing
  • Year:
  • 2006

Abstract

Prediction of the coordination number (CN) of residues in proteins based solely on protein sequence has recently received renewed attention. At the same time, simplified protein models such as the HP model have been used to understand protein folding and protein structure prediction. These models represent the sequence of a protein using two residue types, hydrophobic and polar, and restrict the residue locations to those of a lattice. The aim of this paper is to compare CN prediction at three levels of abstraction: (a) 3D cubic lattice HP model proteins, (b) real proteins represented by their HP sequence, and (c) real proteins using residue sequence alone. For the 3D HP lattice model proteins, the CN of each residue is simply the number of neighboring residues on the lattice. For the real proteins, we use a recent real-valued definition of CN proposed by Kinjo et al. To perform the predictions we use GAssist, a recent evolutionary-computation-based machine learning method belonging to the Learning Classifier System (LCS) family. Its performance was compared against some alternative learning techniques. Predictions using the HP sequence representation with only two residue types were only a little worse than those using a full 20-letter amino acid alphabet (64% vs 68% for two-state prediction, 45% vs 50% for three-state prediction and 30% vs 33% for five-state prediction). That HP sequence information alone can yield prediction accuracies within 5% of those obtained using full residue type information indicates that hydrophobicity is a key determinant of CN and further justifies studies of simplified models.
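
The lattice definition of CN mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the coordinate representation and the `exclude_chain_neighbors` flag are all assumptions made for illustration. In particular, the abstract does not say whether residues adjacent along the chain count as neighbors, so the sketch counts every occupied adjacent lattice site by default and offers the flag for the alternative reading.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

# The six unit steps to adjacent sites on a 3D cubic lattice.
NEIGHBOR_STEPS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def lattice_coordination_numbers(
    coords: List[Coord],
    exclude_chain_neighbors: bool = False,  # hypothetical option, not from the paper
) -> List[int]:
    """Count, for each residue, the occupied lattice sites adjacent to it.

    coords[i] is the lattice position of residue i, in chain order.
    """
    index_of = {c: i for i, c in enumerate(coords)}
    if len(index_of) != len(coords):
        raise ValueError("two residues occupy the same lattice site")

    cns = []
    for i, (x, y, z) in enumerate(coords):
        count = 0
        for dx, dy, dz in NEIGHBOR_STEPS:
            j = index_of.get((x + dx, y + dy, z + dz))
            if j is None:
                continue  # adjacent site is empty
            if exclude_chain_neighbors and abs(i - j) == 1:
                continue  # skip residues bonded along the chain
            count += 1
        cns.append(count)
    return cns


# A four-residue chain folded into a square in the z = 0 plane.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(lattice_coordination_numbers(square))
print(lattice_coordination_numbers(square, exclude_chain_neighbors=True))
```

On a cubic lattice each site has six adjacent sites, so under either reading the CN computed this way is an integer between 0 and 6.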