From HP lattice models to real proteins: coordination number prediction using learning classifier systems

Authors:
Michael Stout; Jaume Bacardit; Jonathan D. Hirst; Natalio Krasnogor; Jacek Blazewicz
Affiliations:
Automated Scheduling, Optimization and Planning research group, School of Computer Science and IT, University of Nottingham, Nottingham, UK; Automated Scheduling, Optimization and Planning research group, School of Computer Science and IT, University of Nottingham, Nottingham, UK; School of Chemistry, University of Nottingham, Nottingham, UK; Automated Scheduling, Optimization and Planning research group, School of Computer Science and IT, University of Nottingham, Nottingham, UK; Institute of Computing Science, Poznan University of Technology, Poznan, Poland
Venue:
EuroGP'06 Proceedings of the 2006 international conference on Applications of Evolutionary Computing
Year:
2006

Abstract

Prediction of the coordination number (CN) of residues in proteins based solely on protein sequence has recently received renewed attention. At the same time, simplified protein models such as the HP model have been used to understand protein folding and protein structure prediction. These models represent the sequence of a protein using two residue types, hydrophobic and polar, and restrict the residue locations to those of a lattice. The aim of this paper is to compare CN prediction at three levels of abstraction: (a) 3D cubic lattice HP model proteins, (b) real proteins represented by their HP sequence and (c) real proteins using residue sequence alone. For the 3D HP lattice model proteins, the CN of each residue is simply the number of neighboring residues on the lattice. For the real proteins, we use a recent real-valued definition of CN proposed by Kinjo et al. To perform the predictions we use GAssist, a recent evolutionary computation based machine learning method belonging to the Learning Classifier System (LCS) family. Its performance was compared against some alternative learning techniques. Predictions using the HP sequence representation with only two residue types were only a little worse than those using a full 20-letter amino acid alphabet (64% vs 68% for two-state prediction, 45% vs 50% for three-state prediction and 30% vs 33% for five-state prediction). That HP sequence information alone can yield prediction accuracies within 5 percentage points of those obtained using full residue type information indicates that hydrophobicity is a key determinant of CN and further justifies studies of simplified models.
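
The lattice definition of CN given in the abstract (the number of neighboring residues on the lattice) can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `lattice_coordination_numbers`, the coordinate representation, and the `exclude_chain_neighbors` option are assumptions made for illustration. In particular, the abstract does not say whether residues adjacent in the chain count as neighbors, so the sketch exposes both readings.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

# The six unit steps to adjacent sites on a 3D cubic lattice.
NEIGHBOR_STEPS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def lattice_coordination_numbers(
    coords: List[Coord],
    exclude_chain_neighbors: bool = False,  # assumption: not specified in the abstract
) -> List[int]:
    """Count, for each residue, the occupied lattice sites adjacent to it.

    coords[i] is the lattice position of residue i, in chain order.
    """
    index_at = {c: i for i, c in enumerate(coords)}
    if len(index_at) != len(coords):
        raise ValueError("conformation is not self-avoiding")

    cn = []
    for i, (x, y, z) in enumerate(coords):
        count = 0
        for dx, dy, dz in NEIGHBOR_STEPS:
            j = index_at.get((x + dx, y + dy, z + dz))
            if j is None:
                continue  # adjacent site is empty
            if exclude_chain_neighbors and abs(i - j) == 1:
                continue  # skip residues bonded along the chain
            count += 1
        cn.append(count)
    return cn


# Hypothetical 4-residue conformation forming a square in the z = 0 plane.
square = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(lattice_coordination_numbers(square))
print(lattice_coordination_numbers(square, exclude_chain_neighbors=True))
```

On a cubic lattice each residue has at most six adjacent sites, so the CN in this sketch is an integer between 0 and 6. This contrasts with the real-valued CN definition that the abstract attributes to Kinjo et al. for real proteins.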