Prediction of the coordination number (CN) of residues in proteins based solely on protein sequence has recently received renewed attention. At the same time, simplified protein models such as the HP model have been used to understand protein folding and protein structure prediction. These models represent the sequence of a protein using two residue types, hydrophobic and polar, and restrict the residue locations to those of a lattice. The aim of this paper is to compare CN prediction at three levels of abstraction: (a) 3D cubic lattice HP model proteins, (b) real proteins represented by their HP sequence, and (c) real proteins using residue sequence alone. For the 3D HP lattice model proteins, the CN of each residue is simply the number of neighboring residues on the lattice. For the real proteins, we use a recent real-valued definition of CN proposed by Kinjo et al. To perform the predictions we use GAssist, a recent evolutionary-computation-based machine learning method belonging to the Learning Classifier System (LCS) family. Its performance was compared against several alternative learning techniques. Predictions using the HP sequence representation, with only two residue types, were only slightly worse than those using the full 20-letter amino acid alphabet (64% vs 68% for two-state prediction, 45% vs 50% for three-state prediction, and 30% vs 33% for five-state prediction). That HP sequence information alone can yield prediction accuracies within 5 percentage points of those obtained using full residue type information indicates that hydrophobicity is a key determinant of CN and further justifies studies of simplified models.
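The lattice definition of CN above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name, the coordinate representation, and the choice to count chain-adjacent residues as neighbors are all assumptions made for illustration, since the abstract does not say whether sequence neighbors are included.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]

# The six unit steps to adjacent sites on a 3D cubic lattice.
NEIGHBOR_STEPS = [
    (1, 0, 0), (-1, 0, 0),
    (0, 1, 0), (0, -1, 0),
    (0, 0, 1), (0, 0, -1),
]


def lattice_coordination_numbers(coords: List[Coord]) -> List[int]:
    """Return, for each residue, the number of occupied adjacent lattice sites.

    Assumption (not stated in the abstract): residues adjacent along the
    chain are counted as neighbors like any other occupied site.
    """
    occupied = set(coords)
    if len(occupied) != len(coords):
        raise ValueError("conformation is not self-avoiding")
    cns = []
    for x, y, z in coords:
        count = sum(
            (x + dx, y + dy, z + dz) in occupied
            for dx, dy, dz in NEIGHBOR_STEPS
        )
        cns.append(count)
    return cns


# Hypothetical four-residue conformation folded into a square in the z = 0 plane.
square_fold = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
print(lattice_coordination_numbers(square_fold))  # [2, 2, 2, 2]
```

On a cubic lattice the value for any residue lies between 0 and 6, so binning it into two, three, or five states gives discrete prediction targets of the kind the abstract reports; the bin boundaries themselves are not given here and are left out of the sketch.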