Automated alphabet reduction method with evolutionary algorithms for protein structure prediction

Authors:
Jaume Bacardit;Michael Stout;Jonathan D. Hirst;Kumara Sastry;Xavier Llorà;Natalio Krasnogor
Affiliations:
University of Nottingham, Nottingham, United Kingdom;University of Nottingham, Nottingham, United Kingdom;University of Nottingham, Nottingham, United Kingdom;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Nottingham, Nottingham, United Kingdom
Venue:
Proceedings of the 9th annual conference on Genetic and evolutionary computation
Year:
2007

Abstract

This paper focuses on automated procedures to reduce the dimensionality of protein structure prediction datasets by simplifying the way in which the primary sequence of a protein is represented. The potential benefits of this procedure are a faster and easier learning process as well as the generation of more compact and human-readable classifiers. The dimensionality reduction procedure we propose consists of reducing the 20-letter amino acid (AA) alphabet, which is normally used to specify a protein sequence, to a lower-cardinality alphabet. This reduction comes about by clustering AA types according to their physical and chemical similarity. Our automated reduction procedure is guided by a fitness function based on the Mutual Information between the AA-based input attributes of the dataset and the protein structure feature being predicted. To search for the optimal reduction, the Extended Compact Genetic Algorithm (ECGA) was used, and afterwards the results of this process were fed into (and validated by) BioHEL, a genetics-based machine learning technique. BioHEL used the reduced alphabet to induce rules for protein structure prediction features. BioHEL results are compared to two standard machine learning systems. Our results show that it is possible to reduce the size of the alphabet used for prediction from twenty to just three letters, resulting in more compact, i.e., more interpretable, rules. Also, a protein-wise accuracy performance measure suggests that the loss of accuracy incurred by this substantial alphabet reduction is not statistically significant when compared to the full alphabet.
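
The idea of scoring a candidate alphabet reduction by Mutual Information can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the function names, the plain empirical estimate of Mutual Information, the toy sequence windows and labels, and the two-letter hydrophobic/other grouping are all hypothetical. In the paper the grouping is searched for by ECGA; here one fixed grouping is simply compared against the full 20-letter alphabet.

```python
from collections import Counter
from math import log2

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def reduce_sequence(window, grouping):
    """Rewrite a window of residues using the group letter of each residue."""
    return tuple(grouping[aa] for aa in window)

def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two paired lists."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi

def fitness(grouping, windows, labels):
    """Score a grouping by the MI between reduced windows and class labels."""
    reduced = [reduce_sequence(w, grouping) for w in windows]
    return mutual_information(reduced, labels)

# Hypothetical two-letter grouping (NOT a result from the paper).
HYDROPHOBIC = set("AVLIMFWC")
grouping_hp = {aa: ("h" if aa in HYDROPHOBIC else "p") for aa in AMINO_ACIDS}

# The identity grouping corresponds to the full 20-letter alphabet.
grouping_full = {aa: aa for aa in AMINO_ACIDS}

# Toy data: residue windows and a made-up binary structural label.
windows = ["AVL", "LIV", "DEK", "KRD", "AVD", "KEL"]
labels = [1, 1, 0, 0, 1, 0]

if __name__ == "__main__":
    print("full alphabet MI:   ", fitness(grouping_full, windows, labels))
    print("reduced alphabet MI:", fitness(grouping_hp, windows, labels))
```

Merging letters can never increase this empirical estimate, so a search guided by it looks for groupings that keep as much information about the predicted feature as possible while using fewer letters. On small samples the plain estimate is biased, so a practical fitness function would likely need a more careful estimator.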