This paper focuses on automated procedures to reduce the dimensionality of protein structure prediction datasets by simplifying the way in which the primary sequence of a protein is represented. The potential benefits of this procedure are a faster and easier learning process as well as the generation of more compact and human-readable classifiers. The dimensionality reduction procedure we propose consists of reducing the 20-letter amino acid (AA) alphabet, which is normally used to specify a protein sequence, into a lower-cardinality alphabet. This reduction comes about by clustering AA types according to their physical and chemical similarity. Our automated reduction procedure is guided by a fitness function based on the Mutual Information between the AA-based input attributes of the dataset and the protein structure feature that is being predicted. To search for the optimal reduction, the Extended Compact Genetic Algorithm (ECGA) was used, and afterwards the results of this process were fed into (and validated by) BioHEL, a genetics-based machine learning technique. BioHEL used the reduced alphabet to induce rules for protein structure prediction features. BioHEL results are compared to two standard machine learning systems. Our results show that it is possible to reduce the size of the alphabet used for prediction from twenty to just three letters, resulting in more compact, i.e. more interpretable, rules. Also, a protein-wise accuracy performance measure suggests that the loss of accuracy accrued by this substantial alphabet reduction is not statistically significant when compared to the full alphabet.
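To make the idea of a Mutual Information based fitness more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a simplified setting with a single AA attribute per instance (the paper's datasets use several AA-based input attributes), and it scores a candidate alphabet reduction by the empirical mutual information between the reduced letters and the class label. The function names, the three-group mapping `HYPOTHETICAL_GROUPS`, and the toy data are all illustrative assumptions; the grouping is not the reduction reported in the paper, and the search over groupings (done with ECGA in the paper) is not shown.

```python
from collections import Counter
from math import log2

# Hypothetical 3-letter reduction of the 20 amino acids (illustration only,
# NOT the alphabet found in the paper): H = hydrophobic, P = polar, C = charged.
HYPOTHETICAL_GROUPS = {"H": "AVLIMFWC", "P": "STNQYGPH", "C": "DEKR"}
GROUPING = {aa: g for g, letters in HYPOTHETICAL_GROUPS.items() for aa in letters}


def reduce_sequence(residues, grouping):
    """Replace each amino-acid letter by the label of its group."""
    return [grouping[aa] for aa in residues]


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two paired sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # Sum over observed pairs of p(x,y) * log2(p(x,y) / (p(x) * p(y))).
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())


def fitness(residues, labels, grouping):
    """Score a candidate reduction: MI between reduced letters and labels."""
    return mutual_information(reduce_sequence(residues, grouping), labels)


if __name__ == "__main__":
    # Toy data (made up): one residue and one structural class per instance.
    residues = list("AVLKDESTAVKE")
    labels = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0]
    print("MI, full alphabet:   ", mutual_information(residues, labels))
    print("MI, reduced alphabet:", fitness(residues, labels, GROUPING))
```

Because the reduced letter is a deterministic function of the original letter, the score of any reduction can never exceed the mutual information of the full alphabet. On a finite sample, however, finer alphabets tend to inflate empirical mutual information, so a fitness used in practice would likely need some correction for alphabet size; how the paper handles this is not stated in the abstract.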