Historically, the quality of a solution in Genetic Programming (GP) was often assessed by its performance on a given training sample. In Machine Learning, however, we are more interested in reliable estimates of the quality of the evolving individuals on unseen data. In this paper, we propose to simulate the effect of unseen data during training without actually using any additional data. We do this by employing bootstrapping, a technique that repeatedly re-samples with replacement from the training data and thereby estimates the sensitivity of the individual in question to small variations across these re-sampled data sets. We minimise this sensitivity, as measured by the Bootstrap Standard Error, together with the training error, in an effort to evolve models that generalise better to unseen data. We evaluate the proposed technique on four binary classification problems and compare it with a standard GP approach. The results show that, for the problems undertaken, the proposed method not only generalises significantly better than standard GP while also improving training performance, but also has the notable side effect of containing tree sizes.
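The core measurement described above can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation: `evaluate` (an error function for a candidate individual), the number of resamples `n_boot`, and the equal-weight combination of training error and Bootstrap Standard Error are all assumptions made for the example.

```python
import random


def bootstrap_standard_error(evaluate, data, n_boot=200, seed=0):
    """Estimate the Bootstrap Standard Error (BSE) of an individual's error.

    evaluate: callable mapping a list of training cases to an error rate
              for the individual being assessed (hypothetical interface).
    data:     the original training cases.
    """
    rng = random.Random(seed)
    errors = []
    for _ in range(n_boot):
        # Re-sample with replacement from the training data, same size.
        resample = [data[rng.randrange(len(data))] for _ in data]
        errors.append(evaluate(resample))
    mean = sum(errors) / n_boot
    # Sample standard deviation of the bootstrap error estimates.
    variance = sum((e - mean) ** 2 for e in errors) / (n_boot - 1)
    return variance ** 0.5


def combined_fitness(evaluate, data):
    """Minimise training error together with BSE (equal weighting assumed)."""
    return evaluate(data) + bootstrap_standard_error(evaluate, data)
```

An individual whose error rate barely changes across the re-sampled data sets receives a BSE near zero, so selection pressure favours individuals that are both accurate and insensitive to small perturbations of the training sample.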