Order of nonlinearity as a complexity measure for models generated by symbolic regression via Pareto genetic programming

Authors:
Ekaterina J. Vladislavleva; Guido F. Smits; Dick Den Hertog
Affiliations:
Department of Econometrics and Operations Research, Tilburg University, Tilburg, The Netherlands; Core R&D Department, Dow Benelux B.V., Terneuzen, The Netherlands; Department of Econometrics and Operations Research, Tilburg University, Tilburg, The Netherlands
Venue:
IEEE Transactions on Evolutionary Computation
Year:
2009

Abstract

This paper presents a novel approach to generating data-driven regression models that not only give reliable predictions of the observed data but also have smoother response surfaces and improved generalization capabilities with respect to extrapolation. These models are obtained as solutions of a genetic programming (GP) process in which selection is guided by a tradeoff between two competing objectives: numerical accuracy and the order of nonlinearity. The latter is a novel complexity measure that adopts the notion of the minimal degree of the best-fit polynomial approximating an analytical function to a given precision. Using nine regression problems, this paper presents and illustrates two different strategies for using the order of nonlinearity in symbolic regression via GP. With respect to the extrapolative capabilities of the solutions, jointly optimizing the order of nonlinearity and the numerical accuracy strongly outperforms the "conventional" joint optimization of accuracy and a size-related expressional complexity on all nine test problems. In addition to exploiting the new complexity measure, this paper also introduces a novel heuristic of alternating several optimization objectives in a 2-D optimization framework. Alternating the objectives at each generation in this way allows us to exploit the effectiveness of 2-D optimization when more than two objectives are of interest (in this paper, these are accuracy, expressional complexity, and the order of nonlinearity). Results of the experiments on all test problems suggest that alternating the order of nonlinearity of GP individuals with their structural complexity produces solutions that are both compact and have smoother response surfaces, and hence contributes to better interpretability and understanding.
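The order of nonlinearity builds on the minimal degree of a polynomial that approximates a function to a given precision. The following is a minimal sketch of that underlying notion for a single univariate function on an interval. It is not the paper's procedure for scoring GP expressions. As assumptions, it uses Chebyshev interpolation as a stand-in for the best-fit polynomial, and the function names, the interval `[a, b]`, the tolerance `eps`, and the error-check grid are all hypothetical choices.

```python
import math
from typing import Callable, Optional


def chebyshev_interpolant(f: Callable[[float], float], degree: int,
                          a: float, b: float) -> Callable[[float], float]:
    """Polynomial of the given degree interpolating f at Chebyshev nodes on [a, b]."""
    n = degree + 1
    # Chebyshev nodes of the first kind on [-1, 1], mapped to [a, b] to sample f
    nodes = [math.cos(math.pi * (k + 0.5) / n) for k in range(n)]
    values = [f(0.5 * (b - a) * t + 0.5 * (b + a)) for t in nodes]
    # Chebyshev-series coefficients c_j of the interpolating polynomial
    coeffs = [2.0 / n * sum(values[k] * math.cos(math.pi * j * (k + 0.5) / n)
                            for k in range(n))
              for j in range(n)]
    coeffs[0] /= 2.0

    def p(x: float) -> float:
        t = (2.0 * x - (a + b)) / (b - a)  # map x back to [-1, 1]
        # Clenshaw recurrence evaluates sum_j c_j * T_j(t)
        b1 = b2 = 0.0
        for c in reversed(coeffs[1:]):
            b1, b2 = 2.0 * t * b1 - b2 + c, b1
        return t * b1 - b2 + coeffs[0]

    return p


def order_of_nonlinearity(f: Callable[[float], float], a: float = -1.0,
                          b: float = 1.0, eps: float = 1e-6,
                          max_degree: int = 40,
                          n_check: int = 1001) -> Optional[int]:
    """Smallest degree whose interpolant stays within eps of f on a check grid."""
    grid = [a + (b - a) * i / (n_check - 1) for i in range(n_check)]
    for degree in range(max_degree + 1):
        p = chebyshev_interpolant(f, degree, a, b)
        if max(abs(f(x) - p(x)) for x in grid) <= eps:
            return degree
    return None  # no degree up to max_degree reaches the requested precision
```

With the defaults, a quadratic such as `3x^2 + 1` gets order 2. A smooth but non-polynomial function such as `math.exp` needs a noticeably higher degree to stay within `eps = 1e-6`. Nonlinear functions are therefore ranked as more complex even when their expressions are short.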
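The heuristic of alternating objectives within a 2-D framework can be illustrated with a minimal sketch of nondominated (Pareto) filtering. It assumes that accuracy is always one of the two objectives, while the second objective switches between expressional complexity and the order of nonlinearity from one generation to the next. The `Model` fields, the even/odd schedule in `objectives_for`, and the objective values are hypothetical. Only the nondominated filtering step is shown, not a full Pareto GP run.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple


class Model(NamedTuple):
    name: str
    error: float          # numerical inaccuracy (lower is better)
    expr_complexity: int  # size-related expressional complexity
    nonlinearity: int     # order of nonlinearity


Objectives = Callable[[Model], Tuple[float, float]]


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """True if u is no worse than v in every objective and strictly better in one."""
    return all(x <= y for x, y in zip(u, v)) and any(x < y for x, y in zip(u, v))


def pareto_front(models: List[Model], objectives: Objectives) -> List[Model]:
    """Keep the models that no other model dominates under the given 2-D objectives."""
    scores = [objectives(m) for m in models]
    return [m for i, m in enumerate(models)
            if not any(dominates(scores[j], scores[i])
                       for j in range(len(models)) if j != i)]


def objectives_for(generation: int) -> Objectives:
    """Accuracy is always kept; the complexity objective alternates per generation."""
    if generation % 2 == 0:
        return lambda m: (m.error, m.expr_complexity)
    return lambda m: (m.error, m.nonlinearity)


# Hypothetical population with made-up objective values
population = [
    Model("a", 0.10, 5, 8),
    Model("b", 0.12, 12, 2),
    Model("c", 0.30, 3, 1),
    Model("d", 0.15, 20, 9),
]
for gen in range(2):
    print(gen, [m.name for m in pareto_front(population, objectives_for(gen))])
# prints:
# 0 ['a', 'c']
# 1 ['a', 'b', 'c']
```

Model `b` is structurally large but has a low order of nonlinearity. It survives only in the generation that scores nonlinearity, which shows how alternation lets both complexity notions exert selection pressure while each step remains a 2-D comparison.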