Pareto front genetic programming parameter selection based on design of experiments and industrial data

  • Authors:
  • Flor Castillo;Arthur Kordon;Guido Smits;Ben Christenson;Dee Dickerson

  • Affiliations:
  • The Dow Chemical Company, Freeport, TX;The Dow Chemical Company, Freeport, TX;Dow Benelux, B.V., Terneuzen, The Netherlands;The Dow Chemical Company, Freeport, TX;The Dow Chemical Company, Freeport, TX

  • Venue:
  • Proceedings of the 8th annual conference on Genetic and evolutionary computation
  • Year:
  • 2006

Abstract

Symbolic regression based on Pareto Front GP is the key approach for generating high-performance, parsimonious empirical models acceptable for industrial applications. The paper addresses the issue of finding the optimal parameter settings of Pareto Front GP that direct the simulated evolution toward simple models with acceptable prediction error. A generic methodology based on statistical design of experiments is proposed. It includes statistical determination of the number of replicates by half-width confidence intervals, determination of the significant inputs by fractional factorial design of experiments, approaching the optimum by steepest ascent/descent, and local exploration around the optimum by Box-Behnken or central composite design of experiments. The results from applying the proposed methodology to a small-sized industrial data set show that the statistically significant factors for symbolic regression based on Pareto Front GP are the number of cascades, the number of generations, and the population size. A second-order regression model with a high R² of 0.97 includes these three parameters, and their optimal values have been determined. The optimal parameter settings were validated with a separate small-sized industrial data set. The optimal settings are recommended for symbolic regression applications using data sets with up to 5 inputs and up to 50 data points.
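
The first two steps of the methodology named in the abstract, choosing the number of replicates from a confidence-interval half-width and screening factors with a fractional factorial design, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the pilot error values, the target half-width, the use of a normal approximation in place of a t-quantile, and the choice of a two-level half-fraction over the three factors reported as significant are all assumptions made for illustration; the abstract does not give the actual design, factor levels, or data.

```python
import itertools
import math
from statistics import NormalDist, stdev


def replicates_for_half_width(pilot, half_width, confidence=0.95):
    """Smallest n such that z * s / sqrt(n) <= half_width.

    Assumption: a normal approximation is used for the quantile,
    and s is estimated from a small pilot sample of the response.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    s = stdev(pilot)
    return max(2, math.ceil((z * s / half_width) ** 2))


def half_fraction(factors):
    """Two-level 2^(k-1) design in coded units (-1 / +1).

    The first k-1 factors form a full factorial; the last factor is
    set to the product of the others (defining relation I = all factors).
    """
    base = factors[:-1]
    runs = []
    for levels in itertools.product((-1, 1), repeat=len(base)):
        run = dict(zip(base, levels))
        run[factors[-1]] = math.prod(levels)
        runs.append(run)
    return runs


# Hypothetical pilot prediction errors from repeated GP runs at one setting.
pilot_errors = [0.10, 0.12, 0.08, 0.11, 0.09]
n_replicates = replicates_for_half_width(pilot_errors, half_width=0.01)

# Hypothetical factor names, taken from the factors the abstract reports.
design = half_fraction(["cascades", "generations", "population_size"])

if __name__ == "__main__":
    print("replicates per design point:", n_replicates)
    for run in design:
        print(run)
```

Each run of the design would then be repeated `n_replicates` times, and the averaged responses used to estimate factor effects before moving on to steepest ascent/descent and a second-order design around the optimum.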