The GA-P: A Genetic Algorithm and Genetic Programming Hybrid

  • Authors:
  • Les M. Howard; Donna J. D'Angelo

  • Venue:
  • IEEE Expert: Intelligent Systems and Their Applications
  • Year:
  • 1995

Abstract

The GA-P performs symbolic regression by combining the traditional genetic algorithm's function optimization strength with the genetic-programming paradigm to evolve complex mathematical expressions capable of handling numeric and symbolic data. This technique should provide new insights into poorly understood data relationships.

Discovering relationships has been a task troubling researchers since the dawn of modern science. Discovering relationships between sets of data is laborious and error prone, and it is highly subject to researcher bias. Because many of today's research problems are more complex than those of the past, it is increasingly important that robust data analysis methods be available to researchers. For a data analysis method to be most useful, it must meet at least three criteria: good predictive ability, insight into the inner workings of the system being analyzed, and unbiased results.

Historically, researchers deduced relationships solely by examining the data--a difficult task if the relationship is complex, if many variables are involved, or if the data are noisy (as often occurs in real-world problems). Moreover, the examination is easily influenced by the researcher's desires and expectations.

Statistical methods were among the first tools developed to help a researcher find the relationships of observed facts. Statistical methods are often based on such assumptions as these: (1) the data are normally distributed, (2) the equation relating the data is of a specific form (for example, linear, quadratic, or polynomial), and (3) the variables are independent. If the problem meets these assumptions, statistics are a valuable tool for providing static descriptors. But real-world problems seldom meet these criteria.

Neural networks, an artificial intelligence technique, are not limited by these assumptions. They serve as strong predictive models that can uncover complex relationships, but they give little insight into the underlying mechanisms that describe a relationship.
However, two other nonstatistical AI techniques, genetic algorithms and genetic programming, are more robust methods of exploring complex solution spaces. Independently, they have had some success at revealing the mechanisms relating data items.

Recently, genetic algorithms, which use the principles of evolution through natural selection to solve problems, have established themselves as a powerful search and optimization technique. Most GAs are linear (the structure of an individual is a flat bit string). The basic GA proceeds as follows:

1. Create a population of random individuals, in which each individual represents a possible solution to the problem at hand.
2. Evaluate each individual's fitness--its ability to solve the specified problem.
3. Select individual population members to be parents.
4. Produce children by recombining parent material via crossover and mutation, and add them to the population.
5. Evaluate the children's fitness.
6. Repeat steps 3-5 until a solution with the desired fitness goal is obtained.

GAs have been used for everything from multiple-fault diagnosis to medical-image registration. They have shown themselves to be a superior tool for developing rule-based systems, capable of gleaning knowledge from data inaccessible to statistical methods. Goldberg thoroughly discusses genetic algorithms and their use as a problem-solving and function optimization technique. Goldberg and Forrest give additional examples.

Although linear GAs are adept at developing rule-based systems, they cannot develop equations. A recent addition to the evolutionary domain is genetic programming, which uses an evolutionary approach to generate symbolic expressions and perform symbolic regressions. However, the genetic-programming method of performing symbolic regressions has some limitations. It can modify only the structure of an expression, not its contents, which are generated by the implementation program when the genetic programming starts.
In performing symbolic regressions, genetic programming cannot deal with nonnumeric variables. It also tends to produce convoluted equations because it cannot modify the coefficients it uses (for example, a genetic program might use (2.523+2.523)/2.523 to represent the number 2).

We have developed a method combining the known strengths of traditional genetic algorithms with the new field of genetic programming to produce a superior tool for performing symbolic regressions. We call this tool the genetic algorithm-program, or the GA-P.
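
As an illustration only, the six-step basic GA that the abstract outlines for a linear (flat bit string) individual might be sketched in Python as follows. This is not the authors' implementation. The toy fitness function `one_max`, the size-two tournament selection, the one-point crossover, and the rule that keeps the population at a fixed size by dropping the least fit members are all assumptions chosen for brevity, and every name and parameter value here is hypothetical.

```python
import random


def one_max(bits):
    """Toy fitness (an assumption for this sketch): the number of 1 bits."""
    return sum(bits)


def tournament(scored, rng):
    """Return the fitter individual of two randomly drawn (fitness, individual) pairs."""
    a, b = rng.sample(scored, 2)
    return a[1] if a[0] >= b[0] else b[1]


def run_ga(fitness, n_bits=20, pop_size=30, goal=20, max_generations=500,
           mutation_rate=0.02, seed=0):
    """Run a basic linear GA and return (best_individual, best_fitness)."""
    rng = random.Random(seed)

    # Step 1: create a population of random bit-string individuals.
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]

    # Step 2: evaluate each individual's fitness.
    scored = [(fitness(ind), ind) for ind in population]

    # Step 6: repeat steps 3-5 until the fitness goal (or a generation cap) is reached.
    for _ in range(max_generations):
        best_fitness, _best = max(scored, key=lambda pair: pair[0])
        if best_fitness >= goal:
            break

        # Step 3: select two population members to be parents.
        mother = tournament(scored, rng)
        father = tournament(scored, rng)

        # Step 4: recombine parent material via one-point crossover, then mutate.
        cut = rng.randint(1, n_bits - 1)
        children = [mother[:cut] + father[cut:], father[:cut] + mother[cut:]]
        children = [[bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child] for child in children]

        # Step 5: evaluate the children's fitness and add them to the population.
        scored.extend((fitness(child), child) for child in children)

        # Assumption: hold the population size fixed by dropping the least fit.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        del scored[pop_size:]

    best_fitness, best = max(scored, key=lambda pair: pair[0])
    return best, best_fitness


if __name__ == "__main__":
    best, best_fitness = run_ga(one_max)
    print(best_fitness, best)
```

Because an individual here is only a flat bit string, the sketch can tune values but cannot build an equation, which is the limitation of linear GAs that the abstract points to as motivation for genetic programming and the GA-P.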