Mining the Genetic Program

  • Authors: Walter Alden Tackett
  • Venue: IEEE Expert: Intelligent Systems and Their Applications
  • Year: 1995

Abstract

A major challenge in applying genetic programming to expert-system development is that the ubiquitous presence of irrelevant code makes a genetically induced program difficult to understand. The trait-mining technique extracts the expressions that comprise the program's salient problem elements.

For many problems, there is insufficient expertise to construct a robust knowledge system, but extensive example cases are available. In those situations, a learning system can be trained to provide correct answers from the example cases. A drawback with the learning approach is that many real-world applications eschew the advice of "black boxes" and so require that the learning system provide some form of explanation.

What learners are well suited to this? Connectionist methods are opaque with respect to explaining what they have learned about a problem. Closer to the symbolic domain are decision trees, from which a learner can derive a hierarchy of simple classification rules. More generally, the FOIL machine-learning approach produces symbolic declarative programs via a hill-climbing process. Related work with the FOCL system and other theory-guided learners integrates induction with preexisting expert knowledge.

Genetic programming is a symbolic approach to induction, so it has the potential to feed knowledge of what it has learned back into the user environment. It applies the paradigm of evolutionary selection and husbandry to a population of programs over many generations to search for optimal solutions. Among its potential advantages is that genetic search, which is closely related to beam search, may be more powerful than the hill-climbing methods applied in the symbolic-learning approaches just mentioned.

Angeline describes genetic algorithms as "strong weak" AI methods, which begin with no domain knowledge but over time synthesize emergent knowledge through empirical credit assignment to solution elements. He points out that genetic programming is a symbolic approach to GA methods, with a strong domain bias induced by the language elements chosen. In contrast to strong AI methods, which begin with large amounts of explicit domain knowledge, the method described in this article extracts, or mines, emergent knowledge from genetically induced programs.

Because of their size and complexity, genetically induced programs for many problems do not yield readily to inspection. Angeline has pointed out that these properties of size and complexity may be desirable and even necessary for evolutionary search to proceed. Therefore, rather than enforcing an a priori parsimony of genetic programs, the mining approach determines saliency without intruding on the structure and process of the genetic-programming algorithm.

The goal of mining salient expressions requires that programs yield symbolic information regardless of their size and complexity and that an automated process can estimate the importance of pieces of this information without prior human knowledge. The gene-banking approach taken here records all subexpressions that occur during the evolutionary process along with relevant statistics. A carefully tailored hashing and indexing scheme holds to a manageable level the time and space needed for tracking the large population of subexpressions. When the evolutionary process is complete, these traits and their statistics can be evaluated and reported in the order of their saliency.
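
The gene-banking idea can be illustrated with a minimal Python sketch. This is not the author's implementation: the nested-tuple program representation, the hypothetical `GeneBank` class, and the use of mean fitness as a saliency score are all assumptions made for illustration, since the abstract does not specify the statistics kept or the hashing and indexing scheme. Here an ordinary Python dictionary stands in for the hash table.

```python
from collections import defaultdict


def subexpressions(expr):
    """Yield expr and every nested subexpression, terminals included."""
    yield expr
    if isinstance(expr, tuple):
        # expr[0] is the operator name; the rest are its arguments.
        for arg in expr[1:]:
            yield from subexpressions(arg)


class GeneBank:
    """Hypothetical trait store: maps each subexpression to simple statistics."""

    def __init__(self):
        # Tuples and strings are hashable, so a dict serves as the hash index.
        self.stats = defaultdict(lambda: {"count": 0, "fitness_sum": 0.0})

    def record(self, program, fitness):
        """Bank every distinct subexpression of one evaluated program."""
        for trait in set(subexpressions(program)):
            entry = self.stats[trait]
            entry["count"] += 1
            entry["fitness_sum"] += fitness

    def ranked(self, min_count=1):
        """Return (trait, mean_fitness, count) rows, best mean fitness first.

        Mean fitness is an assumed saliency measure, not one from the article.
        """
        rows = [
            (trait, s["fitness_sum"] / s["count"], s["count"])
            for trait, s in self.stats.items()
            if s["count"] >= min_count
        ]
        return sorted(rows, key=lambda row: row[1], reverse=True)


# Usage: bank three toy programs with made-up fitness values.
bank = GeneBank()
bank.record(("+", "x", ("*", "y", "z")), fitness=0.9)
bank.record(("-", ("*", "y", "z"), "x"), fitness=0.7)
bank.record(("+", "x", "x"), fitness=0.2)

report = bank.ranked(min_count=2)
```

In this toy run, the subexpression `("*", "y", "z")` appears in the two fitter programs, so it ranks above the terminal `"x"`, which also appears in the weak one. A realistic gene bank would need a more careful saliency statistic and a memory-bounded index, which is the part the article describes as carefully tailored.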