Genetic programming for improved data mining: application to the biochemistry of protein interactions

  • Authors:
  • M. L. Raymer; W. F. Punch; E. D. Goodman; L. A. Kuhn

  • Affiliations:
  • Michigan State University, East Lansing, MI; Michigan State University, East Lansing, MI; Michigan State University, East Lansing, MI; Michigan State University, East Lansing, MI

  • Venue:
  • GECCO '96 Proceedings of the 1st annual conference on Genetic and evolutionary computation
  • Year:
  • 1996

Abstract

We have previously shown how a genetic algorithm (GA) can be used to perform "data mining," the discovery of important information within large datasets, by finding optimal data classifications from known examples. While successful, these approaches restricted the relationships among data features to those "fixed" before the GA run. We report here on an extension of that work in which a genetic program (GP) replaces the GA. Like the GA, the GP optimized data classification, but it could also determine the functional relationships among the features. This improved classification performance and yielded new information about important relationships among features. We describe the overall approach and compare the effectiveness of the GA and the GP on a biochemistry problem: determining the involvement of bound water molecules in protein interactions.
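The contrast between a GA that tunes a fixed form and a GP that evolves the form itself can be illustrated with a small sketch. It rests on several assumptions that do not come from the abstract:

- a candidate solution is a list of expression trees, each producing one derived feature;
- fitness is the leave-one-out accuracy of a 1-nearest-neighbour classifier on the derived features;
- the tuple tree encoding and the names `evaluate`, `transform`, `loo_1nn_accuracy` and `fitness` are hypothetical.

This is not the paper's implementation. Only representation and fitness evaluation are shown; selection, crossover and mutation are omitted.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical expression-tree encoding (illustrative, not from the paper):
#   ("x", i)       -> value of input feature i
#   ("c", v)       -> constant v
#   ("add", a, b)  -> a + b
#   ("mul", a, b)  -> a * b
Tree = Tuple


def evaluate(tree: Tree, row: Sequence[float]) -> float:
    """Recursively evaluate one expression tree on one data row."""
    op = tree[0]
    if op == "x":
        return float(row[tree[1]])
    if op == "c":
        return float(tree[1])
    left, right = evaluate(tree[1], row), evaluate(tree[2], row)
    if op == "add":
        return left + right
    if op == "mul":
        return left * right
    raise ValueError(f"unknown operator: {op}")


def transform(trees: List[Tree], data) -> List[List[float]]:
    """Map every row to a new feature vector, one derived feature per tree."""
    return [[evaluate(t, row) for t in trees] for row in data]


def loo_1nn_accuracy(features, labels) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier (Euclidean)."""
    correct = 0
    for i, query in enumerate(features):
        # Nearest other sample; ties go to the lowest index.
        best_j = min(
            (j for j in range(len(features)) if j != i),
            key=lambda j: math.dist(query, features[j]),
        )
        correct += int(labels[best_j] == labels[i])
    return correct / len(features)


def fitness(trees: List[Tree], data, labels) -> float:
    """Hypothetical fitness: classification accuracy after the transformation."""
    return loo_1nn_accuracy(transform(trees, data), labels)


# Toy data (made up): label is 1 when x0 * x1 > 0, and each point's nearest
# raw neighbour belongs to the other class.
data = [(0.5, 3), (-0.5, 3), (0.5, -3), (-0.5, -3),
        (3, 0.5), (3, -0.5), (-3, 0.5), (-3, -0.5)]
labels = [1, 0, 0, 1, 1, 0, 0, 1]

# GA-style candidate: a fixed form where only per-feature weights vary.
ga_like = [("mul", ("c", 2.0), ("x", 0)), ("mul", ("c", 0.5), ("x", 1))]
# GP-style candidate: the tree itself encodes an interaction between features.
gp_like = [("mul", ("x", 0), ("x", 1))]

print(fitness(ga_like, data, labels))  # 0.0
print(fitness(gp_like, data, labels))  # 1.0
```

On this contrived data, rescaling the individual features cannot rescue the nearest-neighbour classifier. A single tree that multiplies the two features separates the classes exactly. This is the kind of feature relationship a GP can discover and a fixed-form GA cannot express.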