Exploiting Parallelism in Knowledge Discovery Systems to Improve Scalability

  • Authors:
  • Gehad Galal;Diane J. Cook;Lawrence B. Holder

  • Venue:
  • HICSS '98 Proceedings of the Thirty-First Annual Hawaii International Conference on System Sciences, Volume 5
  • Year:
  • 1998

Abstract

The large amount of data collected today is quickly overwhelming researchers' abilities to interpret the data and discover interesting patterns. Knowledge discovery and data mining approaches hold the potential to automate the interpretation process, but these approaches frequently utilize computationally expensive algorithms. In particular, scientific discovery systems focus on the utilization of richer data representations, sometimes without regard for scalability. This research outlines a general approach for scaling knowledge discovery in databases (KDD) systems using parallel and distributed resources and applies the suggested strategies to the Subdue knowledge discovery system. Subdue has been used to discover interesting and repetitive concepts in graph-based databases from a variety of domains, but requires a substantial amount of processing time. Experiments that demonstrate the scalability of parallel versions of the Subdue system are performed using CAD circuit databases and artificially generated databases, and potential achievements and obstacles are discussed.