Free parallel data mining

  • Authors:
  • Bin Li; Dennis Shasha

  • Affiliations:
  • Department of Computer Science, Courant Institute of Mathematical Sciences, New York University; Department of Computer Science, Courant Institute of Mathematical Sciences, New York University

  • Venue:
  • SIGMOD '98: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data
  • Year:
  • 1998

Abstract

Data mining is computationally expensive. Since the benefits of data mining results are unpredictable, organizations may not be willing to buy new hardware for that purpose. We will present a system that enables data mining applications to run in parallel on networks of workstations in a fault-tolerant manner. We will describe our parallelization of a combinatorial pattern discovery algorithm and a classification tree algorithm. We will demonstrate the effectiveness of our system with two real applications: discovering active motifs in protein sequences and predicting foreign exchange rate movement.