Free parallel data mining

Authors:
Bin Li; Dennis Shasha
Affiliations:
Department of Computer Science, Courant Institute of Mathematical Sciences, New York University; Department of Computer Science, Courant Institute of Mathematical Sciences, New York University
Venue:
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Year:
1998

Abstract

Data mining is computationally expensive. Since the benefits of data mining results are unpredictable, organizations may not be willing to buy new hardware for that purpose. We will present a system that enables data mining applications to run in parallel on networks of workstations in a fault-tolerant manner. We will describe our parallelization of a combinatorial pattern discovery algorithm and a classification tree algorithm. We will demonstrate the effectiveness of our system with two real applications: discovering active motifs in protein sequences and predicting foreign exchange rate movement.