Scalable Parallel Clustering for Data Mining on Multicomputers

Authors:
D. Foti; D. Lipari; Clara Pizzuti; Domenico Talia
Venue:
IPDPS '00 Proceedings of the 15 IPDPS 2000 Workshops on Parallel and Distributed Processing
Year:
2000

Abstract

This paper describes the design and implementation of P-AutoClass, a parallel version of the AutoClass system, on MIMD parallel machines. AutoClass uses a Bayesian method to determine the optimal classes in large datasets. P-AutoClass divides the clustering task among the processors of a multicomputer: each processor works on its own partition and exchanges intermediate results with the others. The paper presents and discusses the system architecture, its implementation, and experimental performance results for varying numbers of processors and dataset sizes. In particular, it evaluates the efficiency and scalability of P-AutoClass relative to the sequential AutoClass system.
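
The partition-and-exchange pattern described in the abstract can be illustrated with a small sketch. This is not the P-AutoClass code: it assumes a one-dimensional Gaussian mixture fitted by EM as a stand-in for the Bayesian classification step, simulates the processors as in-memory partitions of a Python list, and replaces the message exchange with a plain sum of partial statistics. All function names are hypothetical. The last helper uses the standard definitions of speedup and efficiency, which may differ in detail from the metrics used in the paper.

```python
import math


def local_stats(partition, means, variances, weights):
    """Per-processor step: accumulate class-membership statistics for one partition."""
    k = len(means)
    # For each class: [sum of memberships, sum of w*x, sum of w*x*x]
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    for x in partition:
        dens = [
            weights[j]
            * math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))
            / math.sqrt(2 * math.pi * variances[j])
            for j in range(k)
        ]
        total = sum(dens)
        for j in range(k):
            w = dens[j] / total  # membership of x in class j
            stats[j][0] += w
            stats[j][1] += w * x
            stats[j][2] += w * x * x
    return stats


def combine(all_stats):
    """Exchange step: element-wise sum of the partial statistics (stand-in for a global reduction)."""
    k = len(all_stats[0])
    return [[sum(s[j][i] for s in all_stats) for i in range(3)] for j in range(k)]


def update(stats, n):
    """Recompute class parameters from the combined statistics."""
    means = [s[1] / s[0] for s in stats]
    variances = [max(s[2] / s[0] - m * m, 1e-6) for s, m in zip(stats, means)]
    weights = [s[0] / n for s in stats]
    return means, variances, weights


def fit(data, p, means, variances, weights, iterations=20):
    """Run the iteration with the data split across p simulated processors."""
    parts = [data[i::p] for i in range(p)]  # round-robin partitioning (an assumption)
    for _ in range(iterations):
        partial = [local_stats(part, means, variances, weights) for part in parts]
        means, variances, weights = update(combine(partial), len(data))
    return means, variances, weights


def speedup_and_efficiency(t_sequential, t_parallel, p):
    """Standard definitions: speedup = T_seq / T_par, efficiency = speedup / p."""
    s = t_sequential / t_parallel
    return s, s / p
```

Because the combined statistics are sums over all data items, the result should not depend on how the data is partitioned, apart from floating-point rounding. On a real multicomputer the `combine` step would be a collective communication operation rather than a loop over lists.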