Parallel k/h-Means Clustering for Large Data Sets

Authors:
Kilian Stoffel; Abdelkader Belkoniene
Venue:
Euro-Par '99 Proceedings of the 5th International Euro-Par Conference on Parallel Processing
Year:
1999

Abstract

This paper describes a parallel implementation of the k/h-means clustering algorithm, one of the basic algorithms used in a wide range of data mining tasks. We show how a database can be distributed and how the algorithm can be applied to the distributed database. Tests conducted on a network of 32 PCs showed nearly ideal speedup for large data sets.
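
The abstract does not give the details of the k/h-means variant or of the distribution scheme, so the following is only a minimal sketch of the general idea, assuming plain k-means (Lloyd's iteration) and a simple round-robin split of the data. All function names (`partition`, `local_step`, `combine`, `parallel_kmeans`) are hypothetical and are not taken from the paper. Each simulated node computes per-cluster sums and counts over its own partition, and only these small partial results are merged to update the centroids.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]
Partial = Tuple[List[List[float]], List[int]]


def partition(data: Sequence[Point], n_workers: int) -> List[Sequence[Point]]:
    """Split the data set into n_workers chunks (round-robin; an assumption)."""
    return [data[i::n_workers] for i in range(n_workers)]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centroids[j])),
    )


def local_step(chunk: Sequence[Point], centroids: Sequence[Point]) -> Partial:
    """Work of one node: per-cluster coordinate sums and counts for its chunk."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for point in chunk:
        j = nearest(point, centroids)
        counts[j] += 1
        for d in range(dim):
            sums[j][d] += point[d]
    return sums, counts


def combine(partials: Sequence[Partial], centroids: Sequence[Point]) -> List[Point]:
    """Merge the partial results of all nodes into updated centroids."""
    dim = len(centroids[0])
    updated = []
    for j in range(len(centroids)):
        total = sum(counts[j] for _, counts in partials)
        if total == 0:
            # No point was assigned to this cluster: keep its old centroid.
            updated.append(tuple(centroids[j]))
        else:
            updated.append(
                tuple(sum(sums[j][d] for sums, _ in partials) / total
                      for d in range(dim))
            )
    return updated


def parallel_kmeans(data, centroids, n_workers=4, max_iter=100):
    """Iterate until the centroids stop changing or max_iter is reached."""
    chunks = partition(data, n_workers)
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        # On a real cluster these calls would run concurrently, one per node;
        # here they run sequentially to keep the sketch self-contained.
        partials = [local_step(chunk, centroids) for chunk in chunks]
        updated = combine(partials, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids
```

In this sketch the data never moves after the initial split: per iteration, each node sends back only k sums and k counts, which is one plausible reason a scheme of this kind can scale well on large data sets.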