Data clustering is an important task in data mining. Clustering is the unsupervised classification of data items into homogeneous groups called clusters: clustering methods partition a set of data items so that, according to some defined criterion, items in the same cluster are more similar to each other than to items in different clusters. Clustering algorithms are computationally intensive, particularly when they are used to analyze large amounts of data. One way to reduce processing time is to implement clustering algorithms on scalable parallel computers. This paper describes the design and implementation of P-AutoClass, a parallel version of the AutoClass system, which uses a Bayesian model to determine optimal classes in large data sets. P-AutoClass divides the clustering task among the processors of a multicomputer: each processor works on its own partition of the data and exchanges intermediate results with the other processors. The system architecture, its implementation, and experimental performance results for different numbers of processors and different data sets are presented and compared with theoretical performance predictions. In particular, the experimental and predicted scalability and efficiency of P-AutoClass are evaluated and compared against those of the sequential AutoClass system.
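The abstract describes the partition-and-exchange scheme only at a high level. The following is a minimal sketch of that general data-parallel pattern, not P-AutoClass's actual code. It rests on several assumptions: the model is a one-dimensional Gaussian mixture fitted by EM (AutoClass's Bayesian model and class search are considerably richer), the "processors" are simulated inside one Python process, each processor owns a hypothetical strided slice of the data, and the exchange step is modeled as an MPI-style "allreduce" that sums per-partition statistics. The names `local_stats`, `allreduce_sum`, `m_step`, and `parallel_em` are hypothetical.

```python
import math
import random


def local_stats(partition, weights, means, variances):
    """E-step on one processor's partition of the data.

    Returns per-class sufficient statistics (sum of memberships, sum of
    membership-weighted x, sum of membership-weighted x^2) plus the
    partition's contribution to the log-likelihood.
    """
    k = len(means)
    n_k, s1, s2 = [0.0] * k, [0.0] * k, [0.0] * k
    loglik = 0.0
    for x in partition:
        # Joint density pi_j * N(x | mu_j, var_j) for each class j
        dens = [
            weights[j] / math.sqrt(2 * math.pi * variances[j])
            * math.exp(-((x - means[j]) ** 2) / (2 * variances[j]))
            for j in range(k)
        ]
        total = sum(dens)
        loglik += math.log(total)
        for j in range(k):
            r = dens[j] / total  # membership probability of x in class j
            n_k[j] += r
            s1[j] += r * x
            s2[j] += r * x * x
    return n_k, s1, s2, loglik


def allreduce_sum(per_proc):
    """Stand-in for an MPI-style Allreduce: element-wise sum of every
    processor's statistics, so all processors see the same global totals."""
    n_k = [sum(v) for v in zip(*(p[0] for p in per_proc))]
    s1 = [sum(v) for v in zip(*(p[1] for p in per_proc))]
    s2 = [sum(v) for v in zip(*(p[2] for p in per_proc))]
    loglik = sum(p[3] for p in per_proc)
    return n_k, s1, s2, loglik


def m_step(n_k, s1, s2, n_total, min_var=1e-6):
    """Recompute class weights, means, and variances from global statistics."""
    weights = [nk / n_total for nk in n_k]
    means = [a / nk for a, nk in zip(s1, n_k)]
    variances = [max(b / nk - m * m, min_var) for b, nk, m in zip(s2, n_k, means)]
    return weights, means, variances


def parallel_em(data, num_procs, k=2, iters=50):
    """Fit a k-class mixture; returns parameters and the last E-step log-likelihood."""
    # Each simulated processor owns one strided slice of the data set.
    partitions = [data[p::num_procs] for p in range(num_procs)]
    lo, hi = min(data), max(data)
    mean_all = sum(data) / len(data)
    var_all = sum((x - mean_all) ** 2 for x in data) / len(data)
    weights = [1.0 / k] * k
    means = [lo + j * (hi - lo) / max(k - 1, 1) for j in range(k)]
    variances = [var_all] * k
    loglik = float("-inf")
    for _ in range(iters):
        # Local work: each processor scans only its own partition.
        per_proc = [local_stats(part, weights, means, variances) for part in partitions]
        # Exchange: combine the partial statistics into global totals.
        n_k, s1, s2, loglik = allreduce_sum(per_proc)
        weights, means, variances = m_step(n_k, s1, s2, len(data))
    return weights, means, variances, loglik


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(500)] + [rng.gauss(10.0, 1.0) for _ in range(500)]
    for procs in (1, 4):
        w, m, v, ll = parallel_em(data, procs)
        print(procs, [round(x, 3) for x in m], round(ll, 3))
```

Because each simulated processor contributes only summed statistics, running with 1 or 4 processors gives the same fitted parameters up to floating-point rounding. Per iteration each processor exchanges just 3k + 1 numbers, independent of the size of its partition, which is why this style of decomposition can scale well on a multicomputer.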