On distributing the clustering process

Authors:
B. Boutsinas;T. Gnardellis
Affiliations:
Department of Business Administration, University of Patras, and University of Patras Artificial Intelligence Research Center (UPAIRC), and IS & AI Lab, Department of Computer Engineering and Info ...;IS & AI Lab, Department of Computer Engineering and Informatics, GR-26500 Patras, Greece
Venue:
Pattern Recognition Letters
Year:
2002

Abstract

Clustering algorithms require a large number of distance computations among patterns and cluster centers. Hence, their complexity is dominated by the number of patterns. At the same time, business and scientific databases storing huge volumes of data are growing explosively. One of the main challenges for today's knowledge discovery systems is scaling up to very large data sets. In this paper, we present a clustering methodology for scaling up any clustering algorithm. It is an iterative process based on partitioning a sample of the data into subsets. We also present extensive empirical tests demonstrating that the proposed methodology reduces the time complexity and at the same time may maintain the accuracy that would be achieved by a single clustering algorithm supplied with all the data.
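
The abstract does not spell out the procedure, so the following is only a minimal illustrative sketch of the general idea of clustering by partitioning, not the authors' method. It assumes a single pass (the paper describes an iterative process), uses a plain k-means as the stand-in for "any clustering algorithm", and merges results by clustering the per-subset centers. All function names and parameters (`kmeans`, `partitioned_kmeans`, `n_parts`) are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style k-means; stands in for any base clustering algorithm."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest current center.
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Recompute each center as the mean of its group
        # (an empty group keeps its previous center).
        centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers

def partitioned_kmeans(points, k, n_parts, seed=0):
    """Assumed scheme: cluster each subset separately, then cluster the
    resulting local centers to obtain k final centers."""
    rng = random.Random(seed)
    shuffled = points[:]
    rng.shuffle(shuffled)
    # Split the shuffled data into n_parts roughly equal subsets.
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    # Each subset could be handled by a separate worker; here it is sequential.
    local_centers = [c for part in parts for c in kmeans(part, k, seed=seed)]
    return kmeans(local_centers, k, seed=seed)
```

In this sketch each subset is clustered independently, which is what would allow the work to be distributed; whether accuracy is preserved depends on the data and on the base algorithm, consistent with the hedged wording of the abstract.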