Journal of Parallel and Distributed Computing
Clustering algorithms require a large number of distance computations between patterns and cluster centers, so their complexity is dominated by the number of patterns. At the same time, business and scientific databases storing huge volumes of data are growing explosively. One of the main challenges for today's knowledge discovery systems is scaling up to very large data sets. In this paper, we present a clustering methodology for scaling up any clustering algorithm. It is an iterative process based on partitioning a sample of the data into subsets. We also present extensive empirical tests demonstrating that the proposed methodology reduces the time complexity and, at the same time, can maintain the accuracy that would be achieved by a single clustering algorithm supplied with all the data.
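The abstract does not spell out the procedure, so the following is only an illustrative sketch of the general idea of clustering a partitioned sample, not the paper's actual method. It assumes k-means as the base algorithm, a uniform random sample, a round-robin split into subsets, and a final pass that clusters the per-subset centers. It performs a single round where the paper describes an iterative process. All function and parameter names (`kmeans`, `partitioned_kmeans`, `n_subsets`, `sample_size`) are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of tuples; returns k centers.

    Each iteration costs len(points) * k distance computations, which is
    why the number of patterns dominates the running time.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # requires len(points) >= k
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group; keep the old
        # center if a group happens to be empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers


def partitioned_kmeans(points, k, n_subsets, sample_size, seed=0):
    """Assumed scheme: sample, split into subsets, cluster each, then merge."""
    rng = random.Random(seed)
    sample = rng.sample(points, sample_size)
    # Round-robin split of the sample into n_subsets roughly equal parts.
    subsets = [sample[i::n_subsets] for i in range(n_subsets)]
    local_centers = []
    for i, subset in enumerate(subsets):
        local_centers.extend(kmeans(subset, k, seed=seed + i))
    # Cluster the n_subsets * k local centers to obtain the final k centers.
    return kmeans(local_centers, k, seed=seed)


if __name__ == "__main__":
    rng = random.Random(42)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(200)]
    print(partitioned_kmeans(data, k=2, n_subsets=4, sample_size=200, seed=1))
```

In this sketch the base algorithm only ever sees one subset of the sample at a time, and the merge step works on a handful of centers, so the expensive pattern-to-center distance computations are limited to the sample and never touch the full data set. Whether accuracy is preserved depends on the data and on the sample size, which is what the empirical tests mentioned above are meant to assess.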