MapReduce is a popular programming model in which the dataflow takes the form of a directed acyclic graph of operators. However, it lacks built-in support for iterative programs, which arise naturally in many clustering applications. Based on micro-clusters and an equivalence relation, we design a clustering algorithm that is easily parallelized in MapReduce and completes in only a few MapReduce rounds. Experiments show that our algorithm runs fast, achieves good accuracy, scales well, and attains high speedup.