A fast algorithm for clustering with MapReduce

Authors:
Yuqing Miao;Jinxing Zhang;Hao Feng;Liangpei Qiu;Yimin Wen
Affiliations:
School of Computer Science and Engineering, Guilin University of Electronic Technology, Guilin, China (all five authors)
Venue:
ISNN'13 Proceedings of the 10th international conference on Advances in Neural Networks - Volume Part I
Year:
2013

Abstract

MapReduce is a popular model in which the dataflow takes the form of a directed acyclic graph of operators. However, it lacks built-in support for iterative programs, which arise naturally in many clustering applications. Based on micro-clusters and an equivalence relation, we design a clustering algorithm that is easily parallelized in MapReduce and completes in only a few MapReduce rounds. Experiments show that our algorithm runs fast, achieves good accuracy, scales well, and attains high speedup.
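
The abstract does not spell out the algorithm, so the following is only an illustrative sketch of how micro-clusters and an equivalence relation could combine in a single map and reduce pass. It should not be read as the authors' method. Everything here is an assumption made for illustration: the function names, the greedy micro-cluster rule, the `radius` and `link_dist` thresholds, and the choice of "centers within `link_dist`, closed transitively" as the equivalence relation. The map step is simulated with a plain loop over in-memory partitions rather than a real MapReduce framework.

```python
from collections import defaultdict
from math import sqrt


def euclid(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def map_micro_clusters(points, radius):
    """Map step (assumed): summarize one partition as micro-clusters.

    Each micro-cluster is [count, linear_sum]; a point joins the first
    micro-cluster whose current center is within `radius`.
    """
    clusters = []
    for p in points:
        for mc in clusters:
            center = [s / mc[0] for s in mc[1]]
            if euclid(p, center) <= radius:
                mc[0] += 1
                mc[1] = [s + x for s, x in zip(mc[1], p)]
                break
        else:
            clusters.append([1, list(p)])
    return clusters


def reduce_merge(micro_clusters, link_dist):
    """Reduce step (assumed): group micro-clusters into equivalence classes.

    Two micro-clusters are linked if their centers are within `link_dist`;
    union-find computes the transitive closure of that link relation, and
    each resulting class is reported as one final cluster of centers.
    """
    centers = [[s / n for s in ls] for n, ls in micro_clusters]
    parent = list(range(len(centers)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(centers)):
        for j in range(i + 1, len(centers)):
            if euclid(centers[i], centers[j]) <= link_dist:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, c in enumerate(centers):
        groups[find(i)].append(c)
    return list(groups.values())


def cluster(partitions, radius, link_dist):
    """Run the simulated map step per partition, then one reduce step."""
    mcs = [mc for part in partitions for mc in map_micro_clusters(part, radius)]
    return reduce_merge(mcs, link_dist)
```

Because each partition is summarized independently and the merge happens once, a sketch like this needs no iteration across rounds, which is the property the abstract emphasizes.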