High performance clustering of social images in a map-collective programming model

Authors:
Bingjing Zhang; Judy Qiu
Affiliations:
Indiana University Bloomington; Indiana University Bloomington
Venue:
Proceedings of the 4th annual Symposium on Cloud Computing
Year:
2013

References:
Twister: a runtime for iterative MapReduce. Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing.
Spark: cluster computing with working sets. HotCloud'10: Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing.
Scalable parallel computing on clouds using Twister4Azure iterative MapReduce. Future Generation Computer Systems.

Abstract

Large-scale iterative computations are common in many important data mining and machine learning algorithms. Most of these applications can be expressed as iterations of MapReduce computations, which led to the Iterative MapReduce programming model [1] for efficiently executing data-intensive iterative computations interoperably across HPC and cloud environments. We observe that a systematic approach to collective communication is essential but notably missing from the current model. We therefore generalize the iterative MapReduce concept to Map-Collective, on the premise that large collectives are a distinctive feature of data-intensive and data mining applications. To show the necessity of the Map-Collective model, this paper studies the implications of large-scale social image clustering problems, in which 10–100 million images, represented as points in a high-dimensional vector space (up to 2048 dimensions), must be divided into 1–10 million clusters.
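To make the role of the collective step concrete, the following is a minimal single-process sketch, assuming a K-means-style clustering (the abstract does not name the algorithm). Each simulated map task computes per-centroid partial sums and counts over its own partition of points, and an allreduce-style collective then sums those partials so that every task would obtain the same new centroids. All function names are hypothetical; this is not the authors' implementation or the API of any particular runtime. The helper `centroid_payload_bytes` shows why the collective is "large": assuming 4-byte values, a full copy of 10 million centroids of 2048 dimensions is about 82 GB.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Partial = Tuple[List[Vector], List[int]]  # (per-centroid coordinate sums, per-centroid counts)


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_partial_sums(points: List[Vector], centroids: List[Vector]) -> Partial:
    """Map step (hypothetical): assign each local point to its nearest centroid
    and accumulate sums and counts for this partition only."""
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:
        nearest = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def allreduce(partials: List[Partial]) -> Partial:
    """Collective step (simulated): element-wise sum of every task's partials.
    In a distributed runtime each task would receive this same combined result."""
    k, dim = len(partials[0][0]), len(partials[0][0][0])
    total_sums = [[0.0] * dim for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for c in range(k):
            total_counts[c] += counts[c]
            for d in range(dim):
                total_sums[c][d] += sums[c][d]
    return total_sums, total_counts


def kmeans_iteration(partitions: List[List[Vector]], centroids: List[Vector]) -> List[Vector]:
    """One iteration: a map over all partitions, then one collective, then the update."""
    partials = [map_partial_sums(part, centroids) for part in partitions]
    sums, counts = allreduce(partials)
    # A centroid with no assigned points keeps its previous position.
    return [
        [s / counts[c] for s in sums[c]] if counts[c] else list(centroids[c])
        for c in range(len(centroids))
    ]


def centroid_payload_bytes(num_clusters: int, dim: int, bytes_per_value: int) -> int:
    """Size of one full copy of the centroid set (value width is an assumption)."""
    return num_clusters * dim * bytes_per_value


if __name__ == "__main__":
    partitions = [
        [[0.0, 0.0], [1.0, 0.0]],     # points held by map task 0
        [[10.0, 10.0], [11.0, 10.0]],  # points held by map task 1
    ]
    centroids = [[0.0, 0.0], [10.0, 10.0]]
    print(kmeans_iteration(partitions, centroids))  # [[0.5, 0.0], [10.5, 10.0]]
    print(centroid_payload_bytes(10_000_000, 2048, 4))  # 81920000000
```

The split mirrors the abstract's argument: the map step touches only local data, while the centroid update needs data from every task, so its cost is governed by the collective, whose payload grows with the number of clusters times the dimensionality.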