iMapReduce: A Distributed Computing Framework for Iterative Computation

Authors:
Yanfeng Zhang;Qinxin Gao;Lixin Gao;Cuirong Wang
Venue:
IPDPSW '11 Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
Year:
2011

Cited by (11):

PrIter: a distributed framework for prioritized iterative computations
Proceedings of the 2nd ACM Symposium on Cloud Computing

iMapReduce: A Distributed Computing Framework for Iterative Computation
Journal of Grid Computing

MapIterativeReduce: a framework for reduction-intensive data processing on azure clouds
Proceedings of third international workshop on MapReduce and its Applications

Accelerate large-scale iterative computation through asynchronous accumulative updates
Proceedings of the 3rd workshop on Scientific Cloud Computing

The seven deadly sins of cloud computing research
HotCloud'12 Proceedings of the 4th USENIX conference on Hot Topics in Cloud Computing

MRKDSBC: a distributed background modeling algorithm based on mapreduce
ISNN'12 Proceedings of the 9th international conference on Advances in Neural Networks - Volume Part I

Efficient analytics on ordered datasets using MapReduce
Proceedings of the 22nd international symposium on High-performance parallel and distributed computing

GPS: a graph processing system
Proceedings of the 25th International Conference on Scientific and Statistical Database Management

Mammoth: autonomic data processing framework for scientific state-transition applications
Proceedings of the 2013 ACM Cloud and Autonomic Computing Conference

Data-Intensive Cloud Computing: Requirements, Expectations, Challenges, and Solutions
Journal of Grid Computing

A Scalable Distributed Framework for Efficient Analytics on Ordered Datasets
UCC '13 Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing

Abstract

Relational data are pervasive in applications such as data mining and social network analysis. These data sets are typically massive, containing at least millions, and often hundreds of millions, of relations, which creates demand for distributed computing frameworks that can process them on a large cluster. MapReduce is one such framework. However, many applications built on relational data must parse the data iteratively and operate on them over many iterations, and MapReduce lacks built-in support for iterative processing. This paper presents iMapReduce, a framework that supports iterative processing. iMapReduce lets users specify the iterative operations with map and reduce functions and then carries out the iterations automatically, without user involvement. More importantly, iMapReduce significantly improves the performance of iterative algorithms by (1) reducing the overhead of creating a new task in every iteration, (2) eliminating the shuffling of static data in the shuffle stage of MapReduce, and (3) allowing asynchronous execution of each iteration, i.e., an iteration can start before all tasks of the previous iteration have finished. We implement iMapReduce on top of Apache Hadoop and show that it achieves a speedup of 1.2x to 5x over MapReduce implementations of well-known iterative algorithms.
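
To make the idea of iterating map and reduce functions over relational data concrete, here is a minimal single-process sketch in Python. It is not the iMapReduce API and does not come from the paper: PageRank is chosen only as a familiar iterative algorithm, and the names `map_fn`, `reduce_fn`, `iterate`, and `static_graph` are hypothetical. The sketch separates the static data (the link structure, which never changes) from the state data (the ranks, which are the only values passed between iterations), loosely mirroring point (2) above. It does not model persistent tasks or asynchronous execution.

```python
from collections import defaultdict


def map_fn(node, rank, neighbors):
    """Map step: split a node's current rank evenly among its out-neighbors."""
    if not neighbors:
        return
    share = rank / len(neighbors)
    for dst in neighbors:
        yield dst, share


def reduce_fn(contributions, damping, n_nodes):
    """Reduce step: combine the incoming shares into a node's new rank."""
    return (1.0 - damping) / n_nodes + damping * sum(contributions)


def iterate(static_graph, damping=0.85, max_iters=100, tol=1e-9):
    """Run map and reduce repeatedly until the ranks stop changing.

    static_graph (assumed layout): dict mapping each node to a list of
    out-neighbors. It is read in place on every iteration and never
    re-shuffled; only the rank values move between map and reduce.
    """
    n = len(static_graph)
    state = {node: 1.0 / n for node in static_graph}
    iteration = 0
    for iteration in range(1, max_iters + 1):
        # "Shuffle": group the emitted shares by destination node.
        shuffled = defaultdict(list)
        for node, rank in state.items():
            for dst, share in map_fn(node, rank, static_graph[node]):
                shuffled[dst].append(share)
        new_state = {
            node: reduce_fn(shuffled[node], damping, n) for node in static_graph
        }
        # Termination check: total absolute change in rank.
        delta = sum(abs(new_state[k] - state[k]) for k in state)
        state = new_state
        if delta < tol:
            break
    return state, iteration


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    ranks, iterations = iterate(graph)
    print(iterations, ranks)
```

In a plain MapReduce deployment, each pass of the loop body would be submitted as a separate job, and the graph would be reloaded and shuffled along with the ranks; the abstract's three optimizations target exactly that per-iteration cost.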