iMapReduce: A Distributed Computing Framework for Iterative Computation

Authors:
Yanfeng Zhang; Qixin Gao; Lixin Gao; Cuirong Wang
Affiliations:
School of Information Science and Engineering, Northeastern University, Shenyang, China 110819; Department of Electrical and Information Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, China 066000; Department of Electrical and Computer Engineering, University of Massachusetts Amherst, Amherst, USA 01002; Department of Electrical and Information Engineering, Northeastern University at Qinhuangdao, Qinhuangdao, China 066000
Venue:
Journal of Grid Computing
Year:
2012

Abstract

Iterative computation is pervasive in applications such as data mining, web ranking, graph analysis, and online social network analysis. These applications typically involve massive data sets containing millions or billions of records, which creates demand for distributed computing frameworks that can process such data on a cluster of machines. MapReduce is one such framework. However, MapReduce lacks built-in support for iterative processing, which requires passing over the data sets repeatedly. Besides specifying the MapReduce jobs, users have to write a driver program that submits a series of jobs and performs convergence testing at the client. This paper presents iMapReduce, a distributed framework that supports iterative processing. iMapReduce lets users specify the iterative computation with separate map and reduce functions, and it runs the iterations automatically within a single job. More importantly, iMapReduce significantly improves the performance of iterative implementations by (1) reducing the overhead of repeatedly creating new MapReduce jobs, (2) eliminating the shuffling of static data, and (3) allowing asynchronous execution of map tasks. We implement an iMapReduce prototype based on Apache Hadoop and show that it achieves up to a 5x speedup over Hadoop for iterative algorithms.
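
The client-side driver pattern that the abstract attributes to plain MapReduce can be sketched in a few lines. The following is a minimal in-memory Python illustration, not iMapReduce or Hadoop code: `run_job`, `driver`, the toy graph, the damping factor of 0.85, and the tolerance are all assumptions chosen for illustration. Each call to `run_job` stands in for one complete MapReduce job, and the loop in `driver` performs the convergence test at the client.

```python
from collections import defaultdict

# Hypothetical toy graph: node -> list of out-neighbors (the static data).
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
DAMPING = 0.85  # assumed damping factor for this PageRank-style example


def map_fn(node, rank, neighbors):
    """Emit an equal share of this node's rank to each out-neighbor."""
    for dst in neighbors:
        yield dst, rank / len(neighbors)


def reduce_fn(node, contributions, num_nodes):
    """Combine the incoming shares into the node's new rank."""
    return (1 - DAMPING) / num_nodes + DAMPING * sum(contributions)


def run_job(graph, ranks):
    """Stand-in for one MapReduce job: map, shuffle (group by key), reduce."""
    shuffled = defaultdict(list)
    for node, neighbors in graph.items():
        for dst, share in map_fn(node, ranks[node], neighbors):
            shuffled[dst].append(share)
    return {n: reduce_fn(n, shuffled[n], len(graph)) for n in graph}


def driver(graph, tol=1e-6, max_iters=100):
    """Client-side loop: submit one job per iteration and test convergence."""
    ranks = {n: 1.0 / len(graph) for n in graph}
    for iteration in range(1, max_iters + 1):
        new_ranks = run_job(graph, ranks)
        # Convergence test: total absolute change in rank across all nodes.
        delta = sum(abs(new_ranks[n] - ranks[n]) for n in graph)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks, iteration


if __name__ == "__main__":
    final_ranks, jobs = driver(GRAPH)
    print(f"stopped after {jobs} jobs: {final_ranks}")
```

In this sketch the unchanging graph structure is handed to every job, and a fresh job is launched per iteration. Those are the two costs the abstract lists as items (1) and (2); the sketch does not attempt to model how iMapReduce removes them.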