PrIter: a distributed framework for prioritized iterative computations

Authors:
Yanfeng Zhang;Qixin Gao;Lixin Gao;Cuirong Wang
Affiliations:
Northeastern University, China and University of Massachusetts Amherst;Northeastern University, China;University of Massachusetts Amherst;Northeastern University, China
Venue:
Proceedings of the 2nd ACM Symposium on Cloud Computing
Year:
2011



Abstract

Iterative computations are pervasive in cloud data analysis applications such as Web search, online social network analysis, and recommendation systems. These applications typically involve massive data sets, so fast convergence of the iterative computation is essential. In this paper, we explore opportunities for accelerating iterative computations and propose PrIter, a distributed computing framework that enables fast iterative computation by supporting prioritized iteration. Instead of performing computations on all data records without discrimination, PrIter prioritizes the computations that help convergence the most, which significantly improves the convergence speed of the iterative process. We evaluate PrIter on a local cluster of machines as well as on the Amazon EC2 cloud. The results show that PrIter achieves up to a 50x speedup over Hadoop for a series of iterative algorithms.
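
The idea of prioritized iteration can be illustrated with a small single-machine sketch. The code below is not PrIter's implementation or API; it is a minimal illustration that assumes a delta-based PageRank formulation in which each node keeps a pending change (`delta`), and only the few nodes with the largest pending change are updated in each round. The function name `prioritized_pagerank`, the parameters `damping`, `batch`, and `tol`, and the toy graph are all hypothetical and chosen for illustration only.

```python
import heapq


def prioritized_pagerank(graph, damping=0.8, batch=2, tol=1e-9):
    """Delta-based PageRank that updates only the highest-priority nodes.

    graph: dict mapping each node to a non-empty list of out-neighbors.
    batch: how many nodes to update per round (hypothetical knob).
    Returns (value, updates): unnormalized scores and the update count.
    """
    value = {node: 0.0 for node in graph}
    # Every node starts with the same pending change.
    delta = {node: 1.0 - damping for node in graph}
    updates = 0

    # Stop once the total pending change is negligible.
    while sum(delta.values()) > tol:
        # Priority = size of the pending change; pick the largest ones.
        chosen = heapq.nlargest(batch, delta, key=delta.get)
        for node in chosen:
            pending = delta[node]
            delta[node] = 0.0
            value[node] += pending          # absorb the pending change
            share = damping * pending / len(graph[node])
            for neighbor in graph[node]:    # propagate it to out-neighbors
                delta[neighbor] += share
            updates += 1
    return value, updates


# Toy graph (assumed for illustration): every node has an out-edge.
toy_graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}

prioritized, prioritized_updates = prioritized_pagerank(toy_graph, batch=1)
# Setting batch to the node count updates every node in every round,
# which corresponds to iterating without prioritization.
uniform, uniform_updates = prioritized_pagerank(toy_graph, batch=len(toy_graph))
```

Both calls approximate the same fixed point; they differ only in which nodes are chosen for update in each round. Comparing `prioritized_updates` with `uniform_updates` on a larger, skewed graph is one way to explore how selective updating affects the amount of work, though this sketch makes no claim about the speedups reported for PrIter.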