iMapReduce: A Distributed Computing Framework for Iterative Computation
Journal of Grid Computing
Relational data are pervasive in applications such as data mining and social network analysis. These data sets are typically massive, containing millions to hundreds of millions of relations, which creates demand for distributed computing frameworks that can process them on a large cluster. MapReduce is one such framework. However, many applications based on relational data must parse and operate on the data repeatedly over many iterations, and MapReduce lacks built-in support for iterative processing. This paper presents iMapReduce, a framework that supports iterative processing. iMapReduce lets users specify the iterative operations with map and reduce functions, and it handles the iteration automatically, without user involvement. More importantly, iMapReduce significantly improves the performance of iterative algorithms by (1) reducing the overhead of creating new tasks in every iteration, (2) eliminating the shuffling of static data in the shuffle stage of MapReduce, and (3) allowing asynchronous execution of each iteration, i.e., an iteration can start before all tasks of the previous iteration have finished. We implement iMapReduce on top of Apache Hadoop and show that it achieves a speedup of 1.2x to 5x over MapReduce implementations of well-known iterative algorithms.
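The idea of keeping static data in place while only the iterated state moves can be illustrated with a small single-process sketch. This is not iMapReduce's API: the names `map_fn`, `reduce_fn`, and `run_iterative` are hypothetical, PageRank is chosen here only as a familiar iterative algorithm (the abstract does not name one), and the sketch does not model persistent tasks on a cluster or asynchronous execution. It only shows that the graph structure (static data) is read locally in every iteration, while the ranks (state data) are the only values passed through the simulated shuffle.

```python
from collections import defaultdict


def map_fn(node, rank, out_links):
    """Hypothetical map: split this node's rank evenly among its out-links."""
    if not out_links:
        return []
    share = rank / len(out_links)
    return [(dst, share) for dst in out_links]


def reduce_fn(shares, n, damping=0.85):
    """Hypothetical reduce: combine incoming shares into a new rank."""
    return (1.0 - damping) / n + damping * sum(shares)


def run_iterative(static_graph, state, max_iters=100, tol=1e-9):
    """Iterate map and reduce until the state changes by less than tol.

    static_graph never enters the shuffle; only (node, share) pairs
    derived from the state do.
    """
    n = len(static_graph)
    iterations = 0
    for iterations in range(1, max_iters + 1):
        shuffled = defaultdict(list)  # simulated shuffle of state data only
        for node, rank in state.items():
            # The static adjacency list is joined locally with the state.
            for dst, share in map_fn(node, rank, static_graph[node]):
                shuffled[dst].append(share)
        new_state = {
            node: reduce_fn(shuffled.get(node, []), n) for node in static_graph
        }
        delta = sum(abs(new_state[k] - state[k]) for k in state)
        state = new_state
        if delta < tol:
            break
    return state, iterations


# Toy graph (an assumption for illustration): a -> b, c; b -> c; c -> a.
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
initial = {node: 1.0 / len(graph) for node in graph}
ranks, iters = run_iterative(graph, initial)
print(ranks, iters)
```

In a plain MapReduce implementation, each pass of the loop would be a separate job that reloads and reshuffles the graph along with the ranks; the sketch keeps the graph fixed and loops inside one long-lived function, which is the part of the design the abstract attributes to iMapReduce.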