Scalable parallel computing on clouds using Twister4Azure iterative MapReduce

Authors:
Thilina Gunarathne;Bingjing Zhang;Tak-Lon Wu;Judy Qiu
Affiliations:
-;-;-;-
Venue:
Future Generation Computer Systems
Year:
2013

Citing 13
Cited 1

MapReduce: simplified data processing on large clusters

Communications of the ACM - 50th anniversary issue: 1958 - 2008
Dimension reduction and visualization of large high-dimensional data via interpolation

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
AzureBlast: a case study of developing science applications on the cloud

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Twister: a runtime for iterative MapReduce

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Data-Intensive Text Processing with MapReduce

Data-Intensive Text Processing with MapReduce
MapReduce online

NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Spark: cluster computing with working sets

HotCloud'10 Proceedings of the 2nd USENIX conference on Hot topics in cloud computing
MapReduce in the Clouds for Science

CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
Applying Twister to Scientific Applications

CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
Cloud Technologies for Bioinformatics Applications

IEEE Transactions on Parallel and Distributed Systems
Cloud MapReduce: A MapReduce Implementation on Top of a Cloud Operating System

CCGRID '11 Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Cloud computing paradigms for pleasingly parallel biomedical applications

Concurrency and Computation: Practice & Experience
Least squares quantization in PCM

IEEE Transactions on Information Theory

High performance clustering of social images in a map-collective programming model

Proceedings of the 4th annual Symposium on Cloud Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Recent advances in data-intensive computing for science discovery are fueling a dramatic growth in the use of data-intensive iterative computations. The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure and storage services, offers a very attractive environment in which scientists can perform data analytics. The challenges to large-scale distributed computations on cloud environments demand innovative computational frameworks that are specifically tailored for cloud characteristics to easily and effectively harness the power of clouds. Twister4Azure is a distributed decentralized iterative MapReduce runtime for Windows Azure Cloud. Twister4Azure extends the familiar, easy-to-use MapReduce programming model with iterative extensions, enabling a fault-tolerance execution of a wide array of data mining and data analysis applications on the Azure cloud. Twister4Azure utilizes the scalable, distributed and highly available Azure cloud services as the underlying building blocks, and employs a decentralized control architecture that avoids single point failures. Twister4Azure optimizes the iterative computations using a multi-level caching of data, a cache-aware decentralized task scheduling, hybrid tree-based data broadcasting and hybrid intermediate data communication. This paper presents the Twister4Azure iterative MapReduce runtime and a study of four real world data-intensive scientific applications implemented using Twister4Azure-two iterative applications, Multi-Dimensional Scaling and KMeans Clustering; and two pleasingly parallel applications, BLAST+ sequence searching and SmithWaterman sequence alignment. Performance measurements show comparable or a factor of 2 to 4 better results than the traditional MapReduce runtimes deployed on up to 256 instances and for jobs with tens of thousands of tasks. We also study and present solutions to several factors that affect the performance of iterative MapReduce applications on Windows Azure Cloud.