Understanding application-level interoperability: Scaling-out MapReduce over high-performance grids and clouds

Authors:
Saurabh Sehgal;Miklos Erdelyi;Andre Merzky;Shantenu Jha
Affiliations:
Center for Computation & Technology, Louisiana State University, USA;Department of Computer Science & Systems Technology, University of Pannonia, Veszprem, Hungary and Computer & Automation Research Institute of the Hungarian Academy of Sciences, Hungary;Center for Computation & Technology, Louisiana State University, USA;Center for Computation & Technology, Louisiana State University, USA and Department of Computer Science, Louisiana State University, USA
Venue:
Future Generation Computer Systems
Year:
2011

Abstract

Application-level interoperability is defined as the ability of an application to utilize multiple distributed heterogeneous resources. Such interoperability is becoming increasingly important as data volumes grow and as data sources and resource types multiply. The primary aim of this paper is to understand the different ways and levels at which application-level interoperability can be provided across distributed infrastructure. Our approach is threefold: (i) given the simplicity of MapReduce, its widespread usage, and its ability to capture the primary challenges of developing distributed applications, we use MapReduce as the underlying exemplar and develop an interoperable implementation of it using SAGA, an API to support distributed programming; (ii) using the canonical wordcount application built on SAGA-based MapReduce, we investigate its scale-out across clusters, clouds and HPC resources; (iii) we establish the concurrent execution of the wordcount application using MapReduce and other programming models such as Sphere. In addition to being interoperable across different distributed infrastructures, SAGA-based MapReduce provides user-level control of the relative placement of compute and data. We provide performance measures and analysis of SAGA-MapReduce when using multiple, different, heterogeneous infrastructures concurrently for the same problem instance.
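
For readers unfamiliar with the wordcount exemplar named in the abstract, the map, shuffle and reduce phases can be sketched as below. This is a minimal, single-process illustration in plain Python, not the SAGA-based implementation described in the paper: it makes no SAGA calls, and the function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the `num_reducers` parameter are hypothetical names chosen for this sketch.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_words(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map phase: emit a (word, 1) pair for every word in one input chunk."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]],
            num_partitions: int) -> List[Dict[str, List[int]]]:
    """Shuffle phase: group values by key and assign each key to one partition."""
    partitions: List[Dict[str, List[int]]] = [
        defaultdict(list) for _ in range(num_partitions)
    ]
    for key, value in pairs:
        # str hashes vary between processes but are stable within one run,
        # which is all this in-process sketch needs.
        partitions[hash(key) % num_partitions][key].append(value)
    return partitions


def reduce_counts(partition: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce phase: sum the emitted ones for each word in one partition."""
    return {word: sum(ones) for word, ones in partition.items()}


def word_count(chunks: Iterable[str], num_reducers: int = 2) -> Dict[str, int]:
    """Run map, shuffle and reduce sequentially over all chunks."""
    pairs = [pair for chunk in chunks for pair in map_words(chunk)]
    result: Dict[str, int] = {}
    for partition in shuffle(pairs, num_reducers):
        # Each key lives in exactly one partition, so the merges never collide.
        result.update(reduce_counts(partition))
    return result
```

Calling `word_count(["the quick fox", "the lazy dog"])` counts "the" twice and every other word once. In a distributed setting each chunk and each partition would be handled by a separate worker, possibly on a different infrastructure; how those workers are launched and where their data is placed is the part the paper addresses and is deliberately left out here.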