G-Hadoop: MapReduce across distributed data centers for data-intensive computing

Authors:
Lizhe Wang; Jie Tao; Rajiv Ranjan; Holger Marten; Achim Streit; Jingying Chen; Dan Chen
Affiliations:
School of Computer, China University of Geosciences, PR China and Center for Earth Observation and Digital Earth, Chinese Academy of Sciences, PR China; Steinbuch Center for Computing, Karlsruhe Institute of Technology, Germany; ICT Centre, CSIRO, Australia; Steinbuch Center for Computing, Karlsruhe Institute of Technology, Germany; Steinbuch Center for Computing, Karlsruhe Institute of Technology, Germany; National Engineering Center for E-Learning, Central China Normal University, PR China; School of Computer, China University of Geosciences, PR China
Venue:
Future Generation Computer Systems
Year:
2013

Abstract

Recently, the computational requirements for large-scale data-intensive analysis of scientific data have grown significantly. In High Energy Physics (HEP), for example, the Large Hadron Collider (LHC) produced 13 petabytes of data in 2010. This huge amount of data is processed at more than 140 computing centers distributed across 34 countries. The MapReduce paradigm has emerged as a highly successful programming model for large-scale data-intensive computing applications. However, current MapReduce implementations are designed to operate in single-cluster environments and cannot be leveraged for large-scale distributed data processing across multiple clusters. Workflow systems, on the other hand, are used for distributed data processing across data centers, but the workflow paradigm has been reported to have limitations for distributed data processing, notably in reliability and efficiency. In this paper, we present the design and implementation of G-Hadoop, a MapReduce framework that aims to enable large-scale distributed computing across multiple clusters.