The computational requirements of large-scale, data-intensive analysis of scientific data have grown significantly in recent years. In High Energy Physics (HEP), for example, the Large Hadron Collider (LHC) produced 13 petabytes of data in 2010. This data is processed at more than 140 computing centers distributed across 34 countries. The MapReduce paradigm has emerged as a highly successful programming model for large-scale data-intensive computing applications. However, current MapReduce implementations are designed to operate in single-cluster environments and cannot be leveraged for large-scale distributed data processing across multiple clusters. Workflow systems, on the other hand, are used for distributed data processing across data centers, but the workflow paradigm has been reported to have limitations in reliability and efficiency for this purpose. In this paper, we present the design and implementation of G-Hadoop, a MapReduce framework that aims to enable large-scale distributed computing across multiple clusters.