GridBatch: Cloud Computing for Large-Scale Data-Intensive Batch Applications

Authors:
Huan Liu; Dan Orban
Venue:
CCGRID '08 Proceedings of the 2008 Eighth IEEE International Symposium on Cluster Computing and the Grid
Year:
2008

Cited by 9:

One Program Model for Cloud Computing
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing

Integrating Cloud-Computing-Specific Model into Aircraft Design
CloudCom '09 Proceedings of the 1st International Conference on Cloud Computing

A data placement strategy in scientific cloud workflows
Future Generation Computer Systems

Adding semantics to software-as-a-service and cloud computing
WSEAS Transactions on Computers

CoHadoop: flexible data placement and its exploitation in Hadoop
Proceedings of the VLDB Endowment

CAD: an efficient data management and migration scheme across clouds for data-intensive scientific applications
Globe'11 Proceedings of the 4th international conference on Data management in grid and peer-to-peer systems

Web services based scheduling in OpenCF
The Journal of Supercomputing

Online optimization for scheduling preemptable tasks on IaaS cloud systems
Journal of Parallel and Distributed Computing

Input data organization for batch processing in time window based computations
Proceedings of the 28th Annual ACM Symposium on Applied Computing

Abstract

To be competitive, enterprises are collecting and analyzing increasingly large amounts of data in order to derive business insights. However, there are at least two challenges in meeting this increasing demand. First, the growth in the amount of data far outpaces the growth in computation power of a uniprocessor. The growing gap between the supply of and demand for computation power forces enterprises to parallelize their application code. Unfortunately, parallel programming is both time-consuming and error-prone. Second, the emerging Cloud Computing paradigm imposes constraints on the underlying infrastructure, which forces enterprises to rethink their application architecture. We propose the GridBatch system, which aims at solving large-scale data-intensive batch problems under the Cloud infrastructure constraints. GridBatch is a programming model and associated library that hides the complexity of parallel programming, yet it gives users complete control over how data are partitioned and how computation is distributed, so that applications can achieve the highest performance possible. Through a real client example, we show that GridBatch achieves high performance in Amazon's EC2 computing Cloud.