GridBot: execution of bags of tasks in multiple grids

Authors:
Mark Silberstein;Artyom Sharov;Dan Geiger;Assaf Schuster
Affiliations:
Israel Institute of Technology;Israel Institute of Technology;Israel Institute of Technology;Israel Institute of Technology
Venue:
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Year:
2009

Abstract

We present a holistic approach for efficient execution of bags-of-tasks (BOTs) on multiple grids, clusters, and volunteer computing grids virtualized as a single computing platform. The challenge is twofold: to assemble this compound environment, and to use it to execute a mixture of throughput- and performance-oriented BOTs, each with anywhere from a dozen to millions of tasks. Our generic mechanism allows per-BOT specification of arbitrary dynamic scheduling and replication policies as a function of the system state, BOT execution state, and BOT priority. We implement our mechanism in the GridBot system and demonstrate its capabilities in a production setup. In three months alone, GridBot has executed hundreds of BOTs comprising over 9 million jobs. These jobs were invoked on 25,000 hosts: 15,000 from the Superlink@Technion community grid and the rest from the Technion campus grid, local clusters, the Open Science Grid, EGEE, and the UW Madison pool.
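
To make the idea of a per-BOT policy "as a function of the system state, BOT execution state, and BOT priority" more concrete, the following is a minimal Python sketch. It is not GridBot's interface or policy language: every name here (`SystemState`, `BotState`, `throughput_policy`, `tail_replication_policy`), the 10% tail threshold, and the priority-based replica cap are hypothetical choices made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SystemState:
    """Hypothetical snapshot of the compound environment (not a GridBot type)."""
    idle_hosts: int
    total_hosts: int


@dataclass(frozen=True)
class BotState:
    """Hypothetical snapshot of one BOT's execution (not a GridBot type)."""
    pending_tasks: int
    total_tasks: int
    priority: int  # assumption: a larger value means a more urgent BOT


# A policy maps (system state, BOT state) to the number of replicas
# to run for each remaining task of that BOT.
ReplicationPolicy = Callable[[SystemState, BotState], int]


def throughput_policy(system: SystemState, bot: BotState) -> int:
    """Throughput-oriented BOT: never replicate, one host per task."""
    return 1


def tail_replication_policy(system: SystemState, bot: BotState) -> int:
    """Performance-oriented BOT: replicate only in the tail of the run."""
    if bot.pending_tasks == 0:
        return 0  # nothing left to schedule
    # Assumed threshold: the "tail" is the last 10% of the BOT's tasks.
    in_tail = bot.pending_tasks <= 0.1 * bot.total_tasks
    if not in_tail or system.idle_hosts <= bot.pending_tasks:
        return 1  # hosts are scarce relative to tasks, so do not replicate
    # Spread idle hosts over the remaining tasks, capped by BOT priority.
    affordable = system.idle_hosts // bot.pending_tasks
    return max(1, min(affordable, 1 + bot.priority))


if __name__ == "__main__":
    system = SystemState(idle_hosts=100, total_hosts=1000)
    mid_run = BotState(pending_tasks=5000, total_tasks=10000, priority=2)
    tail = BotState(pending_tasks=10, total_tasks=10000, priority=2)
    print(tail_replication_policy(system, mid_run))
    print(tail_replication_policy(system, tail))
```

Because each policy is just a function of the two state snapshots, a scheduler in this sketch could re-evaluate it whenever either snapshot changes, and different BOTs could carry different policies side by side.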