We present a holistic approach to the efficient execution of bags-of-tasks (BOTs) on multiple grids, clusters, and volunteer computing grids virtualized as a single computing platform. The challenge is twofold: to assemble this compound environment, and to use it to execute a mixture of throughput-oriented and performance-oriented BOTs, each containing from a dozen to millions of tasks. Our generic mechanism allows arbitrary dynamic scheduling and replication policies to be specified per BOT as a function of the system state, the BOT execution state, and the BOT priority. We implement this mechanism in the GridBot system and demonstrate its capabilities in a production setup. In three months alone, GridBot executed hundreds of BOTs comprising over 9 million jobs. These jobs ran on 25,000 hosts: 15,000 from the Superlink@Technion community grid, and the rest from the Technion campus grid, local clusters, the Open Science Grid, EGEE, and the UW Madison pool.
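To make the idea of a per-BOT policy more concrete, the following is a minimal Python sketch of a policy expressed as a function of the system state, the BOT execution state, and the BOT priority. It is not GridBot's actual interface: the names `SystemState`, `BotState`, `Host`, `Decision`, and `example_policy`, the `reliable` flag, and the specific thresholds are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemState:
    idle_hosts: int  # hosts currently available (hypothetical field)


@dataclass(frozen=True)
class BotState:
    total_tasks: int
    completed_tasks: int
    priority: int  # higher means more urgent (assumed convention)

    @property
    def remaining(self) -> int:
        return self.total_tasks - self.completed_tasks


@dataclass(frozen=True)
class Host:
    name: str
    reliable: bool  # e.g. a dedicated cluster node rather than a volunteer PC


@dataclass(frozen=True)
class Decision:
    accept_host: bool  # may a task of this BOT run on the host?
    replicas: int      # how many copies of the task to issue


def example_policy(system: SystemState, bot: BotState, host: Host) -> Decision:
    """Illustrative policy only; the rules below are assumptions.

    While many tasks remain, use any host and issue one copy per task.
    Once fewer tasks remain than there are idle hosts, restrict execution
    to reliable hosts and replicate according to the BOT priority.
    """
    in_tail = bot.remaining < system.idle_hosts
    if not in_tail:
        return Decision(accept_host=True, replicas=1)
    if not host.reliable:
        return Decision(accept_host=False, replicas=0)
    # Cap replication at three copies (an arbitrary illustrative limit).
    return Decision(accept_host=True, replicas=min(1 + bot.priority, 3))


if __name__ == "__main__":
    system = SystemState(idle_hosts=100)
    bulk = BotState(total_tasks=10_000, completed_tasks=0, priority=0)
    tail = BotState(total_tasks=10_000, completed_tasks=9_950, priority=2)
    print(example_policy(system, bulk, Host("volunteer-1", reliable=False)))
    print(example_policy(system, tail, Host("cluster-1", reliable=True)))
```

Because the policy is an ordinary function of the three inputs, a different function could be attached to each BOT, and its decisions would change as the BOT progresses without any change to the scheduler itself.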