Modelling pilot-job applications on production grids

Authors:
Tristan Glatard; Sorina Camarasu-Pop
Affiliations:
University of Lyon, CNRS, INSERM, CREATIS, Villeurbanne, France; University of Lyon, CNRS, INSERM, CREATIS, Villeurbanne, France
Venue:
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Year:
2009

Abstract

Pilot-job systems have emerged as a computation paradigm to cope with the heterogeneity of production grids, greatly reducing fault ratios and latency. Tools such as DIANE, WISDOM-II, ToPoS and Condor glideIns are now widely adopted to conduct large-scale experiments on these platforms. However, a model of pilot-job applications is still lacking, which makes it difficult to choose submission parameters such as the number of pilots to submit to achieve a given performance level. The variability of production conditions and the heterogeneity of the underlying middleware and infrastructure further complicate this issue. This paper presents a performance model for pilot-job applications running on production grids. Using probabilistic modelling, we derive statistics on the number of available pilots over time and on the makespan of the application, given the number of submitted pilots. Results obtained with a radiotherapy application running on the EGEE production grid show that the model is accurate enough to describe the behavior of the application correctly, laying the basis for further optimization strategies.
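The abstract does not give the model's equations, so the following Python snippet is only an illustrative sketch of the two quantities it mentions, built on our own simplifying assumptions rather than the paper's model: each of the n submitted pilots independently fails to start with a fixed probability `fault_ratio`; otherwise it starts after an exponentially distributed latency with mean `mean_latency`; and the application is a bag of identical tasks pulled one at a time from a shared queue by running pilots. Under these assumptions the number of pilots available at time t is Binomial(n, (1 - fault_ratio) F(t)), and the makespan can be estimated by Monte Carlo simulation. All function and parameter names (`latency_cdf`, `available_pilots_stats`, `simulate_makespan`, `fault_ratio`, `mean_latency`) and the numeric values in the usage section are hypothetical.

```python
import heapq
import math
import random


def latency_cdf(t: float, mean_latency: float) -> float:
    """F(t): probability that a pilot has started by time t.

    Assumption (not from the paper): start latency is exponential.
    """
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-t / mean_latency)


def available_pilots_stats(n: int, t: float, mean_latency: float,
                           fault_ratio: float) -> tuple[float, float]:
    """Mean and variance of N(t), the number of running pilots at time t.

    Assumes pilots are independent, so N(t) ~ Binomial(n, p) with
    p = (1 - fault_ratio) * F(t). Pilots never stop once started here.
    """
    p = (1.0 - fault_ratio) * latency_cdf(t, mean_latency)
    return n * p, n * p * (1.0 - p)


def simulate_makespan(n: int, tasks: int, task_duration: float,
                      mean_latency: float, fault_ratio: float,
                      rng: random.Random) -> float:
    """One Monte Carlo draw of the makespan of a bag of identical tasks.

    Started pilots pull tasks from a shared queue as soon as they are free
    (late binding). Returns math.inf if no pilot starts at all.
    """
    # For each pilot: decide whether it fails, then draw its start time.
    starts = [rng.expovariate(1.0 / mean_latency)
              for _ in range(n) if rng.random() >= fault_ratio]
    if not starts:
        return math.inf
    # Min-heap of the times at which each running pilot becomes free.
    free_times = list(starts)
    heapq.heapify(free_times)
    makespan = 0.0
    for _ in range(tasks):
        # The next task goes to the pilot that becomes free earliest.
        free_at = heapq.heappop(free_times)
        end = free_at + task_duration
        makespan = max(makespan, end)
        heapq.heappush(free_times, end)
    return makespan


if __name__ == "__main__":
    rng = random.Random(42)
    for n in (10, 50, 100):
        mean_n, var_n = available_pilots_stats(n, 600.0, 600.0, 0.2)
        runs = [simulate_makespan(n, 500, 120.0, 600.0, 0.2, rng)
                for _ in range(200)]
        print(f"n={n}: E[N(600s)]={mean_n:.1f}, Var={var_n:.1f}, "
              f"mean makespan={sum(runs) / len(runs):.0f}s")
```

Running the script prints, for each hypothetical pilot count, the expected number of running pilots after 600 s and a Monte Carlo estimate of the mean makespan. Assigning each task to the earliest-free pilot is a simple way to mimic the pull-based scheduling of pilot-job systems; a more faithful model would also account for pilots failing or being killed mid-execution, which this sketch ignores.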