Pilot-job systems emerged as a computation paradigm to cope with the heterogeneity of large-scale production grids, greatly reducing fault ratios and middleware overheads. They are now widely adopted to sustain the computation of scientific applications on such platforms. However, a model of pilot-job systems is still lacking, which makes it difficult to build realistic experimental setups for their study (e.g., simulators or controlled platforms). The variability of production conditions, background loads and resource characteristics further complicates this issue. This paper presents a model of pilot-job resource provisioning. From a probabilistic model of pilot submission and registration, the number of pilots registered to the application host and the makespan of a divisible-load application are derived. The model takes job failures into account and makes no assumption about the characteristics of the computing resources, the scheduling algorithm or the background load. Only minimally invasive monitoring of the grid is required. The model is evaluated in production conditions, using logs acquired on a pilot-job server deployed in the biomed virtual organization of the European Grid Infrastructure. Experimental results show that the model accurately describes the number of registered pilots over time periods ranging from a few hours to a few days and under different pilot submission conditions.
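To make the two derived quantities concrete, the following is a minimal illustrative sketch, not the model developed in the paper. It assumes, purely for illustration, that each submitted pilot fails independently with a fixed probability, that surviving pilots register after an exponentially distributed latency, that registered pilots never leave, and that every pilot processes load at unit speed. The function names and all parameters (`failure_prob`, `mean_latency`, `workload`) are hypothetical.

```python
import random


def simulate_registered_pilots(submit_times, failure_prob, mean_latency,
                               horizon, step, seed=0):
    """Monte Carlo draw of the number of registered pilots over time.

    Assumptions (illustrative only): independent failures with probability
    failure_prob, exponential registration latency with mean mean_latency.
    """
    rng = random.Random(seed)
    registration_times = []
    for t in submit_times:
        if rng.random() < failure_prob:
            continue  # this pilot fails and never registers
        registration_times.append(t + rng.expovariate(1.0 / mean_latency))
    registration_times.sort()

    # Sample the cumulative count of registered pilots on a regular time grid.
    times = [i * step for i in range(int(horizon / step) + 1)]
    counts = [sum(1 for r in registration_times if r <= t) for t in times]
    return times, counts, registration_times


def makespan(registration_times, workload):
    """Completion time of a divisible load on unit-speed pilots.

    Returns the earliest T such that the total pilot time accumulated since
    each registration reaches workload, or None if no pilot ever registers.
    Assumes pilots stay registered once they appear.
    """
    regs = sorted(registration_times)
    if not regs:
        return None
    done = 0.0
    for k, r in enumerate(regs):
        next_r = regs[k + 1] if k + 1 < len(regs) else float("inf")
        active = k + 1  # pilots working during the interval [r, next_r)
        capacity = active * (next_r - r)
        if done + capacity >= workload:
            return r + (workload - done) / active
        done += capacity
    return None
```

For example, with one pilot registering at time 0 and a second at time 10, a workload of 30 units of pilot time completes at time 20: the first pilot processes 10 units alone, and the remaining 20 units are then shared by two pilots.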