Analyzing the EGEE Production Grid Workload: Application to Jobs Submission Optimization

Authors:
Diane Lingrand; Johan Montagnat; Janusz Martyniak; David Colling
Affiliations:
University of Nice - Sophia Antipolis / CNRS, France; University of Nice - Sophia Antipolis / CNRS, France; The Blackett Lab, Imperial College London, UK; The Blackett Lab, Imperial College London, UK
Venue:
Job Scheduling Strategies for Parallel Processing
Year:
2009

Citing: 0
Cited by: 5

Dynamic scheduling of virtual machines running HPC workloads in scientific grids
NTMS'09 Proceedings of the 3rd international conference on New technologies, mobility and security

Modelling pilot-job applications on production grids
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing

A model of pilot-job resource provisioning on production grids
Parallel Computing

Self-Healing of Operational Workflow Incidents on Distributed Computing Infrastructures
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)

Self-healing of workflow activity incidents on distributed computing infrastructures
Future Generation Computer Systems

Abstract

Grid reliability remains an order of magnitude below that of clusters on production infrastructures. This work aims at improving grid application performance by improving the job submission system. A stochastic model capturing the behavior of a complex grid workload management system is proposed. To instantiate the model, detailed statistics are extracted from dense grid activity traces. The model is then exploited in a simple job resubmission strategy. It provides quantitative inputs for improving job submission performance, and it makes it possible to quantify the impact of faults and outliers on grid operations.
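
The abstract does not spell out the resubmission strategy, so the following is only a minimal sketch of one common form of it, under stated assumptions: a job is cancelled and resubmitted whenever it has not started within a fixed timeout, successive submissions are independent, and latency samples come from a trace in which faulty or outlier jobs are recorded as infinite latency. Under those assumptions the expected total latency for a timeout T is E[min(L, T)] / P(L <= T). The function names and the sample values are hypothetical and are not taken from the paper.

```python
import math


def expected_latency(latencies, timeout):
    """Expected total latency if a job is cancelled and resubmitted
    every time it has not started within `timeout`.

    `latencies` is an empirical sample; outliers/faults are math.inf.
    Assumes independent, identically distributed submissions.
    """
    n = len(latencies)
    # Empirical probability that one submission succeeds before the timeout.
    p_success = sum(1 for x in latencies if x <= timeout) / n
    if p_success == 0:
        return math.inf
    # Mean time spent per attempt, capped at the timeout.
    mean_truncated = sum(min(x, timeout) for x in latencies) / n
    return mean_truncated / p_success


def best_timeout(latencies):
    """Pick the observed finite latency that minimizes expected_latency."""
    candidates = sorted(x for x in latencies if math.isfinite(x))
    return min(candidates, key=lambda t: expected_latency(latencies, t))


# Hypothetical sample (seconds): three fast jobs, one slow job, one lost job.
sample = [10, 20, 30, 1000, math.inf]
print(best_timeout(sample), expected_latency(sample, best_timeout(sample)))
```

With this toy sample, waiting for the slow job is more costly on average than cancelling early, which illustrates how outliers and faults can drive the choice of timeout.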