Grid reliability remains an order of magnitude below that of clusters on production infrastructures. This work aims to improve grid application performance by improving the job submission system. A stochastic model capturing the behavior of a complex grid workload management system is proposed. To instantiate the model, detailed statistics are extracted from dense grid activity traces. The model is then exploited in a simple job resubmission strategy. It provides quantitative inputs for improving job submission performance and makes it possible to quantify the impact of faults and outliers on grid operations.
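As an illustration of how a latency model can feed a resubmission strategy, the sketch below assumes one common scheme that the abstract does not spell out: a job is cancelled and resubmitted whenever it has not started within a fixed timeout. The function names, the empirical (trace-based) latency distribution, and the `outlier_ratio` parameter (the fraction of jobs that never return) are assumptions made for this example, not the model proposed in the paper.

```python
from typing import Sequence


def expected_latency(latencies: Sequence[float],
                     outlier_ratio: float,
                     timeout: float) -> float:
    """Expected total latency when a job is resubmitted after `timeout`.

    `latencies` are observed latencies of jobs that did complete (hypothetical
    trace data); `outlier_ratio` is the assumed fraction of jobs that never
    complete. Attempts are assumed independent and identically distributed.
    """
    n = len(latencies)
    if n == 0:
        raise ValueError("at least one latency sample is required")

    # Probability that a single attempt completes before the timeout.
    completed_in_time = sum(1 for x in latencies if x <= timeout)
    p_success = (1.0 - outlier_ratio) * completed_in_time / n
    if p_success == 0.0:
        return float("inf")  # no attempt can ever succeed with this timeout

    # Mean time spent per attempt: outliers always cost the full timeout,
    # other jobs cost their latency capped at the timeout.
    mean_capped = sum(min(x, timeout) for x in latencies) / n
    per_attempt = outlier_ratio * timeout + (1.0 - outlier_ratio) * mean_capped

    # The number of attempts is geometric with mean 1 / p_success, so by
    # Wald's identity the expected total time is per_attempt / p_success.
    return per_attempt / p_success


def best_timeout(latencies: Sequence[float], outlier_ratio: float) -> float:
    """Timeout minimizing the expected latency over the empirical samples.

    Between two consecutive sample values the success probability is constant
    while the per-attempt cost grows, so only sample values need checking.
    """
    candidates = sorted(set(latencies))
    return min(candidates,
               key=lambda t: expected_latency(latencies, outlier_ratio, t))


if __name__ == "__main__":
    samples = [100.0, 200.0, 300.0, 1000.0]  # made-up latencies in seconds
    t = best_timeout(samples, outlier_ratio=0.1)
    print(t, expected_latency(samples, 0.1, t))
```

Under these assumptions, the outlier ratio enters the cost directly: the more jobs are lost, the more a short timeout pays off, which is one way to quantify the impact of faults and outliers on job submission.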