Evaluation of reallocation heuristics for moldable tasks in computational grids

Authors:
Yves Caniou;Ghislain Charrier;Frédéric Desprez
Affiliations:
Université de Lyon, allée d'Italie, Lyon Cedex, France and UCBL, and CNRS (JFLI), Laboratoire de l'Informatique du Parallélisme (LIP), ÉNS Lyon, Lyon Cedex, France;Université de Lyon, Lyon Cedex, France and INRIA, ÉNS Lyon, Lyon Cedex, France;Université de Lyon, allée d'Italie, Lyon Cedex, France and INRIA, Laboratoire de l'Informatique du Parallélisme (LIP), ÉNS Lyon, allée d'Italie, Lyon Cedex, France
Venue:
AusPDC '11 Proceedings of the Ninth Australasian Symposium on Parallel and Distributed Computing - Volume 118
Year:
2011

Abstract

Grid services often consist of remote sequential or rigid parallel application executions. However, moldable parallel applications, such as linear algebra solvers, are of great interest but require dynamic tuning, which mostly has to be done interactively if performance is needed. Thus, their grid execution depends on a remote and transparent submission to a possibly different batch scheduler on each site, and requires automatic tuning of the job according to the local load. In this paper we study the benefits of a middleware that can automatically submit and reallocate requests from one site to another when it is also able to configure the services by tuning their number of processors and their walltime. In this context, we evaluate the benefits of such mechanisms on two multi-cluster Grid setups, where the platform is composed of several heterogeneous clusters that are either dedicated or non-dedicated. Different scenarios are explored using simulations of real cluster traces from different origins. Results show that a simple method performs well and is often the best. Indeed, it is faster and can thus take more jobs into account while keeping a small execution time. Moreover, users can expect more jobs to finish sooner and a gain in average job response time of between 10% and 40% in most cases if this reallocation mechanism, combined with auto-tuning capabilities, is implemented in a Grid framework. Implementing and maintaining this heuristic coupled with the migration mechanism in a Grid middleware is also simpler because fewer transfers are involved.
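
The idea of reallocating a moldable job while tuning its processor count can be illustrated with a minimal Python sketch. It is not the heuristic evaluated in the paper, which the abstract does not specify. The Cluster fields, the linear queue-wait model, the Amdahl-style runtime model, and the function names best_allocation and reallocate are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Hypothetical view of one site, as a middleware might see it."""
    name: str
    free_procs: int       # processors available immediately
    wait_per_proc: float  # assumed extra queue wait (s) per processor beyond free_procs
    speed: float          # relative speed; 1.0 is the reference cluster


def runtime(seq_time: float, procs: int, serial_fraction: float = 0.1) -> float:
    """Assumed Amdahl-style runtime of a moldable job on `procs` processors."""
    return seq_time * (serial_fraction + (1.0 - serial_fraction) / procs)


def best_allocation(seq_time: float, cluster: Cluster, max_procs: int):
    """Try every processor count and keep the one with the earliest finish time."""
    best = None
    for procs in range(1, max_procs + 1):
        extra = max(0, procs - cluster.free_procs)
        wait = extra * cluster.wait_per_proc
        finish = wait + runtime(seq_time, procs) / cluster.speed
        if best is None or finish < best[1]:
            best = (procs, finish)
    return best


def reallocate(seq_time: float, current_finish: float, clusters, max_procs: int):
    """Return (cluster name, procs, finish) if some site beats the current
    estimated finish time, otherwise None (the job stays where it is)."""
    target = None
    for cluster in clusters:
        procs, finish = best_allocation(seq_time, cluster, max_procs)
        if finish < current_finish:
            current_finish = finish
            target = (cluster.name, procs, finish)
    return target


if __name__ == "__main__":
    sites = [
        Cluster("A", free_procs=2, wait_per_proc=100.0, speed=1.0),
        Cluster("B", free_procs=8, wait_per_proc=50.0, speed=2.0),
    ]
    # A job currently expected to finish in 500 s on its original site.
    print(reallocate(1000.0, 500.0, sites, max_procs=8))
```

In this toy setup the job would move to the faster, less loaded site with a larger processor count. A real middleware would obtain wait and runtime estimates from each site's batch scheduler instead of from these assumed formulas.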