Implementing Malleability on MPI Jobs

Authors:
Gladys Utrera;Julita Corbalan;Jesus Labarta
Affiliations:
Universitat Politècnica de Catalunya (UPC);Universitat Politècnica de Catalunya (UPC);Universitat Politècnica de Catalunya (UPC)
Venue:
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Year:
2004

Abstract

Parallel jobs are characterized by processes that communicate and synchronize with each other frequently. A processor allocation strategy widely used in parallel supercomputers is Space-Sharing, which assigns each job a partition of processors for its exclusive use. In this article we present a global solution that offers virtual Malleability for message-passing parallel jobs by applying a processor allocation strategy, Folding by JobType (FJT). This technique is based on the concepts of Folding and Moldability. It analyzes current and past system information to decide the optimal initial number of processes, when to fold jobs, and how many times to fold them. At the processor level, we apply Co-Scheduling. We implement and evaluate FJT under several workloads with different job sizes, classes, and machine utilization. Results show that FJT adapts easily to load changes and can obtain better performance than the other strategies evaluated on workloads with a high coefficient of variation, especially those with bursty arrivals.
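
To make the Folding idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes the usual meaning of folding, in which each fold halves a job's processor partition so that its fixed set of processes is multiplexed on fewer processors. The function name `fold` and its parameters are hypothetical, and the sketch does not model how FJT decides when or how often to fold.

```python
def fold(num_processes: int, times: int) -> tuple[int, int]:
    """Return (processors, processes_per_processor) after folding.

    Assumption (not from the paper): each fold halves the processor
    partition while the number of processes stays fixed.
    """
    if num_processes < 1 or times < 0:
        raise ValueError("num_processes must be >= 1 and times >= 0")
    processors = num_processes  # unfolded: one process per processor
    for _ in range(times):
        if processors % 2 != 0:
            raise ValueError("partition cannot be halved evenly")
        processors //= 2
    # All processes still exist; they now share the smaller partition.
    return processors, num_processes // processors


if __name__ == "__main__":
    # A 32-process job folded twice shares 8 processors, 4 processes each.
    print(fold(32, 2))
```

Under this assumption, the processes that end up on the same processor must be time-shared, which is where a processor-level policy such as Co-Scheduling would come into play.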