Contention-aware node allocation policy for high-performance capacity systems

Authors:
Ana Jokanovic;Cyriel Minkenberg;Jose Carlos Sancho;Ramon Beivide;German Rodriguez;Jesus Labarta
Affiliations:
Barcelona Supercomputing Center, Barcelona, Spain;IBM Research -- Zurich, Rüschlikon, Switzerland;Barcelona Supercomputing Center, Barcelona, Spain;University of Cantabria, Santander, Spain;IBM Research -- Zurich, Rüschlikon, Switzerland;Barcelona Supercomputing Center, Barcelona, Spain
Venue:
Proceedings of the 2012 Interconnection Network Architecture: On-Chip, Multi-Chip Workshop
Year:
2012

Abstract

Inter-application network contention is seen as a major hurdle to achieving higher throughput in today's large-scale high-performance capacity systems. This effect is aggravated by current system schedulers, which allocate jobs as soon as nodes become available and thus produce job fragmentation, i.e., the tasks of one job may be spread throughout the system instead of being allocated contiguously. Fragmentation increases the probability that applications share network resources, which in turn raises inter-application network contention. In this paper, we propose a contention-aware node allocation technique. It identifies the applications most likely to have a large impact on inter-application contention and obtains a more contiguous allocation for these particular workloads. We demonstrate that, although contiguous node allocation on slimmed fat-tree topologies may increase intra-application contention, the reduction in inter-application contention is more significant. Simulation experiments on a 2,048-node system running multiple applications showed that this technique reduces contention time by up to 35%.
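The core idea (insist on contiguity only for jobs classified as contention-prone, and let every other job take whatever nodes are free) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' allocator: it assumes node IDs are numbered so that consecutive IDs share a leaf switch (`NODES_PER_SWITCH` is a hypothetical radix), that each job carries a hypothetical `contention_prone` flag produced by some earlier classification step, that "contiguous" means a run of consecutive free node IDs, and that a contention-prone job which finds no contiguous block simply stays queued. The functions `first_available`, `contiguous`, `allocate`, and `switches_spanned` are illustrative names only.

```python
from dataclasses import dataclass

# Hypothetical leaf-switch radix: node IDs n with the same n // NODES_PER_SWITCH
# are assumed to hang off the same leaf switch of the fat tree.
NODES_PER_SWITCH = 16


@dataclass
class Job:
    name: str
    size: int               # number of nodes requested
    contention_prone: bool  # hypothetical classification, e.g. communication-heavy


def first_available(free, size):
    """Baseline policy: take the lowest-numbered free nodes, wherever they are."""
    if len(free) < size:
        return None
    return sorted(free)[:size]


def contiguous(free, size):
    """Return the first run of `size` consecutive free node IDs, or None."""
    run = []
    for node in sorted(free):
        if run and node != run[-1] + 1:
            run = []  # gap found: start a new candidate run
        run.append(node)
        if len(run) == size:
            return run
    return None


def allocate(job, free):
    """Contention-aware allocation: only contention-prone jobs insist on contiguity.

    Assumption: a contention-prone job that finds no contiguous block gets None
    and stays queued instead of falling back to a fragmented allocation.
    Allocated nodes are removed from `free` in place.
    """
    if job.contention_prone:
        nodes = contiguous(free, job.size)
    else:
        nodes = first_available(free, job.size)
    if nodes is not None:
        free.difference_update(nodes)
    return nodes


def switches_spanned(nodes, per_switch=NODES_PER_SWITCH):
    """Fragmentation proxy: how many leaf switches a job's nodes touch."""
    return len({n // per_switch for n in nodes})


if __name__ == "__main__":
    # A fragmented 64-node, 4-switch system left behind by earlier jobs.
    busy = set(range(0, 14)) | set(range(16, 30)) | set(range(32, 46)) | {48}
    free = set(range(64)) - busy
    scattered = allocate(Job("a", 8, contention_prone=False), set(free))
    compact = allocate(Job("b", 8, contention_prone=True), set(free))
    print(scattered, switches_spanned(scattered))  # [14, 15, 30, 31, 46, 47, 49, 50] 4
    print(compact, switches_spanned(compact))      # [49, 50, 51, 52, 53, 54, 55, 56] 1
```

In this toy run the first-available job touches all four leaf switches, so its traffic can share uplinks with every other job, while the contiguous job stays under a single switch. The model deliberately ignores the intra-application contention that contiguous placement can add on a slimmed fat tree; quantifying that trade-off requires a network simulation like the one reported above.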