Inter-application network contention is seen as a major hurdle to achieving higher throughput in today's large-scale, high-performance capacity systems. The effect is aggravated by current system schedulers, which allocate jobs as soon as nodes become available and thereby fragment jobs: the tasks of one job may be spread throughout the system instead of being allocated contiguously. This fragmentation increases the probability of sharing network resources with other applications, which in turn raises inter-application network contention. In this paper, we propose a contention-aware node allocation technique. It identifies the applications most likely to have a large impact on inter-application contention and obtains a more contiguous allocation for those workloads. We demonstrate that, although contiguous node allocation on slimmed fat-tree topologies may increase intra-application contention, the reduction in inter-application contention is more significant. Simulation experiments on a 2,048-node system running multiple applications show that the technique reduces contention time by up to 35%.
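The idea of favoring contiguous placement for contention-prone jobs can be illustrated with a minimal sketch. This is not the allocation algorithm evaluated in the paper. It assumes a hypothetical system in which nodes are numbered consecutively and each leaf switch hosts `nodes_per_switch` of them, and it assumes the scheduler already knows whether a job is contention-prone (the `contention_prone` flag). All function names are illustrative.

```python
def switches_spanned(nodes, nodes_per_switch):
    """Count the distinct leaf switches touched by a set of node IDs.

    Assumption: node n is attached to leaf switch n // nodes_per_switch.
    """
    return len({n // nodes_per_switch for n in nodes})


def first_available(free, size):
    """Baseline: take the lowest-numbered free nodes, contiguous or not."""
    ordered = sorted(free)
    return ordered[:size] if len(ordered) >= size else None


def contiguous(free, size):
    """Return the first run of `size` consecutive free node IDs, if any."""
    run = []
    for n in sorted(free):
        # Extend the current run or start a new one at this node.
        run = run + [n] if run and n == run[-1] + 1 else [n]
        if len(run) == size:
            return run
    return None


def allocate(free, size, contention_prone):
    """Prefer a contiguous block for contention-prone jobs only.

    Falls back to the baseline when no contiguous block is free.
    """
    if contention_prone:
        block = contiguous(free, size)
        if block is not None:
            return block
    return first_available(free, size)


# Hypothetical 16-node system with 4 nodes per leaf switch.
free_nodes = {1, 2, 6, 8, 9, 10, 11, 13}
fragmented = allocate(free_nodes, 4, contention_prone=False)
compact = allocate(free_nodes, 4, contention_prone=True)
```

In this example the baseline picks nodes 1, 2, 6 and 8, which span three leaf switches, while the contiguous policy picks nodes 8 to 11, which share one. Fewer switches spanned means fewer links shared with other jobs, at the cost of concentrating the job's own traffic on a smaller part of the network.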