Memory and Network Bandwidth Aware Scheduling of Multiprogrammed Workloads on Clusters of SMPs
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Clusters of SMPs are becoming increasingly common. However, the shared-memory design of SMPs, and the resulting contention among processors for access to main memory, can significantly limit their efficiency. Moreover, as cluster interconnects continue to improve, network bandwidth has become a significant fraction of a machine's total memory bandwidth, so the NIC of an SMP cluster node can also become a major consumer of shared memory bus bandwidth. In this paper we first provide experimental evidence that contention on the shared memory bus can have a major impact on the total execution time of processes, even when no processor sharing is involved. We then present the design and implementation of an informed scheduling algorithm for multiprogrammed workloads, which selects the processes to be co-scheduled so that bus saturation is avoided. The input data needed by our scheduler are acquired dynamically at run time, using architecture-specific performance monitoring counters and a modified version of the NIC firmware, with no changes to existing application binaries. An experimental comparison between our scheduler and the standard Linux 2.6 O(1) scheduler shows average system throughput improvements in the range of 5-25%.
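The idea of co-scheduling processes so that their combined demand stays below the bus capacity can be illustrated with a minimal sketch. This is not the paper's algorithm, whose details the abstract does not give; it is a simple greedy heuristic written under stated assumptions. The names `Proc`, `select_coschedule`, `bus_mbps`, `nic_mbps`, and `bus_capacity_mbps` are hypothetical, and the per-process bandwidth figures stand in for values that a real system would read from performance monitoring counters and the NIC.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    bus_mbps: float  # hypothetical: measured memory-bus demand of the process


def select_coschedule(procs, n_cpus, bus_capacity_mbps, nic_mbps=0.0):
    """Greedily pick up to n_cpus processes whose summed bus demand,
    plus the NIC's share, does not exceed the bus capacity.

    Illustrative heuristic only (an assumption, not the published method).
    """
    remaining = bus_capacity_mbps - nic_mbps
    chosen = []
    # Visit the heaviest consumers first so each one is paired with
    # lighter processes instead of with another heavy one.
    for p in sorted(procs, key=lambda p: p.bus_mbps, reverse=True):
        if len(chosen) == n_cpus:
            break
        if p.bus_mbps <= remaining:
            chosen.append(p)
            remaining -= p.bus_mbps
    # If nothing fits, still make progress by running the lightest process.
    if not chosen and procs:
        chosen.append(min(procs, key=lambda p: p.bus_mbps))
    return chosen


if __name__ == "__main__":
    workload = [Proc(1, 800.0), Proc(2, 700.0), Proc(3, 100.0), Proc(4, 50.0)]
    picked = select_coschedule(workload, n_cpus=2, bus_capacity_mbps=1000.0)
    print([p.pid for p in picked])
```

With the sample workload, the two heavy processes are never placed together on a two-processor node, because their combined demand would exceed the assumed capacity; a heavy process is paired with a light one instead.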