Memory and Network Bandwidth Aware Scheduling of Multiprogrammed Workloads on Clusters of SMPs
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 1
Main memory is typically far slower than the processors that use it; this gap is amortized by fast caches. Effective scheduling, particularly for soft or hard real-time workloads, must therefore include cache control, even on uniprocessors. Although cache-aware scheduling is still an open research issue, we assume in this paper that uniprocessors are effectively schedulable in the presence of caches.

In this paper, we focus on SMP-specific memory scheduling. A small SMP multiprocessor typically comprises multiple processors (each with its own caches) that operate on a global main memory. Processors and main memory are connected by a single memory bus, i.e., all processors share the same bus.

Assume we have four job mixes that can each be correctly scheduled on four independent uniprocessors. What happens if we run those four job mixes on a 4-processor system with a single memory bus? Without additional scheduling provisions, contention on the shared memory bus can, in the worst case, stretch each schedule by a factor of 4. This is clearly unacceptable: in general, it would mean that the real-time capacity of an n-processor system is only 1/n of that of a uniprocessor system, making multiprocessors unusable for real-time applications.

Memory-bus scheduling is therefore desirable. It should enable us to give soft, and perhaps even hard, guarantees on memory bandwidth and latency to real-time applications. For non-real-time applications, it should help optimize a system's overall throughput and/or latency.
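The worst-case stretch argument above can be sketched with a minimal contention model (an illustration, not the paper's analysis): each job's unit runtime splits into a compute fraction, which overlaps freely across processors, and a memory-bus fraction, which in the worst case serializes behind every other processor's bus traffic. The function name and the two-phase job model are assumptions for this sketch.

```python
# Minimal sketch (not from the paper): worst-case stretch of a schedule
# when n job mixes that were each feasible on an independent uniprocessor
# are placed on an n-processor SMP sharing one memory bus. Compute time
# overlaps across processors; bus time may queue behind all other jobs.

def worst_case_stretch(n_processors, memory_fraction):
    """Upper bound on schedule stretch: compute time is unchanged, but
    each unit of bus time may wait for the other n-1 processors' bus
    traffic, serializing all n jobs' memory phases."""
    compute = 1.0 - memory_fraction
    memory = memory_fraction * n_processors  # full bus serialization
    return compute + memory

# Fully memory-bound jobs: the 4-processor system degrades by 4x,
# matching the factor-of-4 stretch in the text.
print(worst_case_stretch(4, 1.0))  # 4.0
# Jobs spending half their time on the bus: up to 2.5x stretch.
print(worst_case_stretch(4, 0.5))  # 2.5
```

The model also shows why bus scheduling pays off: bounding each job's bus share shrinks the serialized term, pulling the stretch back toward 1.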