SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
Evaluating the performance of cache-affinity scheduling in shared-memory multiprocessors
Journal of Parallel and Distributed Computing
Kernel-level scheduling for the nano-threads programming model
ICS '98 Proceedings of the 12th international conference on Supercomputing
Symbiotic jobscheduling for a simultaneous multithreaded processor
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Using Processor-Cache Affinity Information in Shared-Memory Multiprocessor Scheduling
IEEE Transactions on Parallel and Distributed Systems
Maximizing Speedup through Self-Tuning of Processor Allocation
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A Tool to Schedule Parallel Applications on Multiprocessors: The NANOS CPU MANAGER
IPDPS '00/JSSPP '00 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Performance-driven processor allocation
OSDI'00 Proceedings of the 4th conference on Symposium on Operating System Design & Implementation - Volume 4
On mitigating memory bandwidth contention through bandwidth-aware scheduling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Memory Latency Reduction via Thread Throttling
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
L1-bandwidth aware thread allocation in multicore SMT processors
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
In this paper we reformulate the thread scheduling problem on multiprogrammed SMPs Scheduling algorithms usually attempt to maximize performance of memory intensive applications by optimally exploiting the cache hierarchy We present experimental results indicating that – contrary to the common belief – the extent of performance loss of memory-intensive, multiprogrammed workloads is disproportionate to the deterioration of cache performance caused by interference between threads In previous work [1] we found that memory bandwidth saturation is often the actual bottleneck that determines the performance of multiprogrammed workloads Therefore, we present and evaluate two realistic scheduling policies which treat memory bandwidth as a first-class resource Their design methodology is general enough and can be applied to introduce bus bandwidth-awareness to conventional scheduling policies Experimental results substantiate the advantages of our approach.