With the latest high-end computing nodes combining shared-memory multiprocessing with hardware multithreading, new scheduling policies are necessary for workloads consisting of multithreaded applications. Such hybrid multiprocessors present schedulers with the problem of job pairing: deciding which specific jobs can share each processor, by running on its different execution contexts, with the minimum performance penalty. Scheduling policies are therefore expected to decide not only which job mix will execute simultaneously across the processors, but also which jobs can be combined within each processor. This paper addresses the problem by introducing new scheduling policies that use run-time performance information to identify the best mix of threads to run across processors and within each processor. Scheduling of threads across processors is driven by the memory bandwidth utilization of the threads, whereas scheduling of threads within each processor is driven by one of three metrics: bus transaction rate per thread, stall cycle rate per thread, or outermost-level cache miss rate per thread. We have implemented and experimentally evaluated these policies on a real multiprocessor server with Intel Hyper-Threaded processors. The policy that pairs threads by bus transaction rate achieves an average 13.4% and a maximum 28.7% performance improvement over the Linux scheduler. The policy that pairs threads by stall cycle rate achieves an average 9.5% and a maximum 18.8% improvement. The policy that pairs threads by cache miss rate achieves an average 7.2% improvement, with a maximum of 23.6%.
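The abstract does not spell out the pairing algorithm itself, but the idea of matching threads within a processor based on a sampled counter metric can be sketched with a simple greedy heuristic: sort threads by the chosen rate (e.g. bus transactions per interval) and pair the most intensive thread with the least intensive one, so that no single physical processor concentrates two bandwidth-heavy threads. The function name and the greedy highest-with-lowest matching below are illustrative assumptions, not the paper's actual policy.

```python
def pair_threads(rates):
    """Illustrative greedy thread-pairing heuristic (an assumption,
    not the paper's exact algorithm).

    `rates` maps a thread id to its sampled metric, e.g. bus
    transactions per time interval read from hardware counters.
    Pairs the most metric-intensive remaining thread with the least
    intensive one, yielding one pair per physical processor.
    Returns a list of (low_rate_tid, high_rate_tid) tuples.
    """
    # Sort thread ids in ascending order of the sampled rate.
    ordered = sorted(rates, key=rates.get)
    pairs = []
    while len(ordered) >= 2:
        light = ordered.pop(0)   # least intensive remaining thread
        heavy = ordered.pop()    # most intensive remaining thread
        pairs.append((light, heavy))
    return pairs


# Example: four threads with sampled bus-transaction rates.
sampled = {"t0": 1.0, "t1": 9.0, "t2": 4.0, "t3": 6.0}
print(pair_threads(sampled))  # → [('t0', 't1'), ('t2', 't3')]
```

A real implementation would resample the counters periodically and re-pair threads at each scheduling quantum, since the metrics of running threads drift over time.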