In Simultaneous Multi-Threading (SMT) chip multiprocessors (CMPs), thread placement is performed today in a largely power-unaware manner. For example, consolidating active threads onto fewer cores exposes opportunities for power savings that have not been addressed in prior work. The savings opportunity is especially high in the emerging context where per-core power gating (PCPG) is becoming viable. Choosing the optimum combination of per-core SMT level and number of active cores to achieve a desired power-performance efficiency is a knob that has neither been explored in prior work nor implemented in the operating system task scheduler. This work investigates the opportunities for such efficiency improvement in the context of the IBM POWER7 processor chip. We present a thread consolidation heuristic (TCH) capable of finding power-performance efficient thread placements at runtime, based on power-performance measurements. In the context of the PARSEC benchmark suite, chip power consumption is reduced by up to 21% (averaged across applications) when TCH is adopted instead of the default Linux thread scheduling policy, with minimal performance impact. TCH can also create favorable conditions that enable aggressive actuation of PCPG, when that is available. In conjunction with PCPG, TCH can improve power-performance efficiency by a factor of up to 2.1 with respect to the default scheduler. We also evaluate TCH in the context of the SPECpower benchmark. In this case, TCH reduces system power by up to 15% without PCPG and by up to 22% with PCPG, with no performance degradation.
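The abstract does not give the details of TCH, so the following is only a minimal sketch of the general idea of measurement-driven thread consolidation, not the paper's algorithm. It assumes an 8-core chip with SMT levels 1, 2, and 4 (as on POWER7), a hypothetical caller-supplied `measure` function that returns throughput and power for a placement, and a hypothetical `max_slowdown` tolerance standing in for "minimal performance impact". All names (`Placement`, `Sample`, `candidate_placements`, `pick_placement`) are illustrative.

```python
from typing import Callable, Iterator, NamedTuple, Sequence


class Placement(NamedTuple):
    active_cores: int  # cores left running; the rest could be gated
    smt_level: int     # hardware threads used per active core


class Sample(NamedTuple):
    throughput: float   # any "higher is better" performance metric
    power_watts: float  # measured chip or system power


def candidate_placements(
    num_threads: int,
    max_cores: int = 8,                    # assumption: 8-core chip
    smt_levels: Sequence[int] = (1, 2, 4)  # assumption: SMT1/SMT2/SMT4
) -> Iterator[Placement]:
    """Yield the fewest-core placement that fits num_threads at each SMT level."""
    for smt in smt_levels:
        cores = -(-num_threads // smt)  # ceiling division
        if cores <= max_cores:
            yield Placement(cores, smt)


def pick_placement(
    num_threads: int,
    measure: Callable[[Placement], Sample],  # hypothetical measurement hook
    baseline: Placement,
    max_slowdown: float = 0.05,              # hypothetical tolerance
) -> Placement:
    """Return the placement with the best throughput per watt whose
    throughput stays within max_slowdown of the baseline placement."""
    base = measure(baseline)
    best, best_eff = baseline, base.throughput / base.power_watts
    for placement in candidate_placements(num_threads):
        sample = measure(placement)
        if sample.throughput < (1.0 - max_slowdown) * base.throughput:
            continue  # too slow: reject regardless of power savings
        eff = sample.throughput / sample.power_watts
        if eff > best_eff:
            best, best_eff = placement, eff
    return best
```

In a real system, `measure` would apply the placement (for example through CPU affinity masks) and read performance counters and power sensors over a sampling interval; here it is left abstract so the selection logic can be exercised with synthetic numbers.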