Reducing CPU frequency and voltage is a well-known approach to reducing the energy consumption of memory-bound applications. It rests on the assumption that main memory performance degrades little or not at all at reduced processor clock speeds, while power consumption decreases significantly. We study this effect in detail on the latest generation of x86-64 compute nodes. Our results show that memory and last-level cache bandwidths at reduced clock speeds depend strongly on the processor microarchitecture. For example, while an Intel Westmere-EP processor achieves 95% of the peak main memory bandwidth at the lowest processor frequency, the bandwidth drops to only 60% of peak on the latest Sandy Bridge-EP platform. The efficiency of memory-bound applications may also be increased with concurrency throttling, i.e., reducing the number of active cores per socket. We therefore complete our study with a detailed analysis of memory bandwidth scaling at different concurrency levels on our test systems. Our results, both qualitative trends and absolute bandwidth numbers, are valuable for scientists in the areas of computer architecture, performance and power analysis and modeling, as well as for application developers seeking to optimize their codes on current x86-64 systems.
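To illustrate why the retained bandwidth matters for energy, the following is a minimal sketch, not the authors' model. It assumes a purely memory-bound kernel whose runtime is inversely proportional to the achieved memory bandwidth, and it uses a hypothetical node power fraction of 0.70 at the lowest frequency. Only the bandwidth fractions (95% and 60%) come from the abstract; the function name `relative_energy` and the power figure are assumptions made for illustration.

```python
def relative_energy(bandwidth_fraction, power_fraction):
    """Energy at reduced frequency relative to energy at nominal frequency.

    Simplifying assumption: the kernel is purely memory-bound, so its
    runtime scales as 1 / (achieved bandwidth). Energy = power * runtime.
    """
    if not 0 < bandwidth_fraction <= 1:
        raise ValueError("bandwidth_fraction must be in (0, 1]")
    runtime_ratio = 1.0 / bandwidth_fraction
    return power_fraction * runtime_ratio


# Hypothetical value, NOT a measurement from the study: node power at the
# lowest frequency as a fraction of node power at nominal frequency.
ASSUMED_POWER_FRACTION = 0.70

# Bandwidth fractions at the lowest frequency, as quoted in the abstract.
westmere = relative_energy(0.95, ASSUMED_POWER_FRACTION)
sandy_bridge = relative_energy(0.60, ASSUMED_POWER_FRACTION)

print(f"Westmere-EP relative energy:     {westmere:.2f}")      # 0.74
print(f"Sandy Bridge-EP relative energy: {sandy_bridge:.2f}")  # 1.17
```

Under these assumptions, the same power reduction saves energy when 95% of the bandwidth is retained but costs energy when only 60% is retained, which is why the microarchitecture dependence reported above matters for frequency scaling decisions. Real power and runtime behavior would need to be measured on the target system.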