Memory performance at reduced CPU clock speeds: an analysis of current x86_64 processors

Authors:
Robert Schöne; Daniel Hackenberg; Daniel Molka
Affiliation (all authors):
Center for Information Services and High Performance Computing, Technische Universität Dresden, Dresden, Germany
Venue:
HotPower'12 Proceedings of the 2012 USENIX conference on Power-Aware Computing and Systems
Year:
2012

Abstract

Reducing CPU frequency and voltage is a well-known approach to reducing the energy consumption of memory-bound applications. It rests on the assumption that main memory performance degrades little or not at all at reduced processor clock speeds, while power consumption decreases significantly. We study this effect in detail on the latest generation of x86-64 compute nodes. Our results show that memory and last-level cache bandwidths at reduced clock speeds depend strongly on the processor microarchitecture. For example, while an Intel Westmere-EP processor achieves 95% of the peak main memory bandwidth at the lowest processor frequency, the latest Sandy Bridge-EP platform reaches only 60%. Increased efficiency of memory-bound applications may also be achieved with concurrency throttling, i.e., reducing the number of active cores per socket. We therefore complete our study with a detailed analysis of memory bandwidth scaling at different concurrency levels on our test systems. Our results (both qualitative trends and absolute bandwidth numbers) are valuable for scientists in the areas of computer architecture, performance and power analysis, and modeling, as well as for application developers seeking to optimize their codes on current x86-64 systems.
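
To illustrate why the bandwidth figures in the abstract matter for energy, here is a minimal sketch under a simplifying assumption that is not taken from the paper: the workload is purely memory-bound and moves a fixed amount of data, so runtime scales as 1 / bandwidth and energy scales as power / bandwidth. The function name `relative_energy` and the power fraction of 0.70 are hypothetical and chosen only for illustration; only the 95% and 60% bandwidth fractions come from the abstract.

```python
def relative_energy(bandwidth_fraction: float, power_fraction: float) -> float:
    """Energy at the reduced clock relative to energy at the nominal clock.

    Assumes a purely memory-bound workload that moves a fixed amount of
    data: runtime scales as 1 / bandwidth, so energy = power * time
    scales as power_fraction / bandwidth_fraction.
    """
    if not 0.0 < bandwidth_fraction <= 1.0:
        raise ValueError("bandwidth_fraction must be in (0, 1]")
    if power_fraction <= 0.0:
        raise ValueError("power_fraction must be positive")
    return power_fraction / bandwidth_fraction


# Bandwidth fractions at the lowest frequency, as quoted in the abstract.
BANDWIDTH_FRACTION = {"Westmere-EP": 0.95, "Sandy Bridge-EP": 0.60}

# Hypothetical value, NOT from the paper: power at the lowest frequency
# relative to power at the nominal frequency.
ASSUMED_POWER_FRACTION = 0.70

for name, bw in BANDWIDTH_FRACTION.items():
    ratio = relative_energy(bw, ASSUMED_POWER_FRACTION)
    verdict = "saves energy" if ratio < 1.0 else "costs energy"
    print(f"{name}: relative energy {ratio:.2f} ({verdict})")
```

Under this model, lowering the clock saves energy only when the power fraction is smaller than the bandwidth fraction. With the assumed power fraction of 0.70, the ratio falls below 1 for the 95% case and rises above 1 for the 60% case, which is why the microarchitecture dependence reported in the abstract changes the outcome.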