On the Effects of Memory Latency and Bandwidth on Supercomputer Application Performance

Authors:
Richard Murphy
Affiliations:
Sandia National Laboratories, PO Box 5800, MS-1110, Albuquerque, NM 87185-1110. rcmurph@sandia.gov
Venue:
IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
Year:
2007

Cited 8

A platform for developing adaptable multicore applications
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems

The International Exascale Software Project: a Call To Cooperative Action By the Global High-Performance Community
International Journal of High Performance Computing Applications

Towards optimizing energy costs of algorithms for shared memory architectures
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures

FAWNdamentally power-efficient clusters
HotOS'09 Proceedings of the 12th conference on Hot topics in operating systems

Cache injection for parallel applications
Proceedings of the 20th international symposium on High performance distributed computing

Hardware-software coherence protocol for the coexistence of caches and local memories
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

On the Path to Exascale
International Journal of Distributed Systems and Technologies

Assessing the effects of data compression in simulations using physically motivated metrics
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Abstract

Since the first vector supercomputers in the mid-1970s, the largest-scale applications have traditionally been floating-point-oriented numerical codes, which can be broadly characterized as the simulation of physics on a computer. Supercomputer architectures have evolved to meet the needs of those applications. Specifically, the computational work of the application tends to be floating-point oriented, and the decomposition of the problem tends to be two- or three-dimensional. Today, an emerging class of critical applications may change those assumptions: they are combinatorial in nature, integer oriented, and irregular. The performance of both classes of applications is dominated by the performance of the memory system. This paper compares the memory performance sensitivity of both traditional and emerging HPC applications, and shows that the new codes are significantly more sensitive to memory latency and bandwidth than their traditional counterparts. Additionally, these codes exhibit lower baseline performance, which only exacerbates the problem. As a result, future supercomputer architectures built to support these applications will most likely differ from those used to support traditional codes. Quantitatively understanding the difference between the two workloads will form the basis for future design choices.