The implications of working set analysis on supercomputing memory hierarchy design

Authors:
Richard Murphy; Arun Rodrigues; Peter Kogge; Keith Underwood
Affiliations:
University of Notre Dame, Notre Dame, IN; University of Notre Dame, Notre Dame, IN; University of Notre Dame, Notre Dame, IN; Sandia National Lab, Albuquerque, NM
Venue:
Proceedings of the 19th annual international conference on Supercomputing
Year:
2005

Abstract

Supercomputer architects strive to maximize the performance of scientific applications. Unfortunately, the large, unwieldy nature of most scientific applications has led to the creation of artificial benchmarks, such as SPEC-FP, for architecture research. Given the impact that these benchmarks have on architecture research, this paper seeks to understand how they relate to real-world applications within the Department of Energy (DOE). Because the memory system has been found to be a key issue for many applications, the paper focuses on how the SPEC-FP benchmarks and the DOE applications each use the memory system. The results indicate that while SPEC-FP is a well-balanced suite, supercomputing applications typically demand more from the memory system and must perform more "other work" (in the form of integer computations) along with the floating-point operations. The SPEC-FP suite generally demonstrates slightly more temporal locality, leading to somewhat lower bandwidth demands. The most striking result is the cumulative difference between the benchmarks and the applications in the requirements to sustain the floating-point operation rate: the DOE applications require significantly more data from main memory (not cache) per FLOP and dramatically more integer instructions per FLOP.
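
The two per-FLOP quantities the abstract highlights (data fetched from main memory per FLOP and integer instructions per FLOP) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `CounterSample` type, its field names, and the sample numbers are hypothetical and are not taken from the paper or its measurement setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CounterSample:
    """Hypothetical per-run totals; field names are illustrative, not from the paper."""
    flops: int             # floating-point operations executed
    int_instructions: int  # integer instructions executed
    memory_bytes: int      # bytes fetched from main memory (not served by cache)


def per_flop_ratios(sample: CounterSample) -> dict[str, float]:
    """Return main-memory bytes per FLOP and integer instructions per FLOP."""
    if sample.flops <= 0:
        raise ValueError("flops must be positive to form per-FLOP ratios")
    return {
        "bytes_per_flop": sample.memory_bytes / sample.flops,
        "int_per_flop": sample.int_instructions / sample.flops,
    }


# Made-up numbers purely to exercise the function; they are not measurements.
example = CounterSample(flops=1_000, int_instructions=2_500, memory_bytes=4_000)
ratios = per_flop_ratios(example)
print(ratios)  # {'bytes_per_flop': 4.0, 'int_per_flop': 2.5}
```

Computing the same two ratios for a benchmark run and for an application run, then comparing them, is one way to frame the kind of contrast the abstract describes; how the underlying counts are collected is left open here.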