Working sets, cache sizes, and node granularity issues for large-scale multiprocessors
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Characterization of alpha AXP performance using TP and SPEC workloads
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Contrasting characteristics and cache performance of technical and multi-user commercial workloads
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Fast parallel algorithms for short-range molecular dynamics
Journal of Computational Physics
The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Cache miss equations: an analytical representation of cache misses
ICS '97 Proceedings of the 11th international conference on Supercomputing
Memory system characterization of commercial workloads
Proceedings of the 25th annual international symposium on Computer architecture
Performance characterization of a Quad Pentium Pro SMP using OLTP workloads
Proceedings of the 25th annual international symposium on Computer architecture
Execution characteristics of desktop applications on Windows NT
Proceedings of the 25th annual international symposium on Computer architecture
An analysis of database workload performance on simultaneous multithreaded processors
Proceedings of the 25th annual international symposium on Computer architecture
An analytical model of the working-set sizes in decision-support systems
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A note on the calculation of average working set size
Communications of the ACM
Empirical working set behavior
Communications of the ACM
Performance characteristics of the SPEC OMP2001 benchmarks
ACM SIGARCH Computer Architecture News - Special Issue: PACT 2001 workshops
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
A generative model of working set dynamics
SIGMETRICS '81 Proceedings of the 1981 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The working set model for program behavior
SOSP '67 Proceedings of the first ACM symposium on Operating System Principles
Cache Performance of the SPEC Benchmark Suite
Cache Performance of the SPEC Benchmark Suite
Bottlenecks in Multimedia Processing with SIMD Style Extensions and Architectural Enhancements
IEEE Transactions on Computers
The Use and Abuse of SPEC: An ISCA Panel
IEEE Micro
Reflections on the memory wall
Proceedings of the 1st conference on Computing frontiers
Execution characteristics of SPEC CPU2000 benchmarks: Intel C++ vs. Microsoft VC++
ACM-SE 42 Proceedings of the 42nd annual Southeast regional conference
Characterizing a new class of threads in scientific applications for high end supercomputers
Proceedings of the 18th annual international conference on Supercomputing
IEEE Transactions on Computers
Efficiency and scalability of barrier synchronization on NoC based many-core architectures
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Proceedings of the 7th ACM international conference on Computing frontiers
Let there be light!: the future of memory systems is photonics and 3D stacking
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Exploring latency-power tradeoffs in deep nonvolatile memory hierarchies
Proceedings of the 9th conference on Computing Frontiers
International Journal of Distributed Systems and Technologies
Journal of Parallel and Distributed Computing
Hi-index | 0.00 |
Supercomputer architects strive to maximize the performance of scientific applications. Unfortunately, the large, unwieldy nature of most scientific applications has lead to the creation of artificial benchmarks, such as SPEC-FP, for architecture research. Given the impact that these benchmarks have on architecture research, this paper seeks an understanding of how they relate to real-world applications within the Department of Energy. Since the memory system has been found to be a particularly key issue for many applications, the focus of the paper is on the relationship between how the SPEC-FP benchmarks and DOE applications use the memory system. The results indicate that while the SPEC-FP suite is a well balanced suite, supercomputing applications typically demand more from the memory system and must perform more "other work" (in the form of integer computations) along with the floating point operations. The SPEC-FP suite generally demonstrates slightly more temporal locality leading to somewhat lower bandwidth demands. The most striking result is the cumulative difference between the benchmarks and the applications in terms of the requirements to sustain the floating-point operation rate: the DOE applications require significantly more data from main memory (not cache) per FLOP and dramatically more integer instructions per FLOP.