This paper compares the System Performance Evaluation Cooperative (SPEC) Integer and Floating-Point suites to a set of real-world applications for high-performance computing at Sandia National Laboratories. These applications focus on the high-end scientific and engineering domains; however, the techniques presented in this paper are applicable to any application domain. The applications are compared in terms of three memory properties: 1) temporal locality (reuse of data over time), 2) spatial locality (the use of data "near" data that has already been accessed), and 3) data intensiveness (the number of unique bytes the application accesses). The results show that, compared with the SPEC benchmark suite, real-world applications exhibit significantly less spatial locality, often exhibit less temporal locality, and have much larger data sets. The results also quantify the memory properties of real supercomputing applications.
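To make the three properties concrete, the sketch below estimates each one from a list of byte addresses. It is an illustration only and not the measurement method used in the paper: the function name `memory_properties`, the parameters `access_size`, `window`, and `near`, and the scoring rules are all assumptions chosen for simplicity. Temporal locality is approximated as the fraction of accesses that repeat one of the last `window` distinct addresses, spatial locality as the fraction of other accesses that land within `near` bytes of one of those addresses, and data intensiveness as the count of unique bytes touched.

```python
from collections import OrderedDict


def memory_properties(trace, access_size=8, window=4, near=64):
    """Estimate three memory properties from a list of byte addresses.

    All parameters are hypothetical knobs for this sketch:
      access_size: bytes touched by each access
      window:      how many recent distinct addresses count as "recent"
      near:        maximum byte distance that counts as "near"
    """
    unique_bytes = set()
    recent = OrderedDict()  # recent distinct addresses, most recent last
    temporal_hits = 0
    spatial_hits = 0

    for addr in trace:
        # Data intensiveness: record every byte this access touches.
        unique_bytes.update(range(addr, addr + access_size))

        if addr in recent:
            # Temporal locality: the same address was used recently.
            temporal_hits += 1
            recent.move_to_end(addr)
        else:
            # Spatial locality: a different but nearby address was used recently.
            if any(0 < abs(addr - r) <= near for r in recent):
                spatial_hits += 1
            recent[addr] = True
            if len(recent) > window:
                recent.popitem(last=False)  # evict the oldest address

    n = len(trace)
    return {
        "temporal": temporal_hits / n if n else 0.0,
        "spatial": spatial_hits / n if n else 0.0,
        "unique_bytes": len(unique_bytes),
    }


if __name__ == "__main__":
    example = [0, 8, 16, 0, 4096, 8]
    print(memory_properties(example))
```

Under these assumptions, a trace that streams once through a large array would score high on spatial locality and unique bytes but low on temporal locality, while a trace that scatters accesses across a large data set would score low on both locality measures.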