Fast data-locality profiling of native execution
SIGMETRICS '05 Proceedings of the 2005 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
An analytical model for cache replacement policy performance
SIGMETRICS '06/Performance '06 Proceedings of the joint international conference on Measurement and modeling of computer systems
Multigrid and Gauss-Seidel smoothers revisited: parallelization on chip multiprocessors
Proceedings of the 20th annual international conference on Supercomputing
CMP cache performance projection: accessibility vs. capacity
ACM SIGARCH Computer Architecture News
A one-shot configurable-cache tuner for improved energy and performance
Proceedings of the conference on Design, automation and test in Europe
A table-based method for single-pass cache optimization
Proceedings of the 18th ACM Great Lakes symposium on VLSI
Accurate memory signatures and synthetic address traces for HPC applications
Proceedings of the 22nd annual international conference on Supercomputing
Cache modeling in probabilistic execution time analysis
Proceedings of the 45th annual Design Automation Conference
Static analysis for fast and accurate design space exploration of caches
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
HASS: a scheduler for heterogeneous multicore systems
ACM SIGOPS Operating Systems Review
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Maximizing power efficiency with asymmetric multicore systems
Communications of the ACM - Finding the Fun in Computer Science Education
Managing contention for shared resources on multicore processors
Communications of the ACM
Reconsidering algorithms for iterative solvers in the multicore era
International Journal of Computational Science and Engineering
Managing Contention for Shared Resources on Multicore Processors
Queue - Power Management
Addressing shared resource contention in multicore processors via scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Accelerating multicore reuse distance analysis with sampling and parallelization
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Improved procedure placement for set associative caches
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Journal of Parallel and Distributed Computing
All-window profiling and composable models of cache sharing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Fast modeling of shared caches in multicore systems
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
HC-Sim: a fast and exact l1 cache simulator with scratchpad memory co-simulation support
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
PSnAP: accurate synthetic address streams through memory profiles
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Phase guided profiling for fast cache modeling
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Survey of scheduling techniques for addressing shared resources in multicore processors
ACM Computing Surveys (CSUR)
Hi-index | 0.02 |
The widening memory gap reduces performance of applications with poor data locality. Therefore, there is a need for methods to analyze data locality and help application optimization. In this paper we present StatCache, a novel sampling-based method for performing data-locality analysis on realistic workloads. StatCache is based on a probabilistic model of the cache, rather than a functional cache simulator. It uses statistics from a single run to accurately estimate miss ratios of fully-associative caches of arbitrary sizes and generate working-set graphs. We evaluate StatCache using the SPEC CPU2000 benchmarks and show that StatCache gives accurate results with a sampling rate as low as 10/sup -4/. We also provide a proof-of-concept implementation, and discuss potentially very fast implementation alternatives.