Profiling can accurately analyze program behavior for selected data inputs. We show that profiling can also predict program locality for inputs other than the profiled ones. Here locality is defined by the distance of data reuse. Studying whole-program data reuse may reveal global patterns not apparent in short-distance reuses or local control flow. However, the analysis must meet two requirements to be useful. The first is efficiency: it needs to analyze all accesses to all data elements in full-size benchmarks and to measure distances of any length at any required precision. The second is prediction: based on a few training runs, it needs to classify patterns as regular or irregular and, for regular ones, predict their (changing) behavior for other inputs. In this paper, we show that these goals are attainable through three techniques: approximate analysis of reuse distance (originally called LRU stack distance), pattern recognition, and distance-based sampling. When tested on 15 integer and floating-point programs from SPEC and other benchmark suites, our techniques predict with 94% accuracy on average for data inputs up to hundreds of times larger than the training inputs. Based on these results, the paper discusses possible uses of this analysis.
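To make the central metric concrete, the following is a minimal sketch of an exact reuse-distance measurement. It assumes the usual definition of reuse distance (LRU stack distance): the number of distinct data elements accessed between two consecutive accesses to the same element. The function name `reuse_distances` and the list-of-addresses trace format are hypothetical, chosen only for illustration. This naive version takes time proportional to the number of distinct elements for every access; it is not the approximate analysis the abstract describes, which is designed to avoid that cost.

```python
from collections import OrderedDict


def reuse_distances(trace):
    """Return one reuse distance per access in `trace`.

    The distance is the number of distinct elements accessed since the
    previous access to the same element, or None for a first access.
    Illustrative exact method only, not the paper's approximate analysis.
    """
    stack = OrderedDict()  # keys ordered from least to most recently used
    distances = []
    for addr in trace:
        if addr in stack:
            keys = list(stack)
            # Elements positioned after addr were accessed more recently.
            distances.append(len(keys) - 1 - keys.index(addr))
            stack.move_to_end(addr)  # addr becomes the most recently used
        else:
            distances.append(None)  # first access: no previous use
            stack[addr] = True
    return distances


if __name__ == "__main__":
    print(reuse_distances(list("abcab")))
```

For the trace `a b c a b`, the second access to `a` sees two distinct elements (`b` and `c`) since its previous use, so its distance is 2. A histogram of these distances over a whole run is the kind of locality signature that could then be compared across inputs of different sizes.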