Cache behavior modeling is an important part of modern optimizing compilers. In this paper we present a method to estimate the number of cache misses at compile time, using a machine-independent model based on stack algorithms. Our algorithm computes the stack histograms symbolically, using data dependence distance vectors, and is exact when dependence distances are uniformly generated. The stack histogram accurately models fully associative caches with an LRU replacement policy and provides a very good approximation for set-associative caches and for programs with non-constant dependence distances.

The stack histogram is an accurate, machine-independent metric of locality. Compilers using this metric can evaluate optimizations with respect to memory behavior. We illustrate this use of the stack histogram by comparing three locality-enhancing transformations: tiling, data shackling, and the product-space transformation. Additionally, the stack histogram model can be used to compute optimal parameters for data locality transformations, such as the tile size for loop tiling.
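To make the notion of a stack histogram concrete, the following is a minimal sketch that builds one from an explicit address trace and reads off miss counts for a fully associative LRU cache. It is trace-driven and purely illustrative: the paper computes the histogram symbolically at compile time from dependence distance vectors, which this sketch does not attempt. The function names `stack_histogram` and `lru_misses` are hypothetical, and the cache capacity is assumed to be measured in the same units as the trace entries (for example, cache lines).

```python
from collections import Counter


def stack_histogram(trace):
    """Build a stack-distance histogram from a sequence of addresses.

    The stack distance of an access is the 1-based depth of the address
    in an LRU stack (1 = most recently used). First-time accesses get
    distance infinity (cold misses).
    """
    stack = []  # LRU stack; most recently used element is last
    hist = Counter()
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            distance = len(stack) - pos
            stack.pop(pos)
        else:
            distance = float("inf")
        hist[distance] += 1
        stack.append(addr)  # addr becomes the most recently used
    return hist


def lru_misses(hist, capacity):
    """Misses of a fully associative LRU cache holding `capacity` entries.

    An access hits exactly when its stack distance is <= capacity, so a
    single histogram answers the question for every cache size.
    """
    return sum(count for dist, count in hist.items() if dist > capacity)


if __name__ == "__main__":
    # Hypothetical trace: three addresses swept twice in the same order.
    trace = ["a", "b", "c", "a", "b", "c"]
    hist = stack_histogram(trace)
    for capacity in (1, 2, 3):
        print(capacity, lru_misses(hist, capacity))
```

Because one histogram yields the miss count for every capacity, the same data can be used to compare candidate transformations or tile sizes without re-simulating each cache configuration. The list-based stack keeps the sketch short at the cost of linear work per access.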