Theory of linear and integer programming
Theory of linear and integer programming
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Counting solutions to Presburger formulas: how and why
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
Cache miss heuristics and preloading techniques for general-purpose programs
Proceedings of the 28th annual international symposium on Microarchitecture
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
ICS '96 Proceedings of the 10th international conference on Supercomputing
Data-centric multi-level blocking
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
The Fortran parallel transformer and its programming environment
Information Sciences: an International Journal - special issue on parallel and distributed processing
Parametric Analysis of Polyhedral Iteration Spaces
Journal of VLSI Signal Processing Systems - Special issue on application specific systems, architectures and processors
Cache-conscious structure layout
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
A locality sensitive multi-module cache with explicit management
ICS '99 Proceedings of the 13th international conference on Supercomputing
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
An integer linear programming approach for optimizing cache locality
ICS '99 Proceedings of the 13th international conference on Supercomputing
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Synthesizing transformations for locality enhancement of imperfectly-nested loop nests
Proceedings of the 14th international conference on Supercomputing
ACM Computing Surveys (CSUR)
Exact analysis of the cache behavior of nested loops
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Cache performance for selected SPEC CPU2000 benchmarks
ACM SIGARCH Computer Architecture News
High Performance Compilers for Parallel Computing
High Performance Compilers for Parallel Computing
MIST: an algorithm for memory miss traffic management
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Software-assisted cache replacement mechanisms for embedded systems
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Maximizing Loop Parallelism and Improving Data Locality via Loop Fusion and Distribution
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing
Using the Compiler to Improve Cache Replacement Decisions
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
PARLE '93 Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe
Automatic Parallelization in the Polytope Model
The Data Parallel Programming Model: Foundations, HPF Realization, and Scientific Applications
Predicting whole-program locality through reuse distance analysis
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Fast, Accurate and Flexible Data Locality Analysis
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
A Matrix-Based Approach to the Global Locality Optimization Problem
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Compiler-Directed Resource Management for Active Code Regions
INTERACT '03 Proceedings of the Seventh Workshop on Interaction between Compilers and Computer Architectures
Let's Study Whole-Program Cache Behaviour Analytically
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Cache miss equations: compiler analysis framework for tuning memory behavior
Cache miss equations: compiler analysis framework for tuning memory behavior
Compile-time performance prediction of scientific programs
Compile-time performance prediction of scientific programs
Using Hammock Graphs to Structure Programs
IEEE Transactions on Software Engineering
A compiler tool to predict memory hierarchy performance of scientific codes
Parallel Computing
Analytical computation of Ehrhart polynomials: enabling more compiler analyses and optimizations
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Reuse-distance-based miss-rate prediction on a per instruction basis
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Intermediately executed code is the key to find refactorings that improve temporal data locality
Proceedings of the 3rd conference on Computing frontiers
Locality approximation using time
Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Predicting locality phases for dynamic memory optimization
Journal of Parallel and Distributed Computing
pn: a tool for improved derivation of process networks
EURASIP Journal on Embedded Systems
A low power front-end for embedded processors using a block-aware instruction set
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
P-OPT: Program-Directed Optimal Cache Management
Languages and Compilers for Parallel Computing
Finding and Applying Loop Transformations for Generating Optimized FPGA Implementations
Transactions on High-Performance Embedded Architectures and Compilers I
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Instruction cache locking inside a binary rewriter
CASES '09 Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Accelerating multicore reuse distance analysis with sampling and parallelization
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
All-window profiling and composable models of cache sharing
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Proceedings of the 2011 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Minor memory references matter in collaborative caching
Proceedings of the 2011 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
On the theory and potential of LRU-MRU collaborative cache management
Proceedings of the international symposium on Memory management
Dynamic access distance driven cache replacement
ACM Transactions on Architecture and Code Optimization (TACO)
A study of the performance potential for dynamic instruction hints selection
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Discovery of locality-improving refactorings by reuse path analysis
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Path-Based reuse distance analysis
CC'06 Proceedings of the 15th international conference on Compiler Construction
Automated locality optimization based on the reuse distance of string operations
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
A generalized theory of collaborative caching
Proceedings of the 2012 international symposium on Memory Management
HOTL: a higher order theory of locality
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Pacman: program-assisted cache management
Proceedings of the 2013 international symposium on memory management
Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
Hi-index | 0.00 |
One of the new extensions in EPIC architectures are cache hints. On each memory instruction, two kinds of hints can be attached: a source cache hint and a target cache hint. The source hint indicates the true latency of the instruction, which is used by the compiler to improve the instruction schedule. The target hint indicates at which cache levels it is profitable to retain data, allowing to improve cache replacement decisions at run time. A compile-time method is presented which calculates appropriate cache hints. Both kind of hints are based on the locality of the instruction, measured by the reuse distance metric.Two alternative methods are discussed. The first one profiles the reuse distance distribution, and selects a static hint for each instruction. The second method calculates the reuse distance analytically, which allows to generate dynamic hints, i.e. the best hint for each memory access is calculated at run-time.The implementation of the static hints scheme in the Open64-compiler for the Itanium processor shows a speedup of 10% on average on a set of pointer-intensive and regular loop-based programs. The analytical approach with dynamic hints was implemented in the FPT-compiler and shows up to 34% reduction in cache misses.