As the performance gap between processors and main memory widens, the cache memory that bridges this gap becomes increasingly important. Caches work well for programs that exhibit sufficient locality. Other programs, however, have reference patterns that fail to exploit the cache and therefore suffer heavily from high memory latency. Achieving high cache efficiency and good program performance requires efficient memory access behavior. For many programs, program transformations or source-code changes can radically alter memory access patterns and significantly improve cache performance. Both hand-tuning and compiler optimization techniques are often used to transform code to improve cache utilization. Unfortunately, cache conflicts are difficult to predict and estimate, which hinders effective transformations. Hence, effective transformations require detailed knowledge about the frequency and causes of cache misses in the code. This article describes methods for generating and solving Cache Miss Equations (CMEs) that give a detailed representation of cache behavior, including conflict misses, in loop-oriented scientific code. Implemented within the SUIF compiler framework, our approach extends traditional compiler reuse analysis to generate linear Diophantine equations that summarize each loop's memory behavior. While solving these equations is in general difficult, we show that it is also unnecessary: mathematical techniques for manipulating Diophantine equations allow us to compute or reduce the number of possible solutions relatively easily, where each solution corresponds to a potential cache miss. The mathematical precision of CMEs allows us to find truly optimal solutions for transformations such as blocking or padding. The generality of CMEs also allows us to reason about interactions between transformations applied in concert.
The article also gives examples of using CMEs to determine array padding and offset amounts that minimize cache misses, and to determine optimal blocking factors for tiled code. Overall, these equations represent an analysis framework that offers the generality and precision needed for detailed compiler optimizations.
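
To make the effect of padding on conflict misses concrete, the following minimal sketch counts misses for a simple loop that reads `a[i]` and `b[i]` in a direct-mapped cache. It is a brute-force simulation, not the CME method described above: CMEs obtain such counts analytically from Diophantine equations rather than by replaying every access. The function names (`count_misses`, `best_pad`), the cache parameters, and the layout of `b` directly after `a` are assumptions chosen for illustration.

```python
def count_misses(n, pad, cache_bytes=1024, line_bytes=32, elem_bytes=8):
    """Count misses for `for i in range(n): s += a[i] * b[i]`.

    Assumes a direct-mapped cache, array `a` at address 0, and array `b`
    placed immediately after `a` plus `pad` bytes of padding.
    """
    num_sets = cache_bytes // line_bytes
    base_a = 0
    base_b = base_a + n * elem_bytes + pad
    resident = [None] * num_sets  # memory line currently held by each set
    misses = 0
    for i in range(n):
        for addr in (base_a + i * elem_bytes, base_b + i * elem_bytes):
            line = addr // line_bytes
            cache_set = line % num_sets
            if resident[cache_set] != line:
                resident[cache_set] = line  # evict and fill
                misses += 1
    return misses


def best_pad(n, cache_bytes=1024, line_bytes=32, elem_bytes=8):
    """Return the smallest line-aligned pad with the fewest simulated misses."""
    pads = range(0, cache_bytes, line_bytes)
    return min(pads, key=lambda p: count_misses(n, p, cache_bytes,
                                                line_bytes, elem_bytes))


if __name__ == "__main__":
    n = 128  # 128 * 8 bytes = 1024 bytes, exactly the assumed cache size
    print("pad=0 :", count_misses(n, 0))
    print("pad=32:", count_misses(n, 32))
    print("best pad:", best_pad(n))
```

With these assumed parameters, `a[i]` and `b[i]` map to the same cache set when no padding is used, so the two references evict each other on every iteration. Padding `b` by one cache line shifts it to the neighboring set, leaving only the compulsory misses. An exhaustive search like `best_pad` scales poorly with the number of arrays and cache configurations, which is the kind of cost an analytical framework aims to avoid.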