ACM Transactions on Computer Systems (TOCS)
ACM Transactions on Computer Systems (TOCS)
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Evaluating Associativity in CPU Caches
IEEE Transactions on Computers
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A model for estimating trace-sample miss ratios
SIGMETRICS '91 Proceedings of the 1991 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
MemSpy: analyzing memory system bottlenecks in programs
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Page placement algorithms for large real-indexed caches
ACM Transactions on Computer Systems (TOCS)
A framework for unifying reordering transformations
A framework for unifying reordering transformations
Wisconsin Architectural Research Tool Set
ACM SIGARCH Computer Architecture News
Counting solutions to Presburger formulas: how and why
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
ATOM: a system for building customized program analysis tools
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Influence of cross-interferences on blocked loops: a case study with matrix-vector multiply
ACM Transactions on Programming Languages and Systems (TOPLAS)
ICS '96 Proceedings of the 10th international conference on Supercomputing
Active memory: a new abstraction for memory system simulation
ACM Transactions on Modeling and Computer Simulation (TOMACS)
Complexity and uniformity of elimination in Presburger arithmetic
ISSAC '97 Proceedings of the 1997 international symposium on Symbolic and algebraic computation
Cache miss equations: an analytical representation of cache misses
ICS '97 Proceedings of the 11th international conference on Supercomputing
Maximizing parallelism and minimizing synchronization with affine transforms
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Auto-blocking matrix-multiplication or tracking BLAS3 performance from source code
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Symbolic timing verification of timing diagrams using Presburger formulas
DAC '97 Proceedings of the 34th annual Design Automation Conference
Making complex timing relationships readable: Presburger formula simplicication using don't cares
DAC '98 Proceedings of the 35th annual Design Automation Conference
Data transformations for eliminating conflict misses
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Simplification of array access patterns for compiler optimizations
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Eliminating conflict misses for high performance architectures
ICS '98 Proceedings of the 12th international conference on Supercomputing
Precise miss analysis for program transformations with caches of arbitrary associativity
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A Linear Algebra Framework for Automatic Determination of Optimal Data Layouts
IEEE Transactions on Parallel and Distributed Systems
Nonlinear array layouts for hierarchical memory systems
ICS '99 Proceedings of the 13th international conference on Supercomputing
Recursive array layouts and fast parallel matrix multiplication
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Analytical Modeling of Set-Associative Cache Behavior
IEEE Transactions on Computers
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Quantifying loop nest locality using SPEC'95 and the perfect benchmarks
ACM Transactions on Computer Systems (TOCS)
The combinatorics of cache misses during matrix multiplication
Journal of Computer and System Sciences - Special issue on Internet algorithms
Introduction To Automata Theory, Languages, And Computation
Introduction To Automata Theory, Languages, And Computation
Quantifying the Multi-level Nature of Tiling Interactions
LCPC '97 Proceedings of the 10th International Workshop on Languages and Compilers for Parallel Computing
An Automata-Theoretic Approach to Presburger Arithmetic Constraints (Extended Abstract)
SAS '95 Proceedings of the Second International Symposium on Static Analysis
Diophantine Equations, Presburger Arithmetic and Finite Automata
CAAP '96 Proceedings of the 21st International Colloquium on Trees in Algebra and Programming
Cache Behavior Prediction by Abstract Interpretation
SAS '96 Proceedings of the Third International Symposium on Static Analysis
Caches as Filters: A New Approach to Cache Analysis
MASCOTS '98 Proceedings of the 6th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems
Tiling Imperfectly-nested Loop Nests (REVISED)
Tiling Imperfectly-nested Loop Nests (REVISED)
Caches As Filters: A Unifying Model for Memory Hierarchy Analysis
Caches As Filters: A Unifying Model for Memory Hierarchy Analysis
Software methods for improvement of cache performance on supercomputer applications
Software methods for improvement of cache performance on supercomputer applications
Cache miss equations: compiler analysis framework for tuning memory behavior
Cache miss equations: compiler analysis framework for tuning memory behavior
Locality enhancement of imperfectly nested loop nests
Locality enhancement of imperfectly nested loop nests
Compile-time performance prediction of scientific programs
Compile-time performance prediction of scientific programs
An efficient profile-analysis framework for data-layout optimizations
POPL '02 Proceedings of the 29th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Performance prediction for random write reductions: a case study in modeling shared memory programs
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Probabilistic Miss Equations: Evaluating Memory Hierarchy Performance
IEEE Transactions on Computers
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A framework for performance modeling and prediction
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Performance optimizations and bounds for sparse matrix-vector multiply
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
METRIC: tracking down inefficiencies in the memory hierarchy via binary rewriting
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Data cache locking for higher program predictability
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
QR factorization with Morton-ordered quadtree matrices for memory re-use and parallelism
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Estimating cache misses and locality using stack distances
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Estimating influence of data layout optimizations on SDRAM energy consumption
Proceedings of the 2003 international symposium on Low power electronics and design
A fast and accurate framework to analyze and optimize cache memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Efficient and Accurate Analytical Modeling of Whole-Program Data Cache Behavior
IEEE Transactions on Computers
A compiler tool to predict memory hierarchy performance of scientific codes
Parallel Computing
Analytical computation of Ehrhart polynomials: enabling more compiler analyses and optimizations
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Compiler Estimation of Load Imbalance Overhead in Speculative Parallelization
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Line Size Adaptivity Analysis of Parameterized Loop Nests for Direct Mapped Data Cache
IEEE Transactions on Computers
Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy
Proceedings of the international symposium on Code generation and optimization
A Geometric Programming Framework for Optimal Multi-Level Tiling
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Optimizing aggregate array computations in loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Predicting Cache Space Contention in Utility Computing Servers
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 10 - Volume 11
Reuse-distance-based miss-rate prediction on a per instruction basis
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Generating cache hints for improved program efficiency
Journal of Systems Architecture: the EUROMICRO Journal
A methodology for detailed performance modeling of reduction computations on SMP machines
Performance Evaluation - Performance modelling and evaluation of high-performance parallel and distributed systems
Statistical Models for Empirical Search-Based Performance Tuning
International Journal of High Performance Computing Applications
Instruction Based Memory Distance Analysis and its Application
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Analytical modeling of codes with arbitrary data-dependent conditional structures
Journal of Systems Architecture: the EUROMICRO Journal
Empirical optimization for a sparse linear solver: a case study
International Journal of Parallel Programming - Special issue: The next generation software program
METRIC: Memory tracing via dynamic binary rewriting to identify cache inefficiencies
ACM Transactions on Programming Languages and Systems (TOPLAS)
A compiler cost model for speculative parallelization
ACM Transactions on Architecture and Code Optimization (TACO)
Miss Rate Prediction Across Program Inputs and Cache Configurations
IEEE Transactions on Computers
Precise automatable analytical modeling of the cache behavior of codes with indirections
ACM Transactions on Architecture and Code Optimization (TACO)
WCET estimation for executables in the presence of data caches
EMSOFT '07 Proceedings of the 7th ACM & IEEE international conference on Embedded software
Cache-aware iteration space partitioning
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Relative competitive analysis of cache replacement policies
Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
Positivity, posynomials and tile size selection
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Cache-aware partitioning of multi-dimensional iteration spaces
SYSTOR '09 Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference
Program locality analysis using reuse distance
ACM Transactions on Programming Languages and Systems (TOPLAS)
Abstract Interpretation of FIFO Replacement
SAS '09 Proceedings of the 16th International Symposium on Static Analysis
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Cache vulnerability equations for protecting data in embedded processor caches from soft errors
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
Cache behavior modelling for codes involving banded matrices
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Performance modeling for dynamic algorithm selection
ICCS'03 Proceedings of the 2003 international conference on Computational science
Static analysis to mitigate soft errors in register files
Proceedings of the Conference on Design, Automation and Test in Europe
Tightening the bounds on feasible preemptions
ACM Transactions on Embedded Computing Systems (TECS)
Static timing analysis for hard real-time systems
VMCAI'10 Proceedings of the 11th international conference on Verification, Model Checking, and Abstract Interpretation
Experiences with enumeration of integer projections of parametric polytopes
CC'05 Proceedings of the 14th international conference on Compiler Construction
Phase-Based miss rate prediction across program inputs
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
A code isolator: isolating code fragments from large programs
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
PICA: Processor Idle Cycle Aggregation for Energy-Efficient Embedded Systems
ACM Transactions on Embedded Computing Systems (TECS)
A survey on cache tuning from a power/energy perspective
ACM Computing Surveys (CSUR)
Fast cache simulation for host-compiled simulation of embedded software
Proceedings of the Conference on Design, Automation and Test in Europe
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.01 |
We develop from first principles an exact model of the behavior of loop nests executing in a memory hicrarchy, by using a nontraditional classification of misses that has the key property of composability. We use Presburger formulas to express various kinds of misses as well as the state of the cache at the end of the loop nest. We use existing tools to simplify these formulas and to count cache misses. The model is powerful enough to handle imperfect loop nests and various flavors of non-linear array layouts based on bit interleaving of array indices. We also indicate how to handle modest levels of associativity, and how to perform limited symbolic analysis of cache behavior. The complexity of the formulas relates to the static structure of the loop nest rather than to its dynamic trip count, allowing our model to gain efficiency in counting cache misses by exploiting repetitive patterns of cache behavior. Validation against cache simulation confirms the exactness of our formulation. Our method can serve as the basis for a static performance predictor to guide program and data transformations to improve performance.