Cache behavior is complex and inherently unstable, yet it is a critical factor in program performance. A method of evaluating cache performance is needed, both to give quantitative predictions of the miss ratio and to provide information that guides optimization of cache use. Traditional cache simulation predicts the miss ratio accurately but offers little to direct optimization, and the simulation time is usually far greater than the program execution time. Several analytical models have been developed, but they concentrate mainly on direct-mapped caches, often target specific types of algorithm, or give only qualitative predictions. This work presents novel analytical models of cache phenomena, applicable to numerical codes consisting mostly of array operations in looping constructs. Set-associative caches are considered through an extensive hierarchy of cache reuse and interference effects, including numerous forms of temporal and spatial locality. A model is given for each effect; combined, the models predict the overall miss ratio. An added advantage is that the models also indicate the sources of cache interference. The accuracy of the models is validated on example program fragments: the predicted miss ratios are compared with simulations and shown typically to be within 15 percent. The evaluation time of the models is shown to be independent of the problem size and is generally several orders of magnitude shorter than simulation time.
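To make the contrast with simulation concrete, the following is a minimal sketch of a trace-driven simulator for a set-associative cache with LRU replacement, applied to a loop nest over a two-dimensional array. It is not the analytical model described above. The function names, the cache geometry (1 KB, 32-byte lines, 2-way) and the array shape are all assumptions chosen for illustration. Note that the simulator visits every reference, so its cost grows with the problem size.

```python
from collections import OrderedDict


def simulate_miss_ratio(addresses, cache_bytes=1024, line_bytes=32, ways=2):
    """Trace-driven simulation of an LRU set-associative cache.

    All parameters are illustrative assumptions, not values from the paper.
    Returns misses / references for the given byte-address trace.
    """
    num_sets = cache_bytes // (line_bytes * ways)
    # One ordered dict per set; insertion order tracks recency (oldest first).
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    total = 0
    for addr in addresses:
        total += 1
        line = addr // line_bytes          # memory line holding this address
        cache_set = sets[line % num_sets]  # set selected by the low line bits
        if line in cache_set:
            cache_set.move_to_end(line)    # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used line
            cache_set[line] = True
    return misses / total if total else 0.0


def row_order_trace(n, elem_bytes=8, base=0):
    """Addresses for: for i: for j: use a[i][j], with a stored row-major."""
    for i in range(n):
        for j in range(n):
            yield base + (i * n + j) * elem_bytes


def column_order_trace(n, elem_bytes=8, base=0):
    """Addresses for: for j: for i: use a[i][j], with a stored row-major."""
    for j in range(n):
        for i in range(n):
            yield base + (i * n + j) * elem_bytes


if __name__ == "__main__":
    n = 64
    print("row order:   ", simulate_miss_ratio(row_order_trace(n)))
    print("column order:", simulate_miss_ratio(column_order_trace(n)))
```

Under these assumed parameters, the row-order traversal misses once per four-element line (miss ratio 0.25), while the column-order traversal maps each whole column onto a single two-way set and misses on every reference (miss ratio 1.0). The simulator reports the difference but does not by itself say which interference effect caused it, which is the gap the abstract says the analytical models aim to fill.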