Analysis of cache performance for operating systems and multiprogramming
Analysis of cache performance for operating systems and multiprogramming
Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Sparse matrix computations: implications for cache designs
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
MOB forms: a class of multilevel block algorithms for dense linear algebra operations
ICS '94 Proceedings of the 8th international conference on Supercomputing
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Expected I-cache miss rates via the gap model
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Compiler optimizations for improving data locality
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
An analytic study of caching in computer systems
Journal of Parallel and Distributed Computing
A quantitative analysis of loop nest locality
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
An Analytical Model for Designing Memory Hierarchies
IEEE Transactions on Computers
Trace-driven memory simulation: a survey
ACM Computing Surveys (CSUR)
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Simplification of array access patterns for compiler optimizations
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Modeling set associative caches behavior for irregular computations
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Analytical Modeling of Set-Associative Cache Behavior
IEEE Transactions on Computers
Cache miss equations: a compiler framework for analyzing and tuning memory behavior
ACM Transactions on Programming Languages and Systems (TOPLAS)
Performance analysis using the MIPS R10000 performance counters
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Exact analysis of the cache behavior of nested loops
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Parallel Programming with Polaris
Computer
Measuring Cache and TLB Performance and Their Effect on Benchmark Runtimes
IEEE Transactions on Computers
On Estimating and Enhancing Cache Effectiveness
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing
A Comparison of Compiler Tiling Algorithms
CC '99 Proceedings of the 8th International Conference on Compiler Construction, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS'99
Automatic Analytical Modeling for the Estimation of Cache Misses
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Let's Study Whole-Program Cache Behaviour Analytically
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Software methods for improvement of cache performance on supercomputer applications
Software methods for improvement of cache performance on supercomputer applications
Compile-time performance prediction of scientific programs
Compile-time performance prediction of scientific programs
Analytical modeling of codes with arbitrary data-dependent conditional structures
Journal of Systems Architecture: the EUROMICRO Journal
Precise automatable analytical modeling of the cache behavior of codes with indirections
ACM Transactions on Architecture and Code Optimization (TACO)
Static analysis of the worst-case memory performance for irregular codes with indirections
ACM Transactions on Architecture and Code Optimization (TACO)
Address independent estimation of the boundaries of cache performance
Microprocessors & Microsystems
Hi-index | 14.98 |
The increasing gap between processor and main memory speeds makes the role of the memory hierarchy behavior in the system performance essential. Both hardware and software techniques to improve this behavior require good analysis tools that help predict and understand such behavior. Analytical modeling arises as a good choice in this field due to its high speed if its traditional limited precision is overcome. We present a modular analytical modeling strategy for arbitrary set-associative caches with LRU replacement policy. The model differs from all the previous related works in its probabilistic approach. Both perfectly and nonperfectly nested loops as well as reuse between different nests are considered by this model, so it makes the analysis of complete programs with regular computations feasible. Moreover, the model achieves good levels of accuracy while being extremely fast and flexible enough to allow its extension. Our approach has been extensively validated using well-known benchmarks. Finally, the model has also proven its ability to drive code optimizations even more successfully than current production compilers.