State-of-the-art data locality optimizing algorithms target local memories rather than cache memories. Recent work on cache interferences suggests that these phenomena can severely degrade the cache performance of blocked algorithms. Because of cache conflicts, it is not possible to know the precise gain brought by blocking, and it is difficult even to determine the problem sizes for which blocking is useful. Computing the actual optimal block size is also difficult because cache conflicts are highly irregular. In this article, we illustrate the problem of precisely evaluating cross-interferences in blocked loops, using blocked matrix-vector multiply as a case study. The most significant interference phenomena are captured because unusual parameters, such as array base addresses, are taken into account. The techniques used allow us to compute the precise improvement due to blocking and the threshold values of the problem parameters beyond which the blocked loop should be preferred. It is also possible to derive an expression for the optimal block size as a function of the problem parameters. Finally, we show that a precise, rather than approximate, evaluation of cache conflicts is sometimes necessary to obtain near-optimal performance.
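To make the setting concrete, the following is a minimal sketch of blocked matrix-vector multiply together with a toy miss counter. It is not the analytical technique of the article, which derives closed-form expressions; it is only a brute-force illustration. All names (`blocked_matvec`, `count_misses`, `base_a`, `base_x`, `num_lines`, `line_size`) are hypothetical, and the cache model is an assumption: direct-mapped, addresses counted in array elements, a row-major matrix, and accesses to the result vector ignored.

```python
def matvec(a, x):
    """Unblocked reference: y[i] = sum over j of a[i][j] * x[j]."""
    y = [0.0] * len(a)
    for i in range(len(a)):
        for j in range(len(x)):
            y[i] += a[i][j] * x[j]
    return y


def blocked_matvec(a, x, block):
    """Blocked variant: each block of x is reused across every row of a."""
    if block <= 0:
        raise ValueError("block must be positive")
    n_cols = len(x)
    y = [0.0] * len(a)
    for jj in range(0, n_cols, block):
        j_end = min(jj + block, n_cols)
        for i in range(len(a)):
            acc = y[i]
            for j in range(jj, j_end):
                acc += a[i][j] * x[j]
            y[i] = acc
    return y


def count_misses(n, block, base_a, base_x, num_lines, line_size=1):
    """Count misses of a toy direct-mapped cache for the blocked loop.

    Assumed model: an n-by-n row-major matrix starting at element
    address base_a, the vector x starting at base_x, and only reads
    of a and x are traced.
    """
    tags = [None] * num_lines  # memory line currently held by each cache slot
    misses = 0

    def touch(addr):
        nonlocal misses
        line = addr // line_size
        slot = line % num_lines  # direct-mapped placement
        if tags[slot] != line:
            tags[slot] = line
            misses += 1

    for jj in range(0, n, block):
        for i in range(n):
            for j in range(jj, min(jj + block, n)):
                touch(base_a + i * n + j)  # read a[i][j]
                touch(base_x + j)          # read x[j]
    return misses
```

In this toy model, a 4-by-4 problem with `block=4` and `base_a=0` incurs 20 misses when the cache has 32 one-element lines and `base_x=16` (no conflicts, only first-touch misses), 24 misses with 8 lines and `base_x=16`, and 28 misses with 8 lines and `base_x=20`. The last two cases differ only in the base address of `x`, which is the kind of sensitivity the abstract attributes to cross-interferences.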