The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Computationally intensive algorithms must usually be restructured to make the best use of cache memory on current high-performance computers with hierarchical memories. Unfortunately, cache-conscious algorithms are sensitive to object sizes and addresses as well as to the details of the cache and translation lookaside buffer geometries, and this sensitivity makes both automatic restructuring and hand-tuning difficult tasks. An optimization approach is presented in this paper that automatically generates and executes a benchmark program from a concise specification of the algorithm's structure. This technique provides the performance data needed to verify code generation heuristics or to search among the various restructuring options. Matrix transpose and matrix multiplication are examined using this approach on several workstations, with restructuring options of loop order, tiling (blocking), and unrolling.
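The restructuring options named above can be illustrated with a minimal sketch in C: a naive triple-loop matrix multiply next to a tiled (blocked) variant with a reordered inner loop nest. The tile size `T` here is an illustrative placeholder; in the approach the abstract describes, such parameters would be chosen by benchmarking candidate values on the target machine rather than fixed at compile time.

```c
#include <stddef.h>

#define T 32  /* illustrative tile size; a real tuner would search over candidates */

/* Naive i-j-k triple loop: strides through B column-wise, which is
 * cache-unfriendly for row-major storage once n*n exceeds the cache. */
void matmul_naive(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double s = 0.0;
            for (size_t k = 0; k < n; k++)
                s += a[i * n + k] * b[k * n + j];
            c[i * n + j] = s;
        }
}

/* Tiled version: the i/k/j loops are blocked so that a T-by-T working set
 * of A, B, and C stays resident in cache while it is reused. The inner
 * i-k-j order keeps the innermost accesses to B and C unit-stride. */
void matmul_tiled(size_t n, const double *a, const double *b, double *c) {
    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;
    for (size_t ii = 0; ii < n; ii += T)
        for (size_t kk = 0; kk < n; kk += T)
            for (size_t jj = 0; jj < n; jj += T) {
                size_t imax = (ii + T < n) ? ii + T : n;
                size_t kmax = (kk + T < n) ? kk + T : n;
                size_t jmax = (jj + T < n) ? jj + T : n;
                for (size_t i = ii; i < imax; i++)
                    for (size_t k = kk; k < kmax; k++) {
                        double aik = a[i * n + k];  /* hoisted: reused across j */
                        for (size_t j = jj; j < jmax; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
            }
}
```

Loop order and unrolling interact with the tile size (e.g., the hoisted `aik` load invites unrolling the `j` loop), which is why the abstract's benchmark-driven search over the combined option space is needed rather than tuning each transformation in isolation.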