Cache optimizations for iterative numerical codes aware of hardware prefetching
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
An important issue when designing numerical code in High Performance Computing is cache optimisation, which is needed to exploit the performance potential of a given target architecture. This includes techniques to improve memory access locality as well as prefetching. Inherent algorithm constraints often limit the first approach, which typically uses a blocking technique. While automatic prefetching mechanisms exist in hardware and/or compilers, they cannot complement blocking with additional prefetching. We provide an infrastructure for off-loading application-controlled prefetching on a chip multiprocessor, allowing further improvement of numerical code already optimised by standard cache optimisation. Clear benefits are shown for real workloads on existing hardware.