Off-loading application controlled data prefetching in numerical codes for multi-core processors

Authors:
J. Weidendorfer;C. Trinitis
Affiliations:
Institut für Informatik, Technische Universität München, D-85747 Garching bei München, Germany;Institut für Informatik, Technische Universität München, D-85747 Garching bei München, Germany
Venue:
International Journal of Computational Science and Engineering
Year:
2008

Abstract

An important issue when designing numerical code in High Performance Computing is cache optimisation in order to exploit the performance potential of a given target architecture. This includes techniques to improve memory access locality as well as prefetching. Inherent algorithm constraints often limit the first approach, which typically uses a blocking technique. While there exist automatic prefetching mechanisms in hardware and/or compilers, they cannot complement blocking with additional prefetching. We provide an infrastructure for off-loading application controlled prefetching on a chip multiprocessor, allowing further improvement of numerical code already optimised by standard cache optimisation. Clear benefits are shown for real workloads on existing hardware.
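
To illustrate what "complementing blocking with additional prefetching" can look like, the following is a minimal C sketch and not the authors' infrastructure. It assumes a GCC- or Clang-compatible compiler (for the `__builtin_prefetch` hint) and a 64-byte cache line; the function names `prefetch_block` and `blocked_sweeps` are hypothetical. The update `a[i] = 0.5 * a[i] + 1.0` is a stand-in for a real numerical kernel, chosen because it has no dependencies between elements, so processing block by block gives the same result as sweeping the whole array. The sketch issues the prefetch from the computing thread itself; off-loading that call to a second core, as the abstract describes, is not shown.

```c
#include <stddef.h>

#define CACHE_LINE 64 /* assumed cache line size in bytes */

/* Request one address per cache line of the block.
   __builtin_prefetch is only a hint: it never changes program results. */
static void prefetch_block(const double *p, size_t n)
{
    const size_t step = CACHE_LINE / sizeof(double);
    for (size_t i = 0; i < n; i += step)
        __builtin_prefetch(&p[i], 1 /* will be written */, 3 /* high locality */);
}

/* Apply `sweeps` passes of a[i] = 0.5 * a[i] + 1.0 to every element.
   All passes over one block finish before the next block is touched
   (blocking), and the next block is prefetched before work starts. */
void blocked_sweeps(double *a, size_t n, size_t block, int sweeps)
{
    if (block == 0)
        block = 1; /* avoid a zero-length block that would never advance */

    for (size_t start = 0; start < n; start += block) {
        size_t len = (n - start < block) ? n - start : block;
        size_t next = start + len;

        /* Application-controlled prefetch of the following block. */
        if (next < n) {
            size_t next_len = (n - next < block) ? n - next : block;
            prefetch_block(a + next, next_len);
        }

        /* Reuse the current block while it is resident in cache. */
        for (int s = 0; s < sweeps; s++)
            for (size_t i = 0; i < len; i++)
                a[start + i] = 0.5 * a[start + i] + 1.0;
    }
}
```

Whether the hint pays off depends on block size, cache capacity and the hardware prefetcher, so any benefit would have to be measured on the target machine.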