Improving the Effectiveness of Software Prefetching with Adaptive Execution

Authors:
Rafael H. Saavedra;Daeyeon Park
Venue:
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Year:
1996

Abstract

The effectiveness of software prefetching for tolerating latency depends mainly on the ability of programmers and/or compilers to: 1) predict in advance the magnitude of the run-time remote memory latency, and 2) insert prefetches at a distance that minimizes stall time without causing cache pollution. Scalable heterogeneous multiprocessors, such as networks of computers (NOWs), present special challenges to static software prefetching because on these systems the network topology and node configuration are not completely determined at compile time. Furthermore, dynamic software prefetching cannot do much better, because individual nodes on large heterogeneous NOWs tend to experience different remote memory delays over time. A fixed prefetch distance, even when computed at run time, cannot perform well for the whole duration of a software pipeline. Here we present an adaptive scheme for software prefetching that allows nodes to change dynamically not only the amount of prefetching but also the prefetch distance. This makes it possible to tailor the execution of the software pipeline to the prevailing conditions affecting each node. We show how simple performance data collected by hardware monitors can allow programs to observe, evaluate, and change their prefetching policies. Our results show that, on the benchmarks we simulated, adaptive prefetching was able to improve performance over static and dynamic prefetching by 10% to 60%. More importantly, future increases in the heterogeneity and size of NOWs will increase the advantages of adaptive prefetching over static and dynamic schemes.
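
To make the idea of an adjustable prefetch distance concrete, the following is a minimal sketch and not the scheme evaluated in the paper. It assumes the common rule of thumb that the distance, in loop iterations, is the memory latency divided by the time per iteration, rounded up. It also assumes two hypothetical monitor counters per sampling interval: `late` (prefetched data that arrived after it was needed) and `evicted` (prefetched lines displaced before use). All names, the counters, and the one-step adjustment policy are illustrative assumptions.

```python
import math


def prefetch_distance(latency_cycles, iter_cycles, max_distance):
    """Iterations to prefetch ahead so data arrives just before it is used.

    Uses the rule of thumb distance = ceil(latency / time per iteration),
    clamped to the range [1, max_distance].
    """
    d = math.ceil(latency_cycles / iter_cycles)
    return max(1, min(d, max_distance))


class AdaptiveDistance:
    """Hypothetical controller that nudges the distance once per interval."""

    def __init__(self, initial, max_distance):
        self.distance = initial
        self.max_distance = max_distance

    def update(self, late, evicted):
        """Adjust the distance from two assumed monitor counters.

        late:    prefetches whose data arrived after the demand access
        evicted: prefetched lines displaced from the cache before use
        """
        if late > evicted and self.distance < self.max_distance:
            # Data is arriving too late: prefetch further ahead.
            self.distance += 1
        elif evicted > late and self.distance > 1:
            # Data is arriving too early and polluting the cache: back off.
            self.distance -= 1
        return self.distance
```

A fixed distance corresponds to calling `prefetch_distance` once, whether at compile time or at program start. The adaptive variant instead calls `update` repeatedly during the loop, so the distance can follow changes in the remote memory delay a node observes.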