Accelerating and Adapting Precomputation Threads for Effcient Prefetching

Authors:
Weifeng Zhang;Dean M. Tullsen;Brad Calder
Affiliations:
Department of Computer Science and Engineering, University of California, San Diego;Department of Computer Science and Engineering, University of California, San Diego;Department of Computer Science and Engineering, University of California, San Diego
Venue:
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Year:
2007

Citing 0
Cited 9

Just-In-Time Locality and Percolation for Optimizing Irregular Applications on a Manycore Architecture

Languages and Compilers for Parallel Computing
Spatio-temporal memory streaming

Proceedings of the 36th annual international symposium on Computer architecture
Software data spreading: leveraging distributed caches to improve single thread performance

PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Inter-core prefetching for multicore processors using migrating helper threads

Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Analysis and performance results of computing betweenness centrality on IBM Cyclops64

The Journal of Supercomputing
Sprint: speculative prefetching of remote data

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Implications of electronics technology trends to algorithm design

VoCS'08 Proceedings of the 2008 international conference on Visions of Computer Science: BCS International Academic Conference
Coalition threading: combining traditional andnon-traditional parallelism to maximize scalability

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization

Quantified Score

Hi-index	0.00

Visualization

Abstract

Speculative precomputation enables effective cache prefetching for even irregular memory access behavior, by using an alternate thread on a multithreaded or multi-core architecture. This paper describes a system that constructs and runs precomputation based prefetching threads via event-driven dynamic optimization. Precomputation threads are dynamically constructed by a runtime compiler from the program's frequently executed hot traces, and are adapted to the memory behavior automatically. Both construction and execution of the prefetching threads happen in another thread, imposing little overhead on the main thread. This paper also presents several techniques to accelerate the precomputation threads, including colocation of p-threads with hot traces, dynamic stride prediction, and automatic adptation of runahead and jumpstart distance. The adaptive prefetching achieves 42% speedup, a 17% improvement over existing p-thread prefetching schemes.