Languages and Compilers for Parallel Computing
Spatio-temporal memory streaming
Proceedings of the 36th annual international symposium on Computer architecture
Software data spreading: leveraging distributed caches to improve single thread performance
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Inter-core prefetching for multicore processors using migrating helper threads
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Analysis and performance results of computing betweenness centrality on IBM Cyclops64
The Journal of Supercomputing
Sprint: speculative prefetching of remote data
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Implications of electronics technology trends to algorithm design
VoCS'08 Proceedings of the 2008 international conference on Visions of Computer Science: BCS International Academic Conference
Coalition threading: combining traditional andnon-traditional parallelism to maximize scalability
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Fix the code. Don't tweak the hardware: A new compiler approach to Voltage-Frequency scaling
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
Speculative precomputation enables effective cache prefetching for even irregular memory access behavior, by using an alternate thread on a multithreaded or multi-core architecture. This paper describes a system that constructs and runs precomputation based prefetching threads via event-driven dynamic optimization. Precomputation threads are dynamically constructed by a runtime compiler from the program's frequently executed hot traces, and are adapted to the memory behavior automatically. Both construction and execution of the prefetching threads happen in another thread, imposing little overhead on the main thread. This paper also presents several techniques to accelerate the precomputation threads, including colocation of p-threads with hot traces, dynamic stride prediction, and automatic adptation of runahead and jumpstart distance. The adaptive prefetching achieves 42% speedup, a 17% improvement over existing p-thread prefetching schemes.