Efficient emulation of hardware prefetchers via event-driven helper threading
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Future execution: A prefetching mechanism that uses multiple cores to speed up single threads
ACM Transactions on Architecture and Code Optimization (TACO)
Accurate branch prediction for short threads
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
On the potential of latency tolerant execution in speculative multithreading
IFMT '08 Proceedings of the 1st international forum on Next-generation multicore/manycore technologies
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Disjoint out-of-order execution processor
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Speculative Multi-Threading (SpMT) can improve single-threaded application performance using the multiple thread contexts available in current processors. We propose a minimal SpMT model that uses only two thread contexts. The model achieves significant speedups for single-threaded applications using a low-overhead scheme for detecting and selectively recovering from data dependence violations, and a novel Wrong Path Predictor to reduce the number of speculative threads executing along the wrong path. We also study the interactions between three previously proposed SpMT thread spawning policies that can be implemented dynamically in hardware - Forkon Call, Loop Continuation and Run Ahead policies - and show it is beneficial to implement all three policies together in a processor. While the individual thread spawning policies show performance benefits of 14%, 5% and 4% respectively on our SpMT model over a base processor that does not exploit SpMT, combining all three policies shows an average performance gain of 20%. Finally, we identify the sources of SpMT benefits - on average, 58% of the performance benefits due to SpMT comes from cache prefetching, 33% from instruction reuse, and 9% from branch precomputation and show all three sources of SpMT benefits must be utilized to realize the full potential of SpMT.