A scalable approach to thread-level speculation
Proceedings of the 27th annual international symposium on Computer architecture
Compiler optimization of scalar value communication between speculative threads
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The STAMPede approach to thread-level speculation
ACM Transactions on Computer Systems (TOCS)
Compiler optimizations for parallelizing general-purpose applications under thread-level speculation
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Compiler and hardware support for reducing the synchronization of speculative threads
ACM Transactions on Architecture and Code Optimization (TACO)
Dynamic performance tuning for speculative threads
Proceedings of the 36th annual international symposium on Computer architecture
Exploiting speculative thread-level parallelism in data compression applications
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Supporting speculative parallelization in the presence of dynamic data structures
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Enhanced speculative parallelization via incremental recovery
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
Chip-multiprocessor (CMP) architectures are a promising design alternative to exploit the ever-increasing number of transistors that can be put on a die. To deliver high performance on applications that cannot be easily parallelized, CMPs can use additional support for speculatively executing the possibly data-dependent threads of an application.While some of the cross-thread dependences in applications must be handled dynamically, others can be fully determined by the compiler. For the latter dependences, the threads can be made to synchronize and communicate either at the register level or at the memory level. In the past, it has been unclear whether the higher hardware cost of register-level communication is cost-effective.In this paper, we show that the wide-issue dynamic processors that will soon populate CMPs, make fast communication a requirement for high performance. Consequently, we propose an effective hardware mechanism to support communication and synchronization of registers between on-chip processors. Our scheme adds enough support to enable register-level communication without specializing the architecture so much toward speculation that it leads to much unutilized hardware under workloads that do not need speculative parallelization. Finally, the scheme allows the system to achieve near ideal performance.