HiRe: using hint & release to improve synchronization of speculative threads

Authors:
Liang Han;Xiaowei Jiang;Wei Liu;Youfeng Wu;James Tuck
Affiliations:
Qualcomm, San Diego, CA, USA;Intel, Hillsboro, OR, USA;Intel, Santa Clara, CA, USA;Intel, Santa Clara, CA, USA;North Carolina State University, Raleigh, NC, USA
Venue:
Proceedings of the 26th ACM international conference on Supercomputing
Year:
2012

Citing 18
Cited 0

The program dependence graph and its use in optimization

ACM Transactions on Programming Languages and Systems (TOPLAS)
Multiscalar processors

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Dynamic speculation and synchronization of data dependences

Proceedings of the 24th annual international symposium on Computer architecture
Data speculation support for a chip multiprocessor

Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
A Chip-Multiprocessor Architecture with Speculative Multithreading

IEEE Transactions on Computers
The Superthreaded Processor Architecture

IEEE Transactions on Computers
Value prediction for speculative multithreaded architectures

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
A scalable approach to thread-level speculation

Proceedings of the 27th annual international symposium on Computer architecture
On the value locality of store instructions

Proceedings of the 27th annual international symposium on Computer architecture
Temporally silent stores

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
In Search of Speculative Thread-Level Parallelism

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Eliminating Squashes Through Learning Cross-Thread Violations in Speculative Parallelization for Multiprocessors

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Compiler Optimization of Memory-Resident Value Communication Between Speculative Threads

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Mitosis compiler: an infrastructure for speculative threading based on pre-computation slices

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
The STAMPede approach to thread-level speculation

ACM Transactions on Computer Systems (TOCS)
Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation

Proceedings of the 19th annual international conference on Supercomputing
POSH: a TLS compiler that exploits program structure

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Compiler optimizations for parallelizing general-purpose applications under thread-level speculation

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming

Quantified Score

Hi-index	0.00

Visualization

Abstract

Thread-Level Speculation (TLS) is a promising technique for improving performance of serial codes on multi-cores by automatically extracting threads and running them in parallel. However, the speculation efficiency as well as the performance gain of TLS systems are reduced by cross-thread data dependence violations. Reducing the cost and frequency of violations are key to improving the efficiency of TLS. One method to keep a dependence from violating is to predict it and communicate the value via synchronization. However, prior work in this field still cannot handle enough violating dependences, especially hard-to-predict ones and those in non-loop TLS tasks. Also, they suffer from over-synchronization and/or introduce complicated hardware. The major reason is that these techniques are highly sensitive to the accuracy of the dependence prediction, which is hard to improve in the face of irregular dependence and task patterns. In this paper, we propose a novel synchronization technique that avoids over synchronization and works for irregularly occurring dependences. We use a profiler to find and mark store-load pairs that generate data dependences. Then, the compiler schedules a hint instruction in advance of the store to inform successor threads of a possible pending write to a specific address; in this way, later loads only wait for a store if the loading location has been hinted. The compiler also schedules a release instruction that notifies the load when it should proceed. It places the release both after the store and on every path leading away from the hint that does not pass through the store. By placing it on all such paths, we limit the cost due to over synchronization. Together, the hint and release form our proposal, called HiRe. We implemented the HiRe scheme on a well-tuned TLS system and evaluated it on a set of SPEC CPU 2000 applications; we find that HiRe suffers only 22% of the violations that occur in our base TLS system, and it cuts the instruction waste rate of TLS in half. Furthermore, it outperforms prior approaches we studied by 3%.