HiRe: using hint & release to improve synchronization of speculative threads

  • Authors:
  • Liang Han;Xiaowei Jiang;Wei Liu;Youfeng Wu;James Tuck

  • Affiliations:
  • Qualcomm, San Diego, CA, USA;Intel, Hillsboro, OR, USA;Intel, Santa Clara, CA, USA;Intel, Santa Clara, CA, USA;North Carolina State University, Raleigh, NC, USA

  • Venue:
  • Proceedings of the 26th ACM international conference on Supercomputing
  • Year:
  • 2012

Abstract

Thread-Level Speculation (TLS) is a promising technique for improving the performance of serial codes on multi-cores by automatically extracting threads and running them in parallel. However, both the speculation efficiency and the performance gain of TLS systems are reduced by cross-thread data dependence violations. Reducing the cost and frequency of violations is key to improving the efficiency of TLS. One method to keep a dependence from violating is to predict it and communicate the value via synchronization. However, prior work in this field still cannot handle enough violating dependences, especially hard-to-predict ones and those in non-loop TLS tasks. Prior techniques also suffer from over-synchronization and/or introduce complicated hardware. The major reason is that they are highly sensitive to the accuracy of the dependence prediction, which is hard to improve in the face of irregular dependence and task patterns. In this paper, we propose a novel synchronization technique that avoids over-synchronization and works for irregularly occurring dependences. We use a profiler to find and mark store-load pairs that generate data dependences. Then, the compiler schedules a hint instruction in advance of the store to inform successor threads of a possible pending write to a specific address; in this way, later loads wait for a store only if the loading location has been hinted. The compiler also schedules a release instruction that notifies the load when it should proceed. It places the release both after the store and on every path leading away from the hint that does not pass through the store. By placing it on all such paths, we limit the cost due to over-synchronization. Together, the hint and release form our proposal, called HiRe.
We implemented the HiRe scheme on a well-tuned TLS system and evaluated it on a set of SPEC CPU 2000 applications; we find that HiRe suffers only 22% of the violations that occur in our base TLS system, and it cuts the instruction waste rate of TLS in half. Furthermore, it outperforms prior approaches we studied by 3%.
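The hint and release behavior described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's implementation: HiRe is realized with compiler-scheduled instructions on a TLS system, whereas the names below (HintTable, predecessor_task, successor_load) and the use of a plain set to track hinted addresses are hypothetical and chosen for illustration.

```python
class HintTable:
    """Hypothetical model of the addresses a successor thread has been hinted about."""

    def __init__(self):
        # Assumption: one outstanding hint per address is enough for this sketch.
        self._hinted = set()

    def hint(self, addr):
        # A predecessor announces a possible pending write to addr.
        self._hinted.add(addr)

    def release(self, addr):
        # The predecessor signals that loads of addr may proceed.
        self._hinted.discard(addr)

    def must_wait(self, addr):
        # A load waits only if its address has been hinted.
        return addr in self._hinted


def predecessor_task(memory, table, addr, value, take_store_path):
    """Model of a predecessor with a hint placed ahead of a conditional store."""
    table.hint(addr)
    if take_store_path:
        memory[addr] = value
        table.release(addr)  # release placed after the store
    else:
        table.release(addr)  # release on the path that bypasses the store


def successor_load(memory, table, addr):
    """Return the loaded value, or None to model a load that must stall."""
    if table.must_wait(addr):
        return None
    return memory.get(addr, 0)
```

In this model, a load from an address that was never hinted proceeds immediately, which mirrors how the scheme avoids over-synchronization. A load from a hinted address stalls until the release, whether or not the store actually executed.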