Hardware atomicity for reliable software speculation

  • Authors:
  • Naveen Neelakantam; Ravi Rajwar; Suresh Srinivas; Uma Srinivasan; Craig Zilles

  • Affiliations:
  • University of Illinois at Urbana-Champaign, Urbana, IL; Intel Corporation, Hillsboro, OR; Intel Corporation, Hillsboro, OR; Intel Corporation, Hillsboro, OR; University of Illinois at Urbana-Champaign, Urbana, IL

  • Venue:
  • Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA '07)
  • Year:
  • 2007

Abstract

Speculative compiler optimizations are effective at both improving single-thread performance and reducing power consumption, but their implementation introduces significant complexity, which can limit their adoption, restrict their optimization scope, and harm the reliability of the compilers that implement them. To eliminate much of this complexity, and to increase the effectiveness of these optimizations, we propose that microprocessors provide architecturally visible hardware primitives for atomic execution. These primitives give the compiler the ability to optimize the program's hot path in isolation, allowing non-speculative formulations of optimization passes to perform speculative optimizations. Atomic execution guarantees that if a speculation invariant does not hold, the speculative updates are discarded, the register state is restored, and control is transferred to a non-speculative version of the code, relieving the compiler of the responsibility for generating compensation code. We demonstrate the benefit of hardware atomicity in the context of a Java virtual machine. We find that incorporating the notion of atomic regions into an existing compiler intermediate representation is natural, requiring roughly 3,000 lines of code (about 3% of a JVM's optimizing compiler), most of which implement region formation. This incorporation creates new opportunities for existing optimization passes and greatly simplifies the implementation of additional optimizations (e.g., partial inlining, partial loop unrolling, and speculative lock elision). These optimizations reduce dynamic instruction count by 11% on average and yield a 10-15% average speedup relative to a baseline compiler with a similar degree of inlining.
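
To make the mechanism concrete, the sketch below is a minimal, hypothetical illustration of how compiler-generated code might use atomic-region semantics; it is not the paper's ISA interface or its JVM implementation, and the class and method names are invented for illustration. A plain Java exception stands in for the hardware abort: the hot path is compiled as if its speculation invariant (here, an in-bounds index) always holds, a violated invariant aborts the region, and execution falls back to the unoptimized cold path with no compensation code. With real hardware atomicity, discarding the speculative updates and restoring register state would be implicit.

```java
// Hypothetical illustration of an atomic region with a hot-path/cold-path split.
// The hardware primitives described in the paper would discard speculative
// updates and restore register state on abort; here that is simulated in software.
public class AtomicRegionSketch {

    /** Models the hardware "abort" that discards speculative updates. */
    static class AtomicRegionAbort extends RuntimeException {}

    /** Non-speculative (cold-path) version: handles every case, no assumptions. */
    static int coldPath(int[] data, int index) {
        if (index < 0 || index >= data.length) {
            return -1;                      // rarely taken slow case
        }
        return data[index] * 2;
    }

    /** Hot-path version, optimized under the invariant that index is in bounds. */
    static int hotPath(int[] data, int index) {
        if (index < 0 || index >= data.length) {
            throw new AtomicRegionAbort();  // speculation invariant violated
        }
        return data[index] * 2;             // straight-line code for the common case
    }

    /** Atomic region: try the hot path; on abort, re-enter the cold path. */
    static int compute(int[] data, int index) {
        try {
            return hotPath(data, index);    // speculative execution happens here
        } catch (AtomicRegionAbort e) {
            // No compensation code is needed: the speculative work is simply
            // discarded and the non-speculative version runs from the start.
            return coldPath(data, index);
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(compute(data, 1));   // hot path commits: prints 4
        System.out.println(compute(data, 9));   // aborts, cold path runs: prints -1
    }
}
```

The point of the structure is that the hot path can be optimized as if the invariant always holds, while correctness for the rare cases is delegated entirely to the cold path, which is the complexity reduction the abstract attributes to hardware atomicity.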