A real system evaluation of hardware atomicity for software speculation

Authors:
Naveen Neelakantam;David R. Ditzel;Craig Zilles
Affiliations:
University of Illinois, Urbana, USA;Intel, Santa Clara, USA;University of Illinois, Urbana, USA
Venue:
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Year:
2010

Abstract

In this paper we evaluate the atomic region compiler abstraction by incorporating it into a commercial system. We find that atomic regions are simple and intuitive to integrate into an x86 binary-translation system. Furthermore, doing so trivially enables additional optimization opportunities beyond those achievable by a high-performance dynamic optimizer that already implements superblocks. We show that atomic regions can suffer severe performance penalties if misspeculations are left uncontrolled, but that a simple software control mechanism is sufficient to rein in all detrimental side effects. We evaluate using full reference runs of the SPEC CPU2000 integer benchmarks and find that atomic regions enable up to a 9% (3% on average) improvement beyond the performance of a tuned product. These improvements come without negative side effects such as code bloat; in fact, static code size is reduced. The required hardware is synergistic with other needs and was already available on the commercial product used in our evaluation. Finally, the software complexity is minimal: a single developer was able to incorporate atomic regions into a sophisticated 300,000-line code base in three months, despite never having seen the translator source code beforehand.
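
The abstract does not spell out how the software control mechanism works. As a rough illustration only, the sketch below assumes one plausible shape: a runtime monitor counts entries and aborts per atomic region and stops speculating in a region once its abort rate crosses a threshold. The class names, the `min_entries` and `max_abort_rate` parameters, and the policy itself are hypothetical and are not taken from the paper or the commercial translator.

```python
from dataclasses import dataclass


@dataclass
class RegionStats:
    """Per-region counters kept by a hypothetical runtime monitor."""
    entries: int = 0          # times the region was entered
    aborts: int = 0           # times it misspeculated and rolled back
    speculative: bool = True  # whether the region still runs speculatively


class MisspeculationController:
    """Illustrative abort-rate policy; thresholds are assumed, not measured."""

    def __init__(self, min_entries=100, max_abort_rate=0.01):
        self.min_entries = min_entries        # samples required before judging
        self.max_abort_rate = max_abort_rate  # tolerated fraction of aborts
        self.regions = {}

    def record(self, region_id, aborted):
        """Log one execution; return True if the region may keep speculating."""
        stats = self.regions.setdefault(region_id, RegionStats())
        stats.entries += 1
        if aborted:
            stats.aborts += 1
        # Once enough samples exist, disable speculation for a region whose
        # abort rate exceeds the threshold. The decision is never reversed here.
        if (stats.speculative
                and stats.entries >= self.min_entries
                and stats.aborts / stats.entries > self.max_abort_rate):
            stats.speculative = False
        return stats.speculative
```

In this sketch a region that aborts often is demoted after `min_entries` executions, while well-behaved regions are unaffected. A real translator would presumably also regenerate the region's code without the failing speculation, which is outside the scope of this example.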