TAO: two-level atomicity for dynamic binary optimizations

Authors:
Edson Borin;Youfeng Wu;Cheng Wang;Wei Liu;Mauricio Breternitz, Jr.;Shiliang Hu;Esfir Natanzon;Shai Rotem;Roni Rosner
Affiliations:
Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Haifa, Israel;Intel Corporation, Haifa, Israel;Intel Corporation, Haifa, Israel
Venue:
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Year:
2010

Citing 26
Cited 7

Transactional memory: architectural support for lock-free data structures

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Trace cache: a low latency approach to high bandwidth instruction fetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Advanced compiler design and implementation

Advanced compiler design and implementation
A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Binary translation and architecture convergence issues for IBM system/390

Proceedings of the 14th international conference on Supercomputing
A hardware mechanism for dynamic extraction and relayout of program hot spots

Proceedings of the 27th annual international symposium on Computer architecture
Increasing the size of atomic instruction blocks using control flow assertions

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Dynamic Binary Translation and Optimization

IEEE Transactions on Computers
rePLay: A Hardware Framework for Dynamic Optimization

IEEE Transactions on Computers
The Transmeta Code Morphing™ Software: using speculation, recovery, and adaptive retranslation to address real-life challenges

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Speculative Versioning Cache

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Dynamic Optimization of Micro-Operations

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The Jrpm system for dynamically parallelizing Java programs

Proceedings of the 30th annual international symposium on Computer architecture
IA-32 Execution Layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium®-based systems

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Hardware Support for Control Transfers in Code Caches

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A cost-driven compilation framework for speculative parallelization of sequential programs

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Power Awareness through Selective Dynamically Optimized Traces

Proceedings of the 31st annual international symposium on Computer architecture
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Efficient, transparent, and comprehensive runtime code manipulation

Efficient, transparent, and comprehensive runtime code manipulation
POSH: a TLS compiler that exploits program structure

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Supporting nested transactional memory in logTM

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
HDTrans: an open source, low-level dynamic instrumentation system

Proceedings of the 2nd international conference on Virtual execution environments
Hardware atomicity for reliable software speculation

Proceedings of the 34th annual international symposium on Computer architecture
Dynamic performance tuning for speculative threads

Proceedings of the 36th annual international symposium on Computer architecture
StarDBT: an efficient multi-platform dynamic binary translation system

ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture

FlexBulk: intelligently forming atomic blocks in blocked-execution multiprocessors to minimize squashes

Proceedings of the 38th annual international symposium on Computer architecture
On-the-fly detection of precise loop nests across procedures on a dynamic binary translation system

Proceedings of the 8th ACM International Conference on Computing Frontiers
LAR-CC: Large atomic regions with conditional commits

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
A HW/SW co-designed heterogeneous multi-core virtual machine for energy-efficient general purpose computing

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
BlockChop: dynamic squash elimination for hybrid processor architecture

Proceedings of the 39th Annual International Symposium on Computer Architecture
DeAliaser: alias speculation using atomic region support

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Dynamic binary translation is a key component of Hardware/Software (HW/SW) co-design, which is an enabling technology for processor microarchitecture innovation. There are two well-known dynamic binary optimization techniques based on atomic execution support. Frame-based optimizations leverage processor pipeline support to enable atomic execution of hot traces. Region level optimizations employ transactional-memory-like atomicity support to aggressively optimize large regions of code. In this paper we propose a two-level atomic optimization scheme which not only overcomes the limitations of the two approaches, but also boosts the benefits of the two approaches effectively. Our experiment shows that the combined approach can achieve a total of 21.5% performance improvement over an aggressive out-of-order baseline machine and improve the performance over the frame-based approach by an additional 5.3%.