Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Transactional memory: architectural support for lock-free data structures
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
rePLay: A Hardware Framework for Dynamic Optimization
IEEE Transactions on Computers
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Dynamic Optimization of Micro-Operations
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
The Jrpm system for dynamically parallelizing Java programs
Proceedings of the 30th annual international symposium on Computer architecture
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Hardware Support for Control Transfers in Code Caches
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Specialized Dynamic Optimizations for High-Performance Energy-Efficient Microarchitecture
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A cost-driven compilation framework for speculative parallelization of sequential programs
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Power Awareness through Selective Dynamically Optimized Traces
Proceedings of the 31st annual international symposium on Computer architecture
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Efficient, transparent, and comprehensive runtime code manipulation
Efficient, transparent, and comprehensive runtime code manipulation
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
HDTrans: an open source, low-level dynamic instrumentation system
Proceedings of the 2nd international conference on Virtual execution environments
Hardware atomicity for reliable software speculation
Proceedings of the 34th annual international symposium on Computer architecture
Dynamic performance tuning for speculative threads
Proceedings of the 36th annual international symposium on Computer architecture
A real system evaluation of hardware atomicity for software speculation
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
TAO: two-level atomicity for dynamic binary optimizations
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
StarDBT: an efficient multi-platform dynamic binary translation system
ACSAC'07 Proceedings of the 12th Asia-Pacific conference on Advances in Computer Systems Architecture
Proceedings of the 38th annual international symposium on Computer architecture
BlockChop: dynamic squash elimination for hybrid processor architecture
Proceedings of the 39th Annual International Symposium on Computer Architecture
DeAliaser: alias speculation using atomic region support
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
TSO_ATOMICITY: efficient hardware primitive for TSO-preserving region optimizations
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
BulkCommit: scalable and fast commit of atomic blocks in a lazy multiprocessor environment
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Just-In-Time Software Pipelining
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
HW/SW Co-designed systems rely on dynamic binary translation and optimizations for efficient execution of binary code. Due to memory ordering properties and other architectural constraints, most binary optimizations are applied to regions of code that are atomically executed. To ensure that the underlying hardware has enough speculative resources to execute the whole atomic region, these systems typically form short atomic regions, with only 20 to 30 instructions. However, the shorter is the atomic region the smaller is the scope for optimizations. We present LAR-CC, a novel technique that enables HW/SW co-designed systems to optimize large atomic regions and dynamically fit them into the available speculative hardware resources by means of conditional commits. The LAR-CC technique consists of two major components: 1) conditional branch instructions to conditionally skip commit operations; 2) code transformations that replace commit operations by conditional commits and enable optimizations to be applied on the large atomic regions. Our experiments show that LAR-CC can effectively achieve dynamic atomic region sizes larger than 1000 instructions, providing sufficiently large scope to apply many advanced optimizations on HW/SW co-designed systems.