Limits of region-based dynamic binary parallelization

Authors:
Tobias J.K. Edler von Koch;Björn Franke
Affiliations:
University of Edinburgh, Edinburgh, United Kingdom;University of Edinburgh, Edinburgh, United Kingdom
Venue:
Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Year:
2013

Abstract

Efficiently executing sequential legacy binaries on chip multi-processors (CMPs) composed of many small cores is one of today's most pressing problems. Single-threaded execution is a suboptimal option due to CMPs' lower single-core performance, while multi-threaded execution relies on prior parallelization, which is severely hampered by the low-level binary representation of applications compiled and optimized for a single-core target. A recent technology to address this problem is Dynamic Binary Parallelization (DBP), which creates a Virtual Execution Environment (VEE) that takes advantage of the underlying multicore host to transparently parallelize the sequential binary executable. While still in its infancy, DBP has received broad interest within the research community. The combined use of DBP and thread-level speculation (TLS) has been proposed as a technique to accelerate legacy uniprocessor code on modern CMPs. In this paper, we investigate the limits of DBP and seek to gain an understanding of the factors contributing to these limits and of the costs and overheads of its implementation. We have performed an extensive evaluation using a parameterizable DBP system targeting a CMP with light-weight architectural TLS support. We demonstrate that there is room for a significant reduction of up to 54% in the number of instructions on the critical paths of legacy SPEC CPU2006 benchmarks. However, we show that it is much harder to translate these savings into actual performance improvements, with a realistic hardware-supported implementation achieving a speedup of 1.09 on average.
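To put the two headline numbers side by side, the following is a rough back-of-the-envelope sketch and not a calculation from the paper. It assumes, purely for illustration, that execution time scales linearly with the number of instructions on the critical path and that parallelization adds no overhead. The function names `ideal_speedup` and `implied_reduction` are hypothetical. Note that the 54% figure is a best case ("up to") while 1.09 is an average, so the two are not directly comparable.

```python
def ideal_speedup(critical_path_reduction: float) -> float:
    """Upper-bound speedup for a given fractional critical-path reduction.

    Assumption (not from the paper): run time is proportional to the
    number of critical-path instructions, with zero overhead.
    """
    if not 0.0 <= critical_path_reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - critical_path_reduction)


def implied_reduction(speedup: float) -> float:
    """Inverse mapping: the fractional run-time reduction behind a speedup."""
    if speedup <= 0.0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


if __name__ == "__main__":
    # Best-case critical-path reduction quoted in the abstract: 54%.
    print(f"ideal speedup at 54% reduction: {ideal_speedup(0.54):.2f}")
    # Average speedup quoted for the realistic implementation: 1.09.
    print(f"run-time reduction at 1.09 speedup: {implied_reduction(1.09):.3f}")
```

Under these simplifying assumptions, a 54% shorter critical path would bound the speedup at roughly 2.17, whereas a speedup of 1.09 corresponds to only about an 8% shorter run time, which illustrates the gap between theoretical savings and realized performance that the abstract describes.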