Dynamic parallelization of single-threaded binary programs using speculative slicing

Authors:
Cheng Wang;Youfeng Wu;Edson Borin;Shiliang Hu;Wei Liu;Dave Sager;Tin-fook Ngai;Jesse Fang
Affiliations:
Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Hillsboro, OR, USA;Intel Corporation, Santa Clara, CA, USA;Intel Corporation, Santa Clara, CA, USA
Venue:
Proceedings of the 23rd international conference on Supercomputing
Year:
2009

Citing 28
Cited 7

The program dependence graph and its use in optimization

ACM Transactions on Programming Languages and Systems (TOPLAS)
The superblock: an effective technique for VLIW and superscalar compilation

The Journal of Supercomputing - Special issue on instruction-level parallelism
Data and computation transformations for multiprocessors

PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Critical path reduction for scalar programs

Proceedings of the 28th annual international symposium on Microarchitecture
New faster Kernighan-Lin-type graph-partitioning algorithms

ICCAD '93 Proceedings of the 1993 IEEE/ACM international conference on Computer-aided design
Hardware and software support for speculative execution of sequential binaries on a chip-multiprocessor

ICS '98 Proceedings of the 12th international conference on Supercomputing
Task selection for a multiscalar processor

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Removing architectural bottlenecks to the scalability of speculative parallelization

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Integrating Program Optimizations and Transformations with the Scheduling of Instruction Level Parallelism

LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Master/slave speculative parallelization

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
The Potential for Using Thread-Level Data Speculation to Facilitate Automatic Parallelization

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Proceedings of the 30th annual international symposium on Computer architecture
Thread-Spawning Schemes for Speculative Multithreading

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Program slices: formal, psychological, and practical investigations of an automatic program abstraction method

Program slices: formal, psychological, and practical investigations of an automatic program abstraction method
Min-cut program decomposition for thread-level speculation

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
A cost-driven compilation framework for speculative parallelization of sequential programs

Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Memory Ordering: A Value-Based Approach

Proceedings of the 31st annual international symposium on Computer architecture
Tasking with out-of-order spawn in TLS chip multiprocessors: microarchitecture and compilation

Proceedings of the 19th annual international conference on Supercomputing
Automatic Thread Extraction with Decoupled Software Pipelining

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
MinneSPEC: A New SPEC Benchmark Workload for Simulation-Based Computer Architecture Research

IEEE Computer Architecture Letters
POSH: a TLS compiler that exploits program structure

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Program Demultiplexing: Data-flow based Speculative Parallelization of Methods in Sequential Programs

Proceedings of the 33rd annual international symposium on Computer Architecture
Hardware atomicity for reliable software speculation

Proceedings of the 34th annual international symposium on Computer architecture
Core fusion: accommodating software diversity in chip multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
Speculative Decoupled Software Pipelining

PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Exploiting Postdominance for Speculative Parallelization

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Mitosis: A Speculative Multithreaded Processor Based on Precomputation Slices

IEEE Transactions on Parallel and Distributed Systems
A new foundation for control-dependence and slicing for modern program structures

ESOP'05 Proceedings of the 14th European conference on Programming Languages and Systems

ISAMAP: instruction mapping driven by dynamic binary translation

ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
A HW/SW co-designed heterogeneous multi-core virtual machine for energy-efficient general purpose computing

CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Limits of region-based dynamic binary parallelization

Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
SMARQ: Software-Managed Alias Register Queue for Dynamic Optimizations

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
HW/SW co-designed acceleration of dynamic languages

Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
Computational caches

Proceedings of the 6th International Systems and Storage Conference
ASC: automatically scalable computation

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The performance of single-threaded programs and legacy binary code is of critical importance in many everyday applications. However, neither can hardware multi-core processors directly speed up single-threaded programs, nor can software automatic parallelizing compilers effectively parallelize legacy binary code and irregular applications. In this paper, we propose a framework and a set of algorithms to dynamically parallelize single-threaded binary programs. Our parallelization is based on program slicing and explores both instruction-level parallelism (ILP) and thread-level parallelism (TLP). To significantly reduce the critical path of the parallel slices, our slicing algorithms exploit speculation to cut rare dependences, and use well-designed program transformations to expose parallelism. Furthermore, because we transparently parallelize binary code at runtime, we perform slicing only on program hot regions. Our experiments demonstrate that the proposed speculative slicing approach extracts more parallelism than any known slicing based parallelization schemes. For the SPEC2000 benchmarks, we can achieve 3x parallelism with infinite number of threads, and 1.8x parallelism with 4 threads.