Automatic Parallelization in a Binary Rewriter

Authors:
Aparna Kotha;Kapil Anand;Matthew Smithson;Greeshma Yellareddy;Rajeev Barua
Venue:
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2010

Abstract

Today, nearly all general-purpose computers are parallel, but nearly all software running on them is serial. However, bridging this gap by manually rewriting source code to run in parallel is prohibitively expensive. Automatic parallelization technology is therefore an attractive alternative. We present a method to perform automatic parallelization in a binary rewriter. The input to the binary rewriter is a serial binary executable and the output is a parallel binary executable. The advantages of parallelization in a binary rewriter over parallelization in a compiler include (i) compatibility with all compilers and languages, (ii) high economic feasibility, since the parallelizer need not be reimplemented in each compiler, (iii) applicability to legacy binaries, and (iv) applicability to assembly-language programs. Adapting existing parallelizing compiler methods that work on source code to work on binary programs instead is a significant challenge, primarily because the symbolic and array index information used by existing compiler parallelizers is not available in a binary. We show how to adapt existing parallelization methods to achieve equivalent parallelization from a binary without such information. Preliminary results using our x86 binary rewriter, called SecondWrite, on a suite of dense-matrix regular programs, including the externally developed Polybench suite of benchmarks, show average speedups of 5.1 from binary and 5.7 from source with 8 threads on an x86 Xeon E5530 machine, and 14.7 from binary and 15.4 from source with 32 threads on a SPARC T2, in each case measured against the input serial binary. Such regular loops are an important component of scientific and multimedia workloads, and are even present to a limited extent in otherwise non-regular programs.
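To make the central difficulty concrete, the sketch below models what a loop looks like once array names and index expressions are gone: each memory access is just a byte address of the form base + stride * i. This is an illustration under our own assumptions, not the algorithm used in the paper. The `Access` type, the function names, and the example addresses are all hypothetical, and the check is a brute-force enumeration where a real parallelizer would use a symbolic dependence test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical model of one memory access in a loop body.

    In iteration i the access touches `size` bytes starting at
    base + stride * i. No array name or index is available.
    """
    base: int
    stride: int
    size: int
    is_write: bool


def touched(acc, i):
    """Byte addresses touched by `acc` in iteration i."""
    start = acc.base + acc.stride * i
    return range(start, start + acc.size)


def has_loop_carried_dependence(accesses, trip_count):
    """Return True if two different iterations touch the same byte
    and at least one of them writes it (brute force, for illustration)."""
    writers = {}  # byte address -> iterations that write it
    readers = {}  # byte address -> iterations that read it
    for i in range(trip_count):
        for acc in accesses:
            table = writers if acc.is_write else readers
            for addr in touched(acc, i):
                table.setdefault(addr, set()).add(i)
    for addr, w_iters in writers.items():
        if len(w_iters) > 1:
            return True  # two iterations write the same byte
        if readers.get(addr, set()) - w_iters:
            return True  # another iteration reads what this one writes
    return False


# Source-level view: a[i] = b[i] + 1 over 8-byte elements.
independent = [
    Access(base=0x1000, stride=8, size=8, is_write=True),
    Access(base=0x2000, stride=8, size=8, is_write=False),
]

# Source-level view: a[i] = a[i-1] + 1, a recurrence across iterations.
recurrence = [
    Access(base=0x1008, stride=8, size=8, is_write=True),
    Access(base=0x1000, stride=8, size=8, is_write=False),
]

print(has_loop_carried_dependence(independent, 100))
print(has_loop_carried_dependence(recurrence, 100))
```

In this model the first loop is a candidate for parallel execution because no byte is shared across iterations, while the second is not, since each iteration reads the bytes written by the previous one. Enumeration only works when the trip count and addresses are known constants; it is used here purely to keep the example short and checkable.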