International Journal of High Speed Computing
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
The Omega test: a fast and practical integer programming algorithm for dependence analysis
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
SUIF: an infrastructure for research on parallelizing and optimizing compilers
ACM SIGPLAN Notices
Detecting coarse-grain parallelism using an interprocedural parallelizing compiler
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
An HPF compiler for the IBM SP2
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Optimizing alpha executables on Windows NT with spike
Digital Technical Journal
Alto: a link-time optimizer for the Compaq alpha
Software—Practice & Experience
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Dependence Analysis for Supercomputing
Dependence Analysis for Supercomputing
Modern Compiler Implementation in C
Modern Compiler Implementation in C
Parallel Programming with Polaris
Computer
The Design of the PROMIS Compiler
CC '99 Proceedings of the 8th International Conference on Compiler Construction, Held as Part of the European Joint Conferences on the Theory and Practice of Software, ETAPS'99
Control and data dependence for program transformations.
Control and data dependence for program transformations.
Speedup of ordinary programs
Optimizing supercompilers for supercomputers
Optimizing supercompilers for supercomputers
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Link-time binary rewriting techniques for program compaction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel Programmer Productivity: A Case Study of Novice Parallel Programmers
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Dynamic parallelization and mapping of binary executables on hierarchical platforms
Proceedings of the 3rd conference on Computing frontiers
Instrumentation and optimization of Win32/intel executables using Etch
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
Spike: an optimizer for alpha/NT executables
NT'97 Proceedings of the USENIX Windows NT Workshop on The USENIX Windows NT Workshop 1997
Polyhedral parallelization of binary code
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
MP-Tomasulo: A Dependency-Aware Automatic Parallel Execution Engine for Sequential Programs
ACM Transactions on Architecture and Code Optimization (TACO)
A compiler-level intermediate representation based binary analysis and rewriting system
Proceedings of the 8th ACM European Conference on Computer Systems
Proceedings of the 6th International Systems and Storage Conference
ASC: automatically scalable computation
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Hi-index | 0.00 |
Today, nearly all general-purpose computers are parallel, but nearly all software running on them is serial. However bridging this disconnect by manually rewriting source code in parallel is prohibitively expensive. Automatic parallelization technology is therefore an attractive alternative. We present a method to perform automatic parallelization in a binary rewriter. The input to the binary rewriter is the serial binary executable program and the output is a parallel binary executable. The advantages of parallelization in a binary rewriter versus a compiler include (i) compatibility with all compilers and languages, (ii) high economic feasibility from avoiding repeated compiler implementation, (iii) applicability to legacy binaries, and (iv) applicability to assembly-language programs. Adapting existing parallelizing compiler methods that work on source code to work on binary programs instead is a significant challenge. This is primarily because symbolic and array index information used in existing compiler parallelizers is not available in a binary. We show how to adapt existing parallelization methods to achieve equivalent parallelization from a binary without such information. Preliminary results using our x86 binary rewriter called Second Write on a suite of dense-matrix regular programs including the externally developed Polybench suite of benchmarks shows an average speedup of 5.1 from binary and 5.7 from source with 8 threads compared to the input serial binary on an x86 Xeon E5530 machine, and 14.7 from binary and 15.4 from source with 32 threads compared to the input serial binary on a SPARC T2. Such regular loops are an important component of scientific and multi-media workloads, and are even present to a limited extent in otherwise non-regular programs.