Polyhedral parallelization of binary code

Authors:
Benoit Pradelle;Alain Ketterlin;Philippe Clauss
Affiliations:
INRIA Nancy Grand Est and LSIIT, Université de Strasbourg, France;INRIA Nancy Grand Est and LSIIT, Université de Strasbourg, France;INRIA Nancy Grand Est and LSIIT, Université de Strasbourg, France
Venue:
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Year:
2012

Abstract

Many automatic software parallelization systems have been proposed over the past decades, but most of them are dedicated to source-to-source transformations. This paper shows that parallelizing executable programs is feasible, even when they require complex transformations. In effect, this decouples parallelization from compilation, which matters, for example, for closed-source or legacy software, where binary code is the only available representation. We propose an automatic parallelizer that is able to perform advanced parallelization on binary code. It first parses the binary code and extracts high-level information, from which a C program is generated. This program captures only a subset of the program semantics, namely loops and memory accesses. The C program is then parallelized using existing, state-of-the-art parallelizers, including advanced polyhedral parallelizers. The original program semantics are then re-injected, and the transformed parallel loop nests are recompiled by a standard C compiler. We show on the PolyBench benchmark suite that our system successfully detects and parallelizes almost all the loop nests from the binary code, using a recent polyhedral loop parallelizer as a backend. The paper ends by elaborating a strategy to parallelize more complex programs, such as those containing nonlinear memory accesses, and provides a few example case studies.
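
To make the pipeline in the abstract more concrete, here is a minimal sketch in C of what an extracted loop nest and its parallelized counterpart might look like. It is not taken from the paper. The kernel (a matrix-product-like nest), the size `N`, and the names `opaque_op`, `kernel_sequential`, `kernel_parallel`, and `kernels_agree` are all assumptions made for illustration. The OpenMP pragma stands in for whatever a polyhedral parallelizer would actually emit, which could also include tiling or other loop transformations.

```c
#include <assert.h>
#include <stddef.h>

#define N 32  /* hypothetical problem size, for illustration only */

/* Opaque stand-in for the original machine instructions. For dependence
 * analysis, only the cells read and written by the loop nest matter. */
static double opaque_op(double acc, double a, double b) {
    return acc + a * b;
}

/* Loop nest as it might be recovered from a binary: affine loop bounds and
 * flat base-plus-offset addressing, executed sequentially. */
void kernel_sequential(double *C, const double *A, const double *B) {
    assert(C != NULL && A != NULL && B != NULL);
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i * N + j] = opaque_op(C[i * N + j], A[i * N + k], B[k * N + j]);
}

/* The same nest once the outer loop is known to be parallel: each value of i
 * writes a distinct row of C. The pragma is ignored when OpenMP is disabled. */
void kernel_parallel(double *C, const double *A, const double *B) {
    assert(C != NULL && A != NULL && B != NULL);
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i * N + j] = opaque_op(C[i * N + j], A[i * N + k], B[k * N + j]);
}

/* Returns 1 when both versions produce identical results, 0 otherwise. */
int kernels_agree(void) {
    static double A[N * N], B[N * N], C_seq[N * N], C_par[N * N];
    for (int x = 0; x < N * N; x++) {
        A[x] = (double)(x % 7);
        B[x] = (double)(x % 5) - 2.0;
        C_seq[x] = 0.0;
        C_par[x] = 0.0;
    }
    kernel_sequential(C_seq, A, B);
    kernel_parallel(C_par, A, B);
    for (int x = 0; x < N * N; x++)
        if (C_seq[x] != C_par[x])
            return 0;
    return 1;
}
```

Calling `kernels_agree()` from a test harness checks that the parallel nest computes the same values as the sequential one. Because each cell of `C` is accumulated in the same `k` order in both versions, the comparison can be exact. Compiling with an OpenMP flag such as `-fopenmp` on GCC or Clang enables the parallel loop.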