Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
Some efficient solutions to the affine scheduling problem: I. One-dimensional time
International Journal of Parallel Programming
Advanced compiler design and implementation
Advanced compiler design and implementation
Parametric Analysis of Polyhedral Iteration Spaces
Journal of VLSI Signal Processing Systems - Special issue on application specific systems, architectures and processors
Preliminary Evaluation of a Binary Translation System for Multithreaded Processors
IWIA '02 Proceedings of the International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'02)
Modern Compiler Implementation in C
Modern Compiler Implementation in C
Code Generation in the Polyhedral Model Is Easier Than You Think
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
POSH: a TLS compiler that exploits program structure
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Dynamic parallelization and mapping of binary executables on hierarchical platforms
Proceedings of the 3rd conference on Computing frontiers
A practical automatic polyhedral parallelizer and locality optimizer
Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
isl: an integer set library for the polyhedral model
ICMS'10 Proceedings of the Third international congress conference on Mathematical software
Automatic Parallelization in a Binary Rewriter
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Runtime parallelization of legacy code on a transactional memory system
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Runtime automatic speculative parallelization
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Limits of region-based dynamic binary parallelization
Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Recovering memory access patterns of executable programs
Science of Computer Programming
Hi-index | 0.00 |
Many automatic software parallelization systems have been proposed in the past decades, but most of them are dedicated to source-to-source transformations. This paper shows that parallelizing executable programs is feasible, even if they require complex transformations, and in effect decouples parallelization from compilation, for example, for closed-source or legacy software, where binary code is the only available representation. We propose an automatic parallelizer, which is able to perform advanced parallelization on binary code. It first parses the binary code and extracts high-level information. From this information, a C program is generated. This program captures only a subset of the program semantics, namely, loops and memory accesses. This C program is then parallelized using existing, state-of-the-art parallelizers, including advanced polyhedral parallelizers. The original program semantics is then re-injected, and the transformed parallel loop nests are recompiled by a standard C compiler. We show on the PolyBench benchmark suite that our system successfully detects and parallelizes almost all the loop nests from the binary code, using a recent polyhedral loop parallelizer as a backend. The paper ends by elaborating a strategy to parallelize more complex programs, such as those containing non-linear accesses to memory, and provides a few example case-studies.