Dynamic parallelization and mapping of binary executables on hierarchical platforms

Authors: Efe Yardimci; Michael Franz
Affiliation: University of California, Irvine, Irvine, CA
Venue: Proceedings of the 3rd conference on Computing frontiers
Year: 2006

Abstract

As performance improvements are being increasingly sought via coarse-grained parallelism, established expectations of continued sequential performance increases are not being met. Current trends in computing point towards platforms seeking performance improvements through various degrees of parallelism, with coarse-grained parallelism features becoming commonplace in even entry-level systems.

Yet the broad variety of multiprocessor configurations that will be available, differing in the number of processing elements, will make it difficult to statically create a single parallel version of a program that performs well on the whole range of such hardware. As a result, there will soon be a vast number of multiprocessor systems that are significantly under-utilized for lack of software that harnesses their power effectively. This problem is exacerbated by the growing inventory of legacy programs in binary executable form whose source code may no longer be available.

We present a system that improves the performance of optimized sequential binaries through dynamic recompilation. Leveraging observations made at runtime, a thin software layer recompiles executing code that was compiled for a uniprocessor and generates parallelized and/or vectorized code segments that exploit available parallel resources. Among the techniques employed are control speculation, loop distribution across several threads, and automatic parallelization of recursive routines.

Our solution is entirely software-based and can be ported to existing hardware platforms that have parallel processing capabilities. Our performance results are obtained on real hardware without using simulation.

In preliminary benchmarks on only modestly parallel (2-way) hardware, our system already provides speedups of up to 40% on SpecCPU benchmarks, and near-optimal speedups on more obviously parallelizable benchmarks.