Run-Time Parallelization and Scheduling of Loops
IEEE Transactions on Computers
ICS '94 Proceedings of the 8th international conference on Supercomputing
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Optimizing ML with run-time code generation
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Fast, effective dynamic compilation
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
VCODE: a retargetable, extensible, very fast dynamic code generation system
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
A general approach for run-time specialization and its application to C
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Dynamic feedback: an effective technique for adaptive computing
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Does “just in time” = “better late than never”?
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Efficient incremental run-time specialization for free
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
An evaluation of staged run-time optimizations in DyC
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
`C and tcc: a language and compiler for dynamic code generation
ACM Transactions on Programming Languages and Systems (TOPLAS)
A framework for remote dynamic program optimization
DYNAMO '00 Proceedings of the ACM SIGPLAN workshop on Dynamic and adaptive compilation and optimization
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Parallel Programming with Polaris
Computer
Adaptive loop transformations for scientific programs
SPDP '95 Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing
High-level adaptive program optimization with ADAPT
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
ICCS '01 Proceedings of the International Conference on Computational Sciences-Part I
Embedded Processor Design Challenges: Systems, Architectures, Modeling, and Simulation - SAMOS
Cache Models for Iterative Compilation
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Embedded processor design challenges
Rating Compiler Optimizations for Automatic Performance Tuning
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Adaptive execution techniques for SMT multiprocessor architectures
Proceedings of the tenth ACM SIGPLAN symposium on Principles and practice of parallel programming
Statistical Models for Empirical Search-Based Performance Tuning
International Journal of High Performance Computing Applications
Iterative compilation for energy reduction
Journal of Embedded Computing - Cache exploitation in embedded systems
PEAK—a fast and effective performance tuning system via compiler optimization orchestration
ACM Transactions on Programming Languages and Systems (TOPLAS)
HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Quick and Practical Run-Time Evaluation of Multiple Program Optimizations
Transactions on High-Performance Embedded Architectures and Compilers I
Mostly static program partitioning of binary executables
ACM Transactions on Programming Languages and Systems (TOPLAS)
PetaBricks: a language and compiler for algorithmic choice
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Adaptive execution techniques of parallel programs for multiprocessors
Journal of Parallel and Distributed Computing
Collective optimization: A practical collaborative approach
ACM Transactions on Architecture and Code Optimization (TACO)
Parallelism orchestration using DoPE: the degree of parallelism executive
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
A practical method for quickly evaluating program optimizations
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Adaptively increasing performance and scalability of automatically parallelized programs
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Iterative optimization for the data center
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Parcae: a system for flexible parallel execution
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
SiblingRivalry: online autotuning through local competitions
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Palirria: Accurate On-line Parallelism Estimation for Adaptive Work-Stealing
Proceedings of Programming Models and Applications on Multicores and Manycores
Dynamic program optimization offers performance improvements far beyond those possible with traditional compile-time optimization [1, 2, 3, 4]. These gains are due to the ability to exploit both architectural and input data set characteristics that are unknown prior to execution time. In this paper, we propose a novel framework for dynamic program optimization, ADAPT (Automated De-coupled Adaptive Program Transformation), that builds on the strengths of existing approaches. The key to our framework is the decoupling of the dynamic compilation of new code variants from the dynamic selection of these variants at their points of use. This allows code generation to occur concurrently with program execution, removing dynamic compilation overheads from the critical path. We present a compilation system, based on the Polaris optimizing compiler [5], that automatically applies this framework to general "plugged-in" optimization techniques. We evaluate our system on three programs from the SPEC floating point benchmark suite by dynamically applying loop distribution, loop unrolling, loop tiling and automatic parallelization. We show that our techniques can improve performance by as much as 70% over statically optimized code.
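The decoupling described in the abstract can be illustrated with a minimal sketch. This is not the ADAPT implementation: the names `VariantTable`, `optimizer`, `sum_default`, `sum_unrolled4` and `run` are hypothetical, the "compiler" is a stand-in that merely publishes a hand-unrolled function, and a Python thread plays the role of the concurrent optimizer. It only shows the shape of the idea: one component produces new variants off the critical path, while the running program picks whichever variant is currently best at each point of use.

```python
import threading


class VariantTable:
    """Holds the currently preferred variant of one code section (hypothetical)."""

    def __init__(self, default):
        self._lock = threading.Lock()
        self._best = default

    def publish(self, variant):
        # Called by the optimizer when a new variant is ready.
        with self._lock:
            self._best = variant

    def select(self):
        # Called by the application at the point of use.
        with self._lock:
            return self._best


def sum_default(xs):
    # Statically compiled baseline version of the loop.
    total = 0
    for x in xs:
        total += x
    return total


def sum_unrolled4(xs):
    # Variant with the loop body unrolled by a factor of 4.
    total = 0
    n = len(xs) - len(xs) % 4
    for i in range(0, n, 4):
        total += xs[i] + xs[i + 1] + xs[i + 2] + xs[i + 3]
    for i in range(n, len(xs)):  # remainder iterations
        total += xs[i]
    return total


def optimizer(table):
    # Stand-in for dynamic compilation: runs concurrently with the
    # application and publishes the new variant when it is "compiled".
    table.publish(sum_unrolled4)


def run(data, iterations=3):
    table = VariantTable(sum_default)
    worker = threading.Thread(target=optimizer, args=(table,))
    worker.start()
    results = []
    for _ in range(iterations):
        variant = table.select()  # dynamic selection at the point of use
        results.append(variant(data))
    worker.join()
    return results, table.select().__name__
```

Because the application never waits on the optimizer, each iteration runs either the baseline or the published variant, depending on timing; both compute the same result. A real system would also need a way to measure variants and fall back when a new one is slower, which this sketch omits.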