Parametric multi-level tiling of imperfectly nested loops
Proceedings of the 23rd international conference on Supercomputing
Model-guided autotuning of high-productivity languages for petascale computing
Proceedings of the 18th ACM international symposium on High performance distributed computing
Speeding up Nek5000 with autotuning and specialization
Proceedings of the 24th ACM International Conference on Supercomputing
Combined Iterative and Model-driven Optimization in an Automatic Parallelization Framework
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Loop transformations: convexity, pruning and optimization
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proceedings of the 2nd ACM/SPEC International Conference on Performance engineering
A programming language interface to describe transformations and code generation
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Parallel Low-Storage Runge-Kutta Solvers for ODE Systems with Limited Access Distance
International Journal of High Performance Computing Applications
Automatic performance debugging of SPMD-style parallel programs
Journal of Parallel and Distributed Computing
AARTS: low overhead online adaptive auto-tuning
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Probabilistic auto-tuning for architectures with complex constraints
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Auto-tuning full applications: A case study
International Journal of High Performance Computing Applications
Journal of Computational and Applied Mathematics
Enhancing locality for recursive traversals of recursive structures
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Using machine learning to improve automatic vectorization
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Journal of Parallel and Distributed Computing
Loop transformation recipes for code generation and auto-tuning
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Language and compiler support for auto-tuning variable-accuracy algorithms
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Extendable pattern-oriented optimization directives
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Predictive modeling in a polyhedral optimization space
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Parameterized micro-benchmarking: an auto-tuning approach for complex applications
Proceedings of the 9th conference on Computing Frontiers
POET: a scripting language for applying parameterized source-to-source program transformations
Software—Practice & Experience
Auto-tuning for energy usage in scientific applications
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing - Volume 2
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Panacea: towards holistic optimization of MapReduce applications
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Analytical bounds for optimal tile size selection
CC'12 Proceedings of the 21st international conference on Compiler Construction
Extendable pattern-oriented optimization directives
ACM Transactions on Architecture and Code Optimization (TACO)
Portable section-level tuning of compiler parallelized applications
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A multi-objective auto-tuning framework for parallel codes
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A script-based autotuning compiler system to generate high-performance CUDA code
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
ACM Transactions on Computing Education (TOCE)
AutoTune: a plugin-driven approach to the automatic tuning of parallel applications
PARA'12 Proceedings of the 11th international conference on Applied Parallel and Scientific Computing
Adaptive Mapping and Parameter Selection Scheme to Improve Automatic Code Generation for GPUs
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
ACM Transactions on Architecture and Code Optimization (TACO)
On Expressing Strategies for Directive-Driven Multicore Programing Models
Proceedings of Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms
Controlling a complete hardware synthesis toolchain with LARA aspects
Microprocessors & Microsystems
Hi-index | 0.00 |
We describe a scalable and general-purpose framework for auto-tuning compiler-generated code. We combine Active Harmony's parallel search backend with the CHiLL compiler transformation framework to generate in parallel a set of alternative implementations of computation kernels and automatically select the one with the best-performing implementation. The resulting system achieves performance of compiler-generated code comparable to the fully automated version of the ATLAS library for the tested kernels. Performance for various kernels is 1.4 to 3.6 times faster than the native Intel compiler without search. Our search algorithm simultaneously evaluates different combinations of compiler optimizations and converges to solutions in only a few tens of search-steps.