A fast Fourier transform compiler
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Semicoarsening Multigrid on Distributed Memory Machines
SIAM Journal on Scientific Computing
Automatically tuned linear algebra software
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Experiences tuning SMG98: a semicoarsening multigrid benchmark based on the hypre library
ICS '02 Proceedings of the 16th international conference on Supercomputing
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Effective, Automatic Procedure Extraction
IWPC '03 Proceedings of the 11th IEEE International Workshop on Program Comprehension
Using Information from Prior Runs to Improve Automated Tuning Systems
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Tuning High Performance Kernels through Empirical Compilation
ICPP '05 Proceedings of the 2005 International Conference on Parallel Processing
The Journal of Supercomputing
Ablego: a function outlining and partial inlining framework: Research Articles
Software—Practice & Experience
PEAK—a fast and effective performance tuning system via compiler optimization orchestration
ACM Transactions on Programming Languages and Systems (TOPLAS)
Extending Automatic Parallelization to Optimize High-Level Abstractions for Multicore
IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
A code isolator: isolating code fragments from large programs
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
AARTS: low overhead online adaptive auto-tuning
Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Auto-tuning full applications: A case study
International Journal of High Performance Computing Applications
Optimization strategies in different CUDA architectures using llCoMP
Microprocessors & Microsystems
A ROSE-Based OpenMP 3.0 research compiler supporting multiple runtime libraries
IWOMP'10 Proceedings of the 6th international conference on Beyond Loop Level Parallelism in OpenMP: accelerators, Tasking and more
Mitigating the compiler optimization phase-ordering problem using machine learning
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Portable section-level tuning of compiler parallelized applications
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Refactoring and automated performance tuning of computational chemistry application codes
Proceedings of the Winter Simulation Conference
Automated Insertion of Exception Handling for Key and Referential Constraints
Journal of Database Management
Tools for machine-learning-based empirical autotuning and specialization
International Journal of High Performance Computing Applications
Fine-grained Benchmark Subsetting for System Selection
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
Although automated empirical performance optimization and tuning is well-studied for kernels and domain-specific libraries, a current research grand challenge is how to extend these methodologies and tools to significantly larger sequential and parallel applications. In this context, we present the ROSE source-to-source outliner, which addresses the problem of extracting tunable kernels out of whole programs, thereby helping to convert the challenging whole-program tuning problem into a set of more manageable kernel tuning tasks. Our outliner aims to handle large scale C/C++, Fortran and OpenMP applications. A set of program analysis and transformation techniques are utilized to enhance the portability, scalability, and interoperability of source-to-source outlining. More importantly, the generated kernels preserve performance characteristics of tuning targets and can be easily handled by other tools. Preliminary evaluations have shown that the ROSE outliner serves as a key component within an end-to-end empirical optimization system and enables a wide range of sequential and parallel optimization opportunities.