Partial evaluation and automatic program generation
Partial evaluation and automatic program generation
C: a language for high-level, efficient, and machine-independent dynamic code generation
POPL '96 Proceedings of the 23rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
DyC: an expressive annotation-directed dynamic compiler for C
Theoretical Computer Science - Partial evaluation and semantics-based program manipulation
Better tiling and array contraction for compiling scientific programs
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Combined Selection of Tile Sizes and Unroll Factors Using Iterative Compilation
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Transforming Complex Loop Nests for Locality
The Journal of Supercomputing
Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy
Proceedings of the international symposium on Code generation and optimization
Predicting Unroll Factors Using Supervised Classification
Proceedings of the international symposium on Code generation and optimization
The science of deriving dense linear algebra algorithms
ACM Transactions on Mathematical Software (TOMS)
Minimizing development and maintenance costs in supporting persistently optimized BLAS
Software—Practice & Experience - Research Articles
Tuning High Performance Kernels through Empirical Compilation
ICPP '05 Proceedings of the 2005 International Conference on Parallel Processing
Statistical Models for Empirical Search-Based Performance Tuning
International Journal of High Performance Computing Applications
Facilitating the search for compositions of program transformations
Proceedings of the 19th annual international conference on Supercomputing
A survey of strategies in rule-based program transformation systems
Journal of Symbolic Computation
A cache-conscious profitability model for empirical tuning of loop fusion
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
A language for the compact representation of multiple program versions
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
A practical method for quickly evaluating program optimizations
HiPEAC'05 Proceedings of the First international conference on High Performance Embedded Architectures and Compilers
Exploring the Optimization Space of Dense Linear Algebra Kernels
Languages and Compilers for Parallel Computing
PetaBricks: a language and compiler for algorithmic choice
Proceedings of the 2009 ACM SIGPLAN conference on Programming language design and implementation
Exposing tunable parameters in multi-threaded numerical code
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
Automated empirical tuning of scientific codes for performance and power consumption
Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
Automated programmable control and parameterization of compiler optimizations
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
POET: a scripting language for applying parameterized source-to-source program transformations
Software—Practice & Experience
Hi-index | 0.00 |
The performance of many scientific applications depends on a small number of key computational kernels which require a level of efficiency rarely satisfied by existing native compilers. We present a new approach to high performance kernel optimization, where a general-purpose transformation engine automates the production of highly efficient library routines. The library routines are then empirically tested until an implementation with a satisfactory performance level is found. Our framework requires an annotated kernel specification and can automatically produce optimized implementations based on tuning parameters controlled by a search driver. The transformation engine includes an extensive suite of optimizations which can be easily expanded using a custom transformation language. We have applied our framework to generate code for key linear algebra kernels and have achieved similar performance as that achieved by ATLAS's highly tuned kernels. In several cases, our kernels were faster than ATLAS's native kernels; we have made these kernels available to ATLAS, which results in speedups for the ATLAS library, as we show.