A comparison of empirical and model-driven optimization
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Combining Models and Guided Empirical Search to Optimize for Multiple Levels of the Memory Hierarchy
Proceedings of the international symposium on Code generation and optimization
Predicting Unroll Factors Using Supervised Classification
Proceedings of the international symposium on Code generation and optimization
The science of deriving dense linear algebra algorithms
ACM Transactions on Mathematical Software (TOMS)
Tuning High Performance Kernels through Empirical Compilation
ICPP '05 Proceedings of the 2005 International Conference on Parallel Processing
Using Machine Learning to Focus Iterative Optimization
Proceedings of the International Symposium on Code Generation and Optimization
The Journal of Supercomputing
Automated transformation for performance-critical kernels
LCSD '07 Proceedings of the 2007 Symposium on Library-Centric Software Design
Using machine learning to improve automatic vectorization
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
POET: a scripting language for applying parameterized source-to-source program transformations
Software—Practice & Experience
Automatic restructuring of GPU kernels for exploiting inter-thread data locality
CC'12 Proceedings of the 21st international conference on Compiler Construction
AUGEM: automatically generate high performance dense linear algebra kernels on x86 CPUs
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
Dense linear algebra kernels such as matrix multiplication have been used as benchmarks to evaluate the effectiveness of many automated compiler optimizations. However, few studies have looked at collectively applying the transformations and parameterizing them for external search. In this paper, we take a detailed look at the optimization space of three dense linear algebra kernels. We use a transformation scripting language (POET) to implement each kernel-level optimization as applied by ATLAS. We then extensively parameterize these optimizations from the perspective of a general-purpose compiler and use a stand-alone empirical search engine to explore the optimization space using several different search strategies. Our exploration of the search space reveals key interaction among several transformations that must be considered by compilers to approach the level of efficiency obtained through manual tuning of kernels.