With traditional methodologies and tools, keeping performance-critical kernels at high efficiency on hardware that evolves at the incredible rates dictated by Moore's Law is almost intractable. On product lines where ISA compatibility is maintained through several generations of architecture, the growing gap between the machine that the software sees and the actual hardware exacerbates this problem considerably, as do the evolving software layers between the application in question and the ISA. To address this problem, we have utilized a relatively new technique, which we call AEOS (Automated Empirical Optimization of Software): rather than relying on static models of the target machine, an AEOS system generates many candidate implementations of a kernel, times them on the actual hardware, and retains the fastest. In this paper, we describe the AEOS systems we have researched, implemented, and tested. The first of these is ATLAS (Automatically Tuned Linear Algebra Software), which empirically optimizes key linear algebra kernels for arbitrary cache-based machines. Our latest research effort is instantiated in the iFKO (iterative Floating Point Kernel Optimizer) project, whose aim is to perform empirical optimization of relatively arbitrary kernels using a low-level iterative and empirical compilation framework.
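The core AEOS loop described above can be sketched in a few lines. The following is a toy illustration only, with a hypothetical blocked matrix multiply as the kernel and the cache-blocking factor as the sole tuning parameter; ATLAS's actual search spans many more transformations (unrolling, register blocking, instruction scheduling) and is implemented in C, not Python.

```python
import time

def blocked_matmul(A, B, n, nb):
    """Naive cache-blocked n x n matrix multiply; nb is the tuning parameter."""
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, nb):
        for kk in range(0, n, nb):
            for jj in range(0, n, nb):
                for i in range(ii, min(ii + nb, n)):
                    for k in range(kk, min(kk + nb, n)):
                        a = A[i][k]
                        for j in range(jj, min(jj + nb, n)):
                            C[i][j] += a * B[k][j]
    return C

def empirical_search(n, candidates):
    """AEOS idea in miniature: time each candidate variant on the
    actual machine and return the parameter of the fastest one."""
    A = [[1.0] * n for _ in range(n)]
    B = [[1.0] * n for _ in range(n)]
    best_nb, best_time = None, float("inf")
    for nb in candidates:
        start = time.perf_counter()
        blocked_matmul(A, B, n, nb)
        elapsed = time.perf_counter() - start
        if elapsed < best_time:
            best_nb, best_time = nb, elapsed
    return best_nb

if __name__ == "__main__":
    # The winner depends on the machine the search runs on -- that
    # machine-dependence is precisely the point of empirical tuning.
    print("best block size:", empirical_search(64, [4, 8, 16, 32, 64]))
```

Because the timings come from the target hardware rather than a performance model, the same search adapts automatically when the cache hierarchy or ISA implementation changes between architecture generations.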