Automated transformation for performance-critical kernels

  • Authors: Qing Yi; R. Clint Whaley
  • Affiliations: University of Texas at San Antonio; University of Texas at San Antonio
  • Venue: LCSD '07: Proceedings of the 2007 Symposium on Library-Centric Software Design
  • Year: 2007

Abstract

The performance of many scientific applications depends on a small number of key computational kernels, which require a level of efficiency that existing native compilers rarely deliver. We present a new approach to high-performance kernel optimization in which a general-purpose transformation engine automates the production of highly efficient library routines. The generated routines are then empirically tested until an implementation with satisfactory performance is found. Our framework takes an annotated kernel specification as input and automatically produces optimized implementations based on tuning parameters controlled by a search driver. The transformation engine includes an extensive suite of optimizations, which can be easily extended using a custom transformation language. We have applied our framework to generate code for key linear algebra kernels and have achieved performance comparable to that of ATLAS's highly tuned kernels. In several cases, our kernels were faster than ATLAS's native kernels; we have made these kernels available to ATLAS, and we show that they yield speedups for the ATLAS library.
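
The empirical tuning loop described in the abstract can be illustrated with a small sketch. This is not the authors' framework or transformation language: it is a toy Python example that assumes a single tuning parameter (a tile size for a blocked matrix multiply) and a hypothetical `search` function standing in for the search driver. All names here (`matmul_blocked`, `time_variant`, `search`, `good_enough`) are illustrative assumptions.

```python
import time

def matmul_blocked(a, b, n, block):
    """Multiply two n x n matrices (lists of lists) using square tiles of size `block`."""
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                # Work on one tile; min() handles tile sizes that do not divide n.
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, n)):
                        aik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + block, n)):
                            row_c[j] += aik * row_b[j]
    return c

def time_variant(block, a, b, n, repeats=3):
    """Return the best wall-clock time over `repeats` runs for one parameter value."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        matmul_blocked(a, b, n, block)
        best = min(best, time.perf_counter() - start)
    return best

def search(candidates, a, b, n, good_enough=0.0):
    """Hypothetical search driver: time each candidate and keep the fastest.

    Stops early once the best time is at or below `good_enough` seconds,
    mirroring "tested until a satisfactory implementation is found".
    """
    best_block, best_time = None, float("inf")
    for block in candidates:
        t = time_variant(block, a, b, n)
        if t < best_time:
            best_block, best_time = block, t
        if best_time <= good_enough:
            break
    return best_block, best_time

n = 32
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]
candidates = [4, 8, 16, 32]
best_block, best_time = search(candidates, a, b, n)
print(f"selected tile size {best_block} ({best_time:.6f} s)")
```

The selected tile size depends on the machine and will vary between runs, which is the point of empirical tuning: the choice is measured rather than predicted. A real system of the kind the abstract describes would generate compiled kernel variants and search over many interacting parameters rather than timing interpreted Python.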