Automatic creation of tile size selection models

  • Authors:
  • Tomofumi Yuki; Lakshminarayanan Renganarayanan; Sanjay Rajopadhye; Charles Anderson; Alexandre E. Eichenberger; Kevin O'Brien

  • Affiliations:
  • Colorado State University, Fort Collins, CO, USA; IBM T.J. Watson Research Center, Yorktown Heights, NY, USA; Colorado State University, Fort Collins, CO, USA; Colorado State University, Fort Collins, CO, USA; IBM T.J. Watson Research Center, Yorktown Heights, NY, USA; IBM T.J. Watson Research Center, Yorktown Heights, NY, USA

  • Venue:
  • Proceedings of the 8th Annual IEEE/ACM International Symposium on Code Generation and Optimization
  • Year:
  • 2010

Abstract

Tiling is a widely used loop transformation for exposing and exploiting parallelism and data locality. Effective use of tiling requires selecting and tuning the tile sizes. This is usually achieved by hand-crafting tile size selection (TSS) models that characterize the performance of the tiled program as a function of tile sizes. The best tile sizes are then selected either by using the TSS model directly or by combining it with an empirical search. Hand-crafting accurate TSS models is hard, and adapting them to a different architecture or compiler, or even keeping them up to date as a single compiler evolves, is often just as hard. Instead of hand-crafting TSS models, can we automatically learn or create them? In this paper, we show that, for a specific class of programs, fairly accurate TSS models can be created automatically by combining simple program features, synthetic kernels, and standard machine learning techniques. The automatic TSS model generation scheme can also be used directly to adapt the model or keep it up to date. We evaluate our scheme on six different architecture-compiler combinations (chosen from three different architectures and four different compilers). The models learned by our method consistently show near-optimal performance (within 5% of the optimal on average) across all architecture-compiler combinations.
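To make the terms in the abstract concrete, the following is a minimal Python sketch of a tiled loop nest and of the plain empirical search that a TSS model is meant to guide or replace. It is an illustration only, not the scheme described in the paper: the matrix-multiply kernel, the function names `tiled_matmul` and `empirical_search`, and the candidate tile sizes are all assumptions chosen for the example.

```python
import time


def tiled_matmul(a, b, n, tile):
    """Multiply two n x n matrices (lists of lists) using square tiles.

    `tile` is the tile size; it need not divide n evenly.
    """
    c = [[0.0] * n for _ in range(n)]
    # The three outer loops walk over tiles; the three inner loops
    # walk over the elements inside the current tile.
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        a_ik = row_a[k]
                        row_b = b[k]
                        for j in range(jj, min(jj + tile, n)):
                            row_c[j] += a_ik * row_b[j]
    return c


def empirical_search(n, candidates):
    """Time the kernel once per candidate tile size and return the fastest.

    Hypothetical helper: a real search would repeat measurements and
    control for timing noise.
    """
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for tile in candidates:
        start = time.perf_counter()
        tiled_matmul(a, b, n, tile)
        timings[tile] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings
```

Calling `empirical_search(64, [4, 8, 16, 32])` returns the fastest candidate together with the measured times. In interpreted Python the timing differences mostly reflect loop overhead rather than cache behavior, so the sketch shows the shape of the search, not realistic tile-size effects; a compiled kernel would be needed for that.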