Practical loop transformations for tensor contraction expressions on multi-level memory hierarchies

  • Authors:
  • Wenjing Ma;Sriram Krishnamoorthy;Gagan Agrawal

  • Affiliations:
  • The Ohio State University, Columbus, OH;Pacific Northwest National Lab, Richland, WA;The Ohio State University, Columbus, OH

  • Venue:
  • CC'11/ETAPS'11 Proceedings of the 20th international conference on Compiler construction: part of the joint European conferences on theory and practice of software
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Modern architectures are characterized by deeper levels of memory hierarchy, often explicitly addressable. Optimizing applications for such architectures requires careful management of the data movement across all these levels. In this paper, we focus on the problem of mapping tensor contractions to memory hierarchies with more than two levels, specifically addressing placement of memory allocation and data movement statements, choice of loop fusions, and tile size selection. Existing algorithms to find an integrated solution to this problem even for two-level memory hierarchies have been shown to be expensive. We improve upon this work by focusing on the first-order cost components, simplifying the analysis required and reducing the number of candidates to be evaluated. We have evaluated our framework on a cluster of GPUs. Using five candidate tensor contraction expressions, we show that fusion at multiple levels improves performance, and our framework is effective in determining profitable transformations.