Parameterized micro-benchmarking: an auto-tuning approach for complex applications

  • Authors:
  • Wenjing Ma; Sriram Krishnamoorthy; Gagan Agrawal

  • Affiliations:
  • Pacific Northwest National Laboratory, Richland, WA, USA; Pacific Northwest National Laboratory, Richland, WA, USA; The Ohio State University, Columbus, OH, USA

  • Venue:
  • Proceedings of the 9th Conference on Computing Frontiers
  • Year:
  • 2012


Abstract

Auto-tuning has emerged as an important practical method for creating highly optimized implementations of key computational kernels and applications. However, the growing complexity of architectures and applications is creating new challenges for auto-tuning. Complex applications can involve a prohibitively large search space that precludes empirical auto-tuning. Similarly, architectures are becoming more complicated, making performance hard to model. In this paper, we focus on the challenge to auto-tuning presented by applications with a large number of kernels and kernel instantiations. While these kernels may share a broadly similar pattern, they differ considerably in problem sizes and in the exact computation performed. We propose and evaluate a new approach to auto-tuning, which we refer to as parameterized micro-benchmarking. It is an alternative to the two existing classes of auto-tuning approaches: analytical model-based and empirical search-based. In particular, we argue that the former may not be able to capture all the architectural features that impact performance, whereas the latter may be too expensive for an application that has many different kernels. In our approach, the different expressions in the application, the possible implementations of each expression, and the key architectural features are used to derive a simple micro-benchmark and a small parameter space. We have evaluated our approach in the context of GPU implementations of tensor contraction expressions.
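
The following is a minimal sketch of the tuning loop the abstract describes: enumerate a small parameter space, time a simplified micro-benchmark for each configuration and problem size, and pick the best variant per kernel instance. It is not the authors' implementation; the names (`run_contraction`, `PARAM_SPACE`, `tile`, `unroll`) and the use of `numpy.einsum` as a stand-in for a GPU tensor contraction kernel are illustrative assumptions.

```python
import itertools
import time
import numpy as np

def run_contraction(size, tile, unroll):
    """Time one execution of a representative contraction.

    In this sketch, `tile` and `unroll` are placeholder parameters and do not
    change the computation; a real micro-benchmark would dispatch to the
    GPU kernel variant selected by these parameters.
    """
    a = np.random.rand(size, size)
    b = np.random.rand(size, size)
    start = time.perf_counter()
    np.einsum("ij,jk->ik", a, b)   # stand-in for the tensor contraction kernel
    return time.perf_counter() - start

# Small parameter space derived from implementation choices and
# architectural features (values are illustrative).
PARAM_SPACE = {
    "tile":   [8, 16, 32],
    "unroll": [1, 2, 4],
}

def tune(problem_sizes):
    """Select the best configuration per problem size via micro-benchmarking."""
    best = {}
    for size in problem_sizes:
        timings = {}
        for tile, unroll in itertools.product(PARAM_SPACE["tile"],
                                              PARAM_SPACE["unroll"]):
            timings[(tile, unroll)] = run_contraction(size, tile, unroll)
        best[size] = min(timings, key=timings.get)
    return best

if __name__ == "__main__":
    # One micro-benchmark sweep covers several kernel instantiations at once,
    # avoiding a full empirical search over each kernel in the application.
    print(tune([64, 128, 256]))
```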