Model-guided autotuning of high-productivity languages for petascale computing

  • Authors:
  • Hans Zima; Mary Hall; Chun Chen; Jacqueline Chame

  • Affiliations:
  • JPL, Pasadena, CA, USA; University of Utah, Salt Lake City, USA; University of Utah, Salt Lake City, USA; ISI, Marina del Rey, USA

  • Venue:
  • Proceedings of the 18th ACM International Symposium on High Performance Distributed Computing
  • Year:
  • 2009


Abstract

This paper addresses the enormous complexity of mapping applications to current and future highly parallel platforms, including scalable architectures consisting of tens of thousands of nodes, many-core devices with tens to hundreds of cores, and hierarchical systems providing multi-level parallelism. At these scales, for many important algorithms, performance is dominated by the time required to move data across the levels of the memory hierarchy. As a consequence, locality awareness of algorithms and efficient management of communication are essential requirements for obtaining scalable parallel performance, and are of particular concern for applications characterized by irregular memory access patterns. We describe the design of a programming system that focuses on the productivity of application programmers in expressing locality-aware algorithms for high-end architectures, which are then automatically tuned for performance. The approach combines the successes of two novel concepts for managing locality: high-level specification of user-defined data distributions and model-guided autotuning for data locality. The resulting combined system provides a powerful general mechanism for the specification of data distributions, which can express domain-specific knowledge, and facilitates automatic tuning of a distribution to the access patterns in algorithms and its application to different levels of a memory hierarchy. Because there is a clean separation between the specification of a data distribution and the algorithms in which it is used, these can be written separately and composed to quickly develop new applications that can be tuned in the context of their data set and execution environment. We address key issues for a range of codes that include LU decomposition, sparse matrix-vector multiply, and knowledge discovery.
The knowledge discovery algorithms, in particular, stress the proposed language and compiler technology and provide a forcing function for developing tools that address inherent challenges of irregular applications.
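The separation the abstract describes, between a user-defined data distribution and the algorithm that uses it, can be illustrated with a minimal sketch. The code below is a hypothetical Python illustration (not the paper's actual language or API), assuming a simple block distribution and a CSR sparse matrix-vector multiply; the distribution object owns the mapping of global indices onto locales, while the algorithm only consumes that mapping, so either piece can be swapped or tuned independently.

```python
# Hypothetical sketch: a user-defined data distribution kept separate
# from the algorithm (here, CSR sparse matrix-vector multiply).

class BlockDistribution:
    """Maps a global index space [0, n) onto num_locales contiguous blocks.
    Stands in for a high-level, user-defined distribution specification."""

    def __init__(self, n, num_locales):
        self.n = n
        self.num_locales = num_locales
        self.block = -(-n // num_locales)  # ceiling division

    def locale_of(self, i):
        """Which locale owns global index i."""
        return i // self.block

    def local_indices(self, locale):
        """Global indices owned by a given locale."""
        lo = locale * self.block
        hi = min(lo + self.block, self.n)
        return range(lo, hi)


def spmv(dist, row_ptr, col_idx, vals, x):
    """y = A*x for a CSR matrix; the loop structure follows the
    distribution (owner-computes), not hard-coded loop bounds."""
    y = [0.0] * dist.n
    for locale in range(dist.num_locales):
        for i in dist.local_indices(locale):
            s = 0.0
            for k in range(row_ptr[i], row_ptr[i + 1]):
                s += vals[k] * x[col_idx[k]]
            y[i] = s
    return y


# 3x3 matrix [[2,0,1],[0,3,0],[4,0,5]] in CSR form, distributed
# over two locales; swapping in another distribution would not
# require touching spmv at all.
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [2.0, 1.0, 3.0, 4.0, 5.0]
dist = BlockDistribution(3, 2)
print(spmv(dist, row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # → [3.0, 3.0, 9.0]
```

In this sketch, an autotuner could vary the distribution (block size, number of locales, or an entirely different mapping) against a fixed algorithm, which is the composability the abstract emphasizes.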