Model-guided autotuning of high-productivity languages for petascale computing

Authors:
Hans Zima;Mary Hall;Chun Chen;Jacqueline Chame
Affiliations:
JPL, Pasadena, CA, USA;University of Utah, Salt Lake City, USA;University of Utah, Salt Lake City, USA;ISI, Marina del Rey, USA
Venue:
Proceedings of the 18th ACM international symposium on High performance distributed computing
Year:
2009

Abstract

This paper addresses the enormous complexity of mapping applications to current and future highly parallel platforms, including scalable architectures consisting of tens of thousands of nodes, many-core devices with tens to hundreds of cores, and hierarchical systems providing multi-level parallelism. On systems of this scale, the performance of many important algorithms is dominated by the time required to move data across the levels of the memory hierarchy. As a consequence, locality awareness of algorithms and efficient management of communication are essential for obtaining scalable parallel performance, and are of particular concern for applications characterized by irregular memory access patterns. We describe the design of a programming system that focuses on the productivity of application programmers in expressing locality-aware algorithms for high-end architectures, which are then automatically tuned for performance. The approach combines the successes of two novel concepts for managing locality: high-level specification of user-defined data distributions and model-guided autotuning for data locality. The resulting combined system provides a powerful general mechanism for specifying data distributions, which can express domain-specific knowledge, and facilitates automatic tuning of a distribution to the access patterns in algorithms and its application to different levels of a memory hierarchy. Because the specification of a data distribution is cleanly separated from the algorithms in which it is used, the two can be written separately and composed to quickly develop new applications that can be tuned in the context of their data set and execution environment. We address key issues for a range of codes that include LU Decomposition, Sparse Matrix-Vector Multiply and Knowledge Discovery. The knowledge discovery algorithms, in particular, stress the proposed language and compiler technology and provide a forcing function for developing tools that address inherent challenges of irregular applications.