Space-limited procedures: a methodology for portable high-performance

Authors:
B. Alpern; L. Carter; J. Ferrante
Venue:
PMMP '95 Proceedings of the conference on Programming Models for Massively Parallel Computers
Year:
1995

Citing 0
Cited 8

Optimizing matrix multiply using PHiPAC: a portable, high-performance, ANSI C coding methodology

ICS '97 Proceedings of the 11th international conference on Supercomputing
Architecture-cognizant divide and conquer algorithms

SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Transforming loops to recursion for multi-level memory hierarchies

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Sequoia: programming the memory hierarchy

Proceedings of the 2006 ACM/IEEE conference on Supercomputing
The memory behavior of cache oblivious stencil computations

The Journal of Supercomputing
Programming the memory hierarchy revisited: supporting irregular parallelism in Sequoia

Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
CUDA-level performance with Python-level productivity for Gaussian mixture model applications

HotPar'11 Proceedings of the 3rd USENIX conference on Hot topics in parallelism

Abstract

This paper presents the generic program approach to achieving portable high performance. This approach has three phases. In the first, a generic program, defining a family of semantically equivalent program variants, is written. In the second, the generic program is specialized to the variant that performs best on an abstract model of the target computer. In the third, this variant is translated to run on the target computer. The Parallel Memory Hierarchy (PMH) generic model is used to define the abstract models of target computers. Using this approach, a spectrum of solutions is possible. At one end of the spectrum, a simple generic program can be written, with roughly the same difficulty as writing a sequential program, that can be tuned automatically to achieve reasonably good performance on a wide variety of computers. This solution can be refined to give better performance. At the labor-intensive end of the spectrum, an application can be tuned so that it achieves the best possible performance on each of a collection of computers.
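
The three phases can be illustrated with a small sketch. It is not the authors' system and does not implement the PMH model. It assumes a hypothetical two-level machine description (`ToyModel`), a blocked matrix multiply whose block size selects the variant, and a made-up cost estimate that counts words moved into fast memory. All names and parameters here are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyModel:
    """Hypothetical abstract machine (an assumption, not the PMH model):
    a fast memory holding `capacity` words, with a fixed cost per word moved in."""
    capacity: int
    transfer_cost: float


def blocked_matmul(a, b, n, block):
    """Phase 1: a generic program. Each `block` value is one variant;
    all variants compute the same n-by-n product."""
    c = [[0] * n for _ in range(n)]
    for i0 in range(0, n, block):
        for j0 in range(0, n, block):
            for k0 in range(0, n, block):
                for i in range(i0, min(i0 + block, n)):
                    for j in range(j0, min(j0 + block, n)):
                        s = c[i][j]
                        for k in range(k0, min(k0 + block, n)):
                            s += a[i][k] * b[k][j]
                        c[i][j] = s
    return c


def estimated_cost(n, block, model):
    """Toy estimate: each block triple moves about 3 * block^2 words.
    A variant whose three blocks do not fit in fast memory is ruled out."""
    if 3 * block * block > model.capacity:
        return float("inf")
    tiles = -(-n // block)  # ceiling division
    return model.transfer_cost * tiles ** 3 * 3 * block * block


def specialize(n, candidates, model):
    """Phase 2: pick the variant with the lowest estimated cost on the model."""
    return min(candidates, key=lambda blk: estimated_cost(n, blk, model))


if __name__ == "__main__":
    n = 8
    a = [[i + j for j in range(n)] for i in range(n)]
    b = [[i * j + 1 for j in range(n)] for i in range(n)]
    best = specialize(n, [1, 2, 4, 8], ToyModel(capacity=64, transfer_cost=1.0))
    # Phase 3 stand-in: "translation" here is simply running the chosen variant.
    c = blocked_matmul(a, b, n, best)
    print("chosen block size:", best)
```

Changing only the `ToyModel` parameters changes which variant is selected while the generic program stays untouched, which is the portability argument the abstract makes. A real cost model would need to account for more levels of the hierarchy and for parallelism.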