Principles of runtime support for parallel processors

Authors:
R. Mirchandaney;J. H. Saltz;R. M. Smith;D. M. Nicol;K. Crowley
Affiliations:
Yale Univ., New Haven, CT;Yale Univ., New Haven, CT;Yale Univ., New Haven, CT;College of William and Mary, Williamsburg, VA;Yale Univ., New Haven, CT
Venue:
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Year:
1988

Citing 5
Cited 33

Quantified Score

Hi-index	0.02

Abstract

Scientific problems contain substantial data-level parallelism. The PARTY runtime system is an attempt to obtain efficient parallel implementations of scientific computations, particularly those whose data dependencies become manifest only at runtime; such dependencies can preclude compiler-based detection of certain types of parallelism. The automated system is structured as follows. First, an appropriate level of granularity is selected for the computations. Next, a directed acyclic graph representation of the program is generated, to which various aggregation techniques may be applied in order to generate efficient schedules. Finally, these schedules are mapped onto the target machine. We describe initial results from experiments on the Intel Hypercube and the Encore Multimax that indicate the usefulness of our approach.
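
The pipeline outlined in the abstract (dependencies discovered at runtime, a directed acyclic graph, aggregation, a schedule mapped onto processors) can be illustrated with a minimal sketch. It is not taken from PARTY: the function names `wavefront_levels` and `schedule`, the `deps` input format, and the round-robin mapping are all assumptions chosen for illustration. The sketch assumes each loop iteration depends only on lower-numbered iterations, groups iterations into "wavefronts" that can run concurrently, and then deals each wavefront out to a fixed number of processors.

```python
from collections import defaultdict


def wavefront_levels(deps):
    """Return the wavefront (DAG depth) of each iteration.

    deps[i] lists the iterations that iteration i must wait for.
    Assumption for this sketch: every dependence points to a
    lower-numbered iteration, so one forward pass is enough.
    """
    level = [0] * len(deps)
    for i, preds in enumerate(deps):
        for p in preds:
            if not 0 <= p < i:
                raise ValueError(f"iteration {i} has invalid dependence {p}")
            # An iteration runs one wavefront after its latest predecessor.
            level[i] = max(level[i], level[p] + 1)
    return level


def schedule(deps, num_procs):
    """Aggregate iterations by wavefront and map them onto processors.

    Returns a list of wavefronts; each wavefront is a list with one
    bucket of iteration indices per processor (round-robin mapping,
    a hypothetical choice rather than a tuned one).
    """
    waves = defaultdict(list)
    for i, lv in enumerate(wavefront_levels(deps)):
        waves[lv].append(i)

    plan = []
    for lv in sorted(waves):
        buckets = [[] for _ in range(num_procs)]
        for k, i in enumerate(waves[lv]):
            buckets[k % num_procs].append(i)
        plan.append(buckets)
    return plan


# Dependencies as they might be read from an index array at runtime
# (made-up data): iteration 3 needs iterations 1 and 2, and so on.
deps = [[], [0], [], [1, 2], [2]]
plan = schedule(deps, num_procs=2)
```

In this sketch a synchronization point would sit between consecutive wavefronts, and the per-processor buckets stand in for the coarser-grained work units that an aggregation step would produce.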