Compiler and Run-Time Support for Exploiting Regularity within Irregular Applications

Authors:
Antonio Lain; Dhruva R. Chakrabarti; Prithviraj Banerjee
Affiliations:
Hewlett Packard, Bristol, UK; Northwestern Univ., Evanston, IL; Northwestern Univ., Evanston, IL
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2000

Abstract

This paper starts from a well-known idea, that structure in irregular problems improves sequential performance, and tries to show that the same structure can also be exploited to parallelize irregular problems on a distributed-memory multicomputer. In particular, we extend a well-known parallelization technique called run-time compilation to use structure information that is explicit in the array subscripts. We present a number of internal representations suited to particular access patterns and show how various preprocessing structures, such as translation tables, trace arrays, and interprocessor communication schedules, can be encoded in terms of one or more of these representations. We show how loop and index normalization are important for detecting irregularity in array references, as well as the presence of locality in such references. We present methods for detecting irregularity, determining the feasibility of inspection, and, finally, placing inspectors and interprocessor communication schedules. We show that this process can be automated through extensions to an HPF/Fortran-77 distributed-memory compiler (PARADIGM) and new run-time support for irregular problems (PILAR) that uses a variety of internal representations of communication patterns. We devise performance measures that consider the relationship between the inspection cost, the execution cost, and the number of times the executor is invoked, so that competing schemes can be compared independently of the number of iterations. Finally, we present experimental results on an IBM SP-2 that validate our approach. These results show that dramatic improvements in both memory requirements and execution time can be achieved by using these techniques.
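
The run-time compilation (inspector/executor) idea mentioned in the abstract can be illustrated with a small single-process sketch. It is not PILAR's interface or the paper's actual representations: the function names (`owner`, `compress`, `inspector`, `executor`), the block distribution, and the `(start, length)` interval encoding are all assumptions chosen for illustration, and interprocessor communication is simulated by reading from one global list.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # hypothetical encoding: (start index, run length)


def owner(global_index: int, block_size: int) -> int:
    """Owning processor of an element under an assumed block distribution."""
    return global_index // block_size


def compress(indices: List[int]) -> List[Interval]:
    """Encode a sorted list of distinct indices as contiguous (start, length) runs."""
    runs: List[Interval] = []
    for i in indices:
        if runs and runs[-1][0] + runs[-1][1] == i:
            start, length = runs[-1]
            runs[-1] = (start, length + 1)  # extend the current run
        else:
            runs.append((i, 1))  # start a new run
    return runs


def inspector(subscripts: List[int], my_rank: int,
              block_size: int) -> Dict[int, List[Interval]]:
    """Scan the indirection array once and build a communication schedule:
    for each remote owner, the off-processor elements to fetch, stored as runs."""
    needed: Dict[int, List[int]] = {}
    for g in sorted(set(subscripts)):
        p = owner(g, block_size)
        if p != my_rank:
            needed.setdefault(p, []).append(g)
    return {p: compress(idx) for p, idx in needed.items()}


def executor(x_global: List[float], subscripts: List[int],
             schedule: Dict[int, List[Interval]], my_rank: int,
             block_size: int) -> List[float]:
    """Reuse the schedule to gather remote values, then evaluate x[subscripts[i]].
    Reading x_global stands in for real message passing."""
    lo = my_rank * block_size
    local = x_global[lo:lo + block_size]
    ghost: Dict[int, float] = {}
    for runs in schedule.values():
        for start, length in runs:
            # One run corresponds to one contiguous transfer.
            for offset, value in enumerate(x_global[start:start + length]):
                ghost[start + offset] = value
    return [local[g - lo] if owner(g, block_size) == my_rank else ghost[g]
            for g in subscripts]
```

In this sketch the schedule is computed once by `inspector` and can be passed to `executor` on every iteration in which the subscripts are unchanged. When the off-processor references are contiguous, the interval encoding stores one pair per run instead of one entry per element, which is the kind of memory saving that regular structure within an irregular access pattern makes possible.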