Global optimization techniques for automatic parallelization of hybrid applications

  • Authors:
  • Dhruva R. Chakrabarti; Prithviraj Banerjee

  • Affiliations:
  • Internet and IA64 Foundation Lab, Hewlett-Packard Company, CA and Center for Parallel and Distributed Computing, ECE Dept., Northwestern University, 2145 Sheridan Road, Evanston, IL; Center for Parallel and Distributed Computing, ECE Dept., Northwestern University, 2145 Sheridan Road, Evanston, IL

  • Venue:
  • ICS '01: Proceedings of the 15th International Conference on Supercomputing
  • Year:
  • 2001

Abstract

This paper presents a novel technique for the global optimization of communication and preprocessing calls in the presence of array accesses with arbitrary subscripts. Our scheme is presented in the context of automatic parallelization of sequential programs into message-passing programs for execution on distributed-memory machines. We use the static single assignment (SSA) form for message-passing programs as the intermediate representation and present techniques to perform global optimizations even in the presence of array accesses with arbitrary subscripts. The focus of this paper is on showing that, using a uniform compilation method both at compile time and at run time, our framework is able to determine the earliest and the latest legal communication point for a given distributed array reference even in the presence of arbitrary array addressing functions. Our scheme then heuristically determines the final communication point after considering the interaction between the relevant communication schedules. Owing to the combined static and dynamic analysis, a quasi-dynamic method of code generation is implemented. We describe the need for proper interaction between the compiler and the run-time routines, both for efficient implementation of the optimizations and for compatible code generation. All of the analysis is initiated at compile time; static analysis of the program is carried out as far as possible, after which the run-time routines take over, building on the data structures initiated at compile time. This scheme has been incorporated in our compiler framework, which can use uniform methods to compile, parallelize, and optimize a sequential program irrespective of the subscripts used in array addressing functions. Experimental results for a number of benchmarks on an IBM SP-2 show around a 10-25% reduction in total run time for our globally optimized schemes compared to other state-of-the-art schemes on 16 processors.
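The earliest/latest placement idea can be illustrated on a simple halo exchange. The C/MPI sketch below is illustrative only, not the paper's compiler output: names such as NLOCAL and halo_exchange are invented for the example, and the block distribution with one halo cell per side is an assumption. The point it shows is that communication for the boundary cells of a distributed array is legal anywhere between the last write of the sent data (the earliest point) and the first use of the received data (the latest point); both points are marked in a Jacobi-style loop.

    /* Minimal sketch (assumed setup, not the paper's implementation):
     * a 1-D block-distributed array with one halo cell per side, and
     * the earliest/latest legal points for its halo exchange. */
    #include <mpi.h>
    #include <stdio.h>

    #define NLOCAL 1000              /* local block size per rank (assumed) */

    static double A[NLOCAL + 2];     /* owned cells A[1..NLOCAL] + 2 halos */
    static double B[NLOCAL + 2];     /* scratch array for the Jacobi sweep */

    /* Exchange boundary cells with the left and right neighbours. */
    static void halo_exchange(int rank, int nprocs)
    {
        int left  = (rank == 0)          ? MPI_PROC_NULL : rank - 1;
        int right = (rank == nprocs - 1) ? MPI_PROC_NULL : rank + 1;

        /* send first owned cell left, receive the right halo cell */
        MPI_Sendrecv(&A[1], 1, MPI_DOUBLE, left, 0,
                     &A[NLOCAL + 1], 1, MPI_DOUBLE, right, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* send last owned cell right, receive the left halo cell */
        MPI_Sendrecv(&A[NLOCAL], 1, MPI_DOUBLE, right, 0,
                     &A[0], 1, MPI_DOUBLE, left, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        for (int i = 1; i <= NLOCAL; i++)
            A[i] = rank * NLOCAL + i;

        for (int t = 0; t < 10; t++) {
            /* EARLIEST legal point: the boundary cells sent here were
             * last written in the previous iteration (or during the
             * initialization above), so the exchange may be hoisted to
             * the top of the loop body but no earlier. */
            halo_exchange(rank, nprocs);

            /* LATEST legal point: just before this first use of the
             * halo cells A[0] and A[NLOCAL+1].  Any placement between
             * the two points is legal. */
            for (int i = 1; i <= NLOCAL; i++)
                B[i] = 0.5 * (A[i - 1] + A[i + 1]);
            for (int i = 1; i <= NLOCAL; i++)
                A[i] = B[i];
        }

        if (rank == 0)
            printf("A[1] = %f\n", A[1]);
        MPI_Finalize();
        return 0;
    }

Between the two marked points the placement is a free choice, which is where the paper's heuristic step comes in: the final communication point is picked after examining the interaction with the other pending communication schedules, for example to combine messages or overlap communication with independent computation.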