The authors describe a flexible compiler framework for distributed-memory multicomputers called Paradigm (PARAllelizing compiler for DIstributed-memory General-purpose Multicomputers). Extracting computational power from a multicomputer usually requires significant programmer time and effort to write efficient software; Paradigm addresses this problem by automatically parallelizing sequential programs. Beyond traditional compiler optimizations, Paradigm integrates several capabilities within a unified platform: automatic data distribution, communication optimizations, support for irregular computations, exploitation of functional and data parallelism, and multithreaded execution.

Automatic data partitioning involves several decisions: array alignment, distribution scheme (block or cyclic), block size, and mesh configuration. Paradigm resolves these decisions in distinct compilation phases.

The compiler supports both regular and irregular computations. For regular computations, it uses efficient processor-tagged descriptors to handle the simplest and most frequent cases, and falls back to more general, inequality-based representations for the difficult ones. This lets Paradigm compile a larger proportion of programs without sacrificing compilation speed. In addition, to reduce the overhead caused by frequent communication, the compiler employs message coalescing, message vectorization, message aggregation, and coarse-grain pipelining.

For irregular computations, Paradigm generates two code sequences: an inspector that preprocesses the access pattern and an executor that performs the actual computation. The Parallel Irregular Library with Application of Regularity (PILAR) provides Paradigm's irregular runtime support. Finally, the authors describe how Paradigm exploits functional and data parallelism and multithreading to improve overall execution efficiency.
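To make the data-distribution choices concrete, the following is a minimal sketch (not Paradigm's actual code) of the ownership functions implied by block and (block-)cyclic distributions of a one-dimensional array over P processors; the function names and parameters are illustrative only.

```python
# Illustrative sketch: which processor owns element i of an n-element
# array under block vs. (block-)cyclic distribution across p processors.

def block_owner(i, n, p):
    """Owner of index i when the array is split into p contiguous blocks."""
    block = -(-n // p)      # ceil(n / p): elements per processor
    return i // block

def cyclic_owner(i, p, b=1):
    """Owner of index i under cyclic distribution; b > 1 gives block-cyclic."""
    return (i // b) % p

# A 16-element array on 4 processors:
#   block:  indices 0-3 -> proc 0, 4-7 -> proc 1, ...
#   cyclic: index i -> proc i % 4
```

The compiler's data-partitioning phase effectively chooses among such layouts (plus alignment and mesh shape) to minimize communication for the access patterns it finds in the program.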
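The communication optimizations can be illustrated with a small sketch of message vectorization: instead of issuing one send per loop iteration, the compiler hoists communication out of the loop and transmits all needed elements in one message. The "network" below is just a list of (dest, payload) records for demonstration; none of these names come from Paradigm itself.

```python
# Hedged sketch of message vectorization (illustrative names only).

sent_messages = []          # stand-in for the network: (dest, payload) records

def send(dest, payload):
    sent_messages.append((dest, payload))

def naive_exchange(remote_indices, data, dest):
    # One message per element: high per-message startup overhead.
    for i in remote_indices:
        send(dest, [data[i]])

def vectorized_exchange(remote_indices, data, dest):
    # One coalesced message carrying every needed element.
    send(dest, [data[i] for i in remote_indices])
```

Sending three elements the naive way costs three message startups; the vectorized version pays one, which is the point of coalescing, vectorization, and aggregation alike.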
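The inspector/executor scheme for irregular accesses can be sketched as follows; the function names are hypothetical and do not reflect PILAR's API. The inspector scans the indirection array once to separate local accesses from a remote-fetch schedule, which the executor then reuses on every iteration with prefetched values.

```python
# Hedged sketch of the inspector/executor scheme (illustrative API).

def inspector(index_array, owned_range):
    """Split accesses into local indices and a remote-fetch schedule."""
    lo, hi = owned_range
    local = [i for i in index_array if lo <= i < hi]
    remote = sorted({i for i in index_array if not (lo <= i < hi)})
    return local, remote            # 'remote' acts as the communication schedule

def executor(index_array, local_data, fetched, owned_range):
    """Run the computation using locally owned and prefetched remote values."""
    lo, hi = owned_range
    total = 0
    for i in index_array:
        if lo <= i < hi:
            total += local_data[i - lo]   # locally owned element
        else:
            total += fetched[i]           # value prefetched per the schedule
    return total
```

Because the schedule computed by the inspector depends only on the indirection array, it can be built once and reused across iterations, which is where the scheme recovers the cost of the preprocessing pass.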