Flattening and parallelizing irregular, recurrent loop nests

Authors:
Anwar M. Ghuloum; Allan L. Fisher
Affiliations:
School of Computer Science, Carnegie Mellon University, 5000 Forbes Ave., Pittsburgh, PA (both authors)
Venue:
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
1995

Abstract

Irregular loop nests, in which the loop bounds are determined dynamically by indexed arrays, are difficult to compile into expressive parallel constructs such as segmented scans and reductions. In this paper, we describe a suite of transformations that automatically parallelize such irregular loop nests, even in the presence of recurrences. We describe a simple, general loop flattening transformation, along with new optimizations that make it a viable compiler transformation. A robust recurrence parallelization technique is coupled with the loop flattening transformation, allowing parallelization of segmented reductions, scans, and combining-sends over arbitrary associative operators. We discuss the implementation and performance of the transformations in a parallelizing Fortran 77 compiler for the Cray C90 supercomputer. In particular, we focus on important sparse matrix-vector multiplication kernels; for one of these, we automatically derive an algorithm used by one of the fastest library routines available.
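
To make the idea of flattening concrete, the following is a minimal sketch in Python, not the paper's Fortran 77 compiler transformation. It assumes a sparse matrix in compressed-row form; the names `row_ptr`, `col_idx`, `vals`, `spmv_nested`, and `spmv_flattened` are hypothetical and chosen only for illustration. The first function is the irregular loop nest, whose inner bounds come from an indexed array. The second is a flattened version: one loop over all nonzeros, followed by a segmented sum that is written sequentially here but is the step a parallel implementation would replace with a segmented reduction or scan.

```python
def spmv_nested(row_ptr, col_idx, vals, x):
    """Irregular loop nest: inner loop bounds are read from row_ptr."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # The trip count varies per row and is only known at run time.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y


def spmv_flattened(row_ptr, col_idx, vals, x):
    """Flattened form: a single loop over nonzeros plus a segmented sum."""
    n_rows = len(row_ptr) - 1
    nnz = row_ptr[-1]

    # Step 1: element-wise products. No iteration depends on another,
    # so this part is trivially data parallel.
    prod = [vals[k] * x[col_idx[k]] for k in range(nnz)]

    # Step 2: segmented sum over the flat product vector. Segment
    # boundaries are the row starts in row_ptr. This sequential loop
    # stands in for a parallel segmented reduction.
    y = [0.0] * n_rows
    i = 0
    for k in range(nnz):
        while k >= row_ptr[i + 1]:  # advance past finished or empty rows
            i += 1
        y[i] += prod[k]
    return y
```

Both functions compute the same product; the flattened version exposes a flat iteration space whose length is the number of nonzeros, so load balance no longer depends on how unevenly the nonzeros are spread across rows.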