Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Direct methods for sparse matrices
Direct methods for sparse matrices
Scans as Primitive Parallel Operations
IEEE Transactions on Computers
Relaxing SIMD control flow constraints using loop transformations
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Slicing analysis and indirect accesses to distributed arrays
Slicing analysis and indirect accesses to distributed arrays
Implementation of a portable nested data-parallel language
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Match and move: an approach to data parallel computing
Match and move: an approach to data parallel computing
Parallelizing complex scans and reductions
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Optimizing Supercompilers for Supercomputers
Optimizing Supercompilers for Supercomputers
Communication and memory requirements as the basis for mapping task and data parallel programs
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Solving Linear Recurrences with Loop Raking
IPPS '92 Proceedings of the 6th International Parallel Processing Symposium
Compiler Analysis for Irregular Problems in Fortran D
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Implementing the Multiprefix Operation on Parallel and Vector Computers
Implementing the Multiprefix Operation on Parallel and Vector Computers
Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors
Segmented Operations for Sparse Matrix Computation on Vector Multiprocessors
Commutativity analysis: a new analysis framework for parallelizing compilers
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Detection and global optimization of reduction operations for distributed parallel machines
ICS '96 Proceedings of the 10th international conference on Supercomputing
Deriving efficient parallel programs for complex recurrences
PASCO '97 Proceedings of the second international symposium on Parallel symbolic computation
Commutativity analysis: a new analysis technique for parallelizing compilers
ACM Transactions on Programming Languages and Systems (TOPLAS)
Eliminating synchronization bottlenecks in object-based programs using adaptive replication
ICS '99 Proceedings of the 13th international conference on Supercomputing
An Interleaving Transformation for Parallelizing Reductions for Distributed-Memory Parallel Machines
The Journal of Supercomputing
Eliminating synchronization bottlenecks using adaptive replication
ACM Transactions on Programming Languages and Systems (TOPLAS)
Analysis of Multithreaded Programs
SAS '01 Proceedings of the 8th International Symposium on Static Analysis
Compiler Optimization of Implicit Reductions for Distributed Memory Multiprocessors
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Automatic parallelization using the value evolution graph
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing
Hi-index | 0.00 |
Irregular loop nests in which the loop bounds are determined dynamically by indexed arrays are difficult to compile into expressive parallel constructs, such as segmented scans and reductions. In this paper, we describe a suite of transformations to automatically parallelize such irregular loop nests, even in the presence of recurrences. We describe a simple, general loop flattening transformation, along with new optimizations which make it a viable compiler transformation. A robust recurrence parallelization technique is coupled to the loop flattening transformation, allowing parallelization of segmented reductions, scans, and combining-sends over arbitrary associative operators. We discuss the implementation and performance results of the transformations in a parallelizing Fortran 77 compiler for the Cray C90 supercomputer. In particular, we focus on important sparse matrix-vector multiplication kernels, for one of which we are able to automatically derive an algorithm used by one of the fastest library routines available.