Parallelizing complex scans and reductions

Authors:
Allan L. Fisher; Anwar M. Ghuloum
Affiliations:
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA; School of Computer Science, Carnegie Mellon University, Pittsburgh, PA
Venue:
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Year:
1994

Citing 19

Data parallel algorithms
Communications of the ACM - Special issue on parallelism

Direct parallelization of call statements
SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction

Warp: an integrated solution of high-speed parallel computing
Proceedings of the 1988 ACM/IEEE conference on Supercomputing

Scans as Primitive Parallel Operations
IEEE Transactions on Computers

Automatic recognition of induction variables and recurrence relations by abstract interpretation
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation

Program optimization and parallelization using idioms
POPL '91 Proceedings of the 18th ACM SIGPLAN-SIGACT symposium on Principles of programming languages

The Omega test: a fast and practical integer programming algorithm for dependence analysis

An introduction to parallel algorithms

Exploiting task and data parallelism on a multicomputer
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming

A unified semantic approach for the vectorization and parallelization of generalized reductions
ICS '89 Proceedings of the 3rd international conference on Supercomputing

Deciding Linear Inequalities by Computing Loop Residues
Journal of the ACM (JACM)

Optimizing Supercompilers for Supercomputers

Logic Minimization Algorithms for VLSI Synthesis

Solving Linear Recurrences with Loop Raking
IPPS '92 Proceedings of the 6th International Parallel Processing Symposium

Recognizing and Parallelizing Bounded Recurrences
Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing

Do&Merge: Integrating Parallel Loops and Reductions
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing

Detection of Recurrences in Sequential Programs with Loops
PARLE '93 Proceedings of the 5th International PARLE Conference on Parallel Architectures and Languages Europe

List Ranking and List Scan on the CRAY C-90

The complexity of parallel computations

Cited 33

Flattening and parallelizing irregular, recurrent loop nests
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming

Commutativity analysis: a new analysis framework for parallelizing compilers
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation

Detection and global optimization of reduction operations for distributed parallel machines
ICS '96 Proceedings of the 10th international conference on Supercomputing

Deriving efficient parallel programs for complex recurrences
PASCO '97 Proceedings of the second international symposium on Parallel symbolic computation

Parallelization in calculational forms
POPL '98 Proceedings of the 25th ACM SIGPLAN-SIGACT symposium on Principles of programming languages

Commutativity analysis: a new analysis technique for parallelizing compilers
ACM Transactions on Programming Languages and Systems (TOPLAS)

The role of associativity and commutativity in the detection and transformation of loop-level parallelism
ICS '98 Proceedings of the 12th international conference on Supercomputing

An Interleaving Transformation for Parallelizing Reductions for Distributed-Memory Parallel Machines
The Journal of Supercomputing

Matching and searching analysis for parallel hardware implementation on FPGAs
FPGA '01 Proceedings of the 2001 ACM/SIGDA ninth international symposium on Field programmable gate arrays

Parallel Solutions of Simple Indexed Recurrence Equations
IEEE Transactions on Parallel and Distributed Systems

High-level Language Support for User-defined Reductions
The Journal of Supercomputing

Eliminating synchronization bottlenecks using adaptive replication
ACM Transactions on Programming Languages and Systems (TOPLAS)

Fusion of Concurrent Invocations of Exclusive Methods
PaCT '01 Proceedings of the 6th International Conference on Parallel Computing Technologies

Analysis of Multithreaded Programs
SAS '01 Proceedings of the 8th International Symposium on Static Analysis

Commutativity Analysis: A Technique for Automatically Parallelizing Pointer-Based Computations
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium

Compiler Optimization of Implicit Reductions for Distributed Memory Multiprocessors
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium

Derivation of a logarithmic time carry lookahead addition circuit
Journal of Functional Programming

Global-view abstractions for user-defined reductions and scans
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming

Towards automatic parallelization of tree reductions in dynamic programming
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures

Automatic inversion generates divide-and-conquer parallel programs
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation

Computation-efficient parallel prefix
AIC'06 Proceedings of the 6th WSEAS International Conference on Applied Informatics and Communications

XARK: An extensible framework for automatic recognition of computational kernels
ACM Transactions on Programming Languages and Systems (TOPLAS)

Parallel prefix algorithms on the multicomputer
WSEAS Transactions on Computer Research

Optimizing the parallel computation of linear recurrences using compact matrix representations
Journal of Parallel and Distributed Computing

New parallel prefix algorithms
AIC'09 Proceedings of the 9th WSEAS international conference on Applied informatics and communications

New families of computation-efficient parallel prefix algorithms
WSEAS Transactions on Computers

Speculative parallelization of partial reduction variables
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization

Automatic parallelization via matrix multiplication
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation

Automatic parallelization using the value evolution graph
LCPC'04 Proceedings of the 17th international conference on Languages and Compilers for High Performance Computing

Automatic parallelization of recursive functions using quantifier elimination
FLOPS'10 Proceedings of the 10th international conference on Functional and Logic Programming

Scan detection and parallelization in "inherently sequential" nested loop programs
Proceedings of the Tenth International Symposium on Code Generation and Optimization

Generate, test, and aggregate: a calculation-based framework for systematic parallel programming with mapreduce
ESOP'12 Proceedings of the 21st European conference on Programming Languages and Systems

A Generate-Test-Aggregate parallel programming library for systematic parallel programming
Parallel Computing

Abstract

We present a method for automatically extracting parallel prefix programs from sequential loops, even in the presence of complicated conditional statements. Rather than searching for associative operators in the loop body directly, the method rests on the observation that functional composition itself is associative. Accordingly, we model the loop body as a multivalued function of multiple parameters, and look for a closed-form representation of arbitrary compositions of loop body instances. Careful analysis of conditionals allows this search to succeed in cases where existing automatic methods fail. The method has been implemented and used to generate code for the iWarp parallel computer.
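
The central observation, that function composition is associative even when the loop body contains a conditional, can be illustrated with a small hand-worked sketch. It is not the paper's algorithm or its generated code: the example loop (a running sum clamped at zero), the closed form x -> max(x + p, q), and every function name below are assumptions chosen for illustration. The closed form is derived by hand here, whereas the paper describes finding such forms automatically.

```python
from functools import reduce
from itertools import accumulate


def sequential(a, x0):
    """Original loop: running sum that a conditional clamps at zero."""
    x, out = x0, []
    for v in a:
        x = x + v
        if x < 0:
            x = 0
        out.append(x)
    return out


def body(v):
    """One iteration as parameters (p, q) of the function x -> max(x + p, q)."""
    return (v, 0)


def compose(f, g):
    """Parameters of 'apply f, then g'; the family is closed under composition.

    max(max(x + p1, q1) + p2, q2) == max(x + (p1 + p2), max(q1 + p2, q2))
    """
    p1, q1 = f
    p2, q2 = g
    return (p1 + p2, max(q1 + p2, q2))


def apply_fn(f, x):
    """Evaluate the function represented by (p, q) at x."""
    p, q = f
    return max(x + p, q)


def tree_reduce(fs):
    """Combine in balanced-tree order, the grouping a parallel reduction uses."""
    if len(fs) == 1:
        return fs[0]
    mid = len(fs) // 2
    return compose(tree_reduce(fs[:mid]), tree_reduce(fs[mid:]))


def scan_by_composition(a, x0):
    """Prefix-compose the loop bodies, then apply each prefix to x0.

    accumulate runs sequentially here; because compose is associative,
    any parallel prefix schedule could compute the same prefixes.
    """
    prefixes = accumulate((body(v) for v in a), compose)
    return [apply_fn(f, x0) for f in prefixes]


if __name__ == "__main__":
    data = [3, -5, 2, -1, 4]
    print(sequential(data, 1))
    print(scan_by_composition(data, 1))
```

Both calls in the usage example produce the same list. The tree-ordered reduction gives the same composed function as a left-to-right fold, which is the property that lets the iterations be regrouped across processors.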