Work-efficient nested data-parallelism

  • Authors:
  • D. W. Palmer, J. F. Prins, S. Westfold

  • Venue:
  • FRONTIERS '95 Proceedings of the Fifth Symposium on the Frontiers of Massively Parallel Computation (Frontiers'95)
  • Year:
  • 1995

Abstract

An apply-to-all construct is the key mechanism for expressing data-parallelism, but data-parallel programming languages like HPF and C* significantly restrict which operations can appear in the construct. Allowing arbitrary operations substantially simplifies the expression of irregular and nested data-parallel computations. The technique of flattening nested parallelism, introduced by Blelloch, compiles data-parallel programs with unrestricted apply-to-all constructs into vector operations, and has achieved notable success, particularly with irregular data-parallel programs. However, these programs must be carefully constructed so that flattening them does not lead to suboptimal work complexity due to unnecessary replication in index operations. We present new flattening transformations that generate programs with correct work complexity. Because these transformations may introduce concurrent reads in parallel indexing, we developed a randomized indexing technique that reduces concurrent reads while maintaining work efficiency. Experimental results show that the new rules and implementations significantly reduce memory usage and improve performance.
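
To make the abstract's terminology concrete, the sketch below contrasts a nested apply-to-all over an irregular nested sequence with a flattened version that operates on a single flat data vector plus a segment descriptor. This is only a minimal illustration of the general idea of flattening, not the paper's transformation rules or its work-complexity analysis; the function names and the use of NumPy's segmented reduction as a stand-in for a segmented-sum vector primitive are assumptions made for the example.

```python
import numpy as np

# Nested data-parallel spelling: an apply-to-all over an irregular
# nested sequence, where each inner sum is itself data-parallel.
def nested_sums(xss):
    return [sum(xs) for xs in xss]

# Flattened spelling in the spirit of Blelloch-style flattening:
# the nested sequence is stored as one flat data vector together with
# a segment descriptor (segment lengths), and the nested apply-to-all
# becomes a single segmented vector operation (a segmented sum).
def flattened_sums(data, seg_lengths):
    # Starting offset of each segment within the flat data vector.
    offsets = np.concatenate(([0], np.cumsum(seg_lengths)[:-1]))
    # reduceat sums data[offsets[i]:offsets[i+1]] for each segment.
    return np.add.reduceat(data, offsets)

# Both spellings compute the same per-segment result.
xss = [[1, 2, 3], [4], [5, 6]]
data = np.array([1, 2, 3, 4, 5, 6])
seg_lengths = np.array([3, 1, 2])

print(nested_sums(xss))                   # [6, 4, 11]
print(flattened_sums(data, seg_lengths))  # [ 6  4 11]
```

The point of the paper is that the flattened program should perform the same total work as the nested one; naively flattening index operations can replicate data across segments and inflate the work, which is what the new transformations and the randomized indexing scheme address.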