Lazy tree splitting

Authors:
Lars Bergstrom;Mike Rainey;John Reppy;Adam Shaw;Matthew Fluet
Affiliations:
University of Chicago, Chicago, IL, USA;University of Chicago, Chicago, IL, USA;University of Chicago, Chicago, IL, USA;University of Chicago, Chicago, IL, USA;Rochester Institute of Technology, Rochester, NY, USA
Venue:
Proceedings of the 15th ACM SIGPLAN international conference on Functional programming
Year:
2010

Abstract

Nested data-parallelism (NDP) is a declarative style for programming irregular parallel applications. NDP languages provide language features favoring the NDP style, efficient compilation of NDP programs, and various common NDP operations like parallel maps, filters, and sum-like reductions. In this paper, we describe the implementation of NDP in Parallel ML (PML), part of the Manticore project. Managing the parallel decomposition of work is one of the main challenges of implementing NDP. If the decomposition creates too many small chunks of work, performance will be eroded by too much parallel overhead. If, on the other hand, there are too few large chunks of work, there will be too much sequential processing and processors will sit idle. Recently the technique of Lazy Binary Splitting was proposed for dynamic parallel decomposition of work on flat arrays, with promising results. We adapt Lazy Binary Splitting to parallel processing of binary trees, which we use to represent parallel arrays in PML. We call our technique Lazy Tree Splitting (LTS). One of its main advantages is its performance robustness: per-program tuning is not required to achieve good performance across varying platforms. We describe LTS-based implementations of standard NDP operations, and we present experimental data demonstrating the scalability of LTS across a range of benchmarks.
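
The decomposition trade-off described in the abstract can be illustrated with a small sketch of lazy binary splitting on a flat array. This is not the paper's LTS implementation, which works on binary trees in PML. It is a single-threaded Python simulation written under stated assumptions: the function name `lazy_binary_map`, the `chunk` parameter, and the use of an ordinary deque in place of a work-stealing deque are all hypothetical choices made for illustration. The idea shown is that a worker processes its range a chunk at a time and splits the remaining range in half only when no other work is queued, rather than splitting eagerly down to a fixed grain size.

```python
from collections import deque


def lazy_binary_map(f, xs, chunk=4):
    """Apply f to every element of xs, splitting ranges only on demand.

    Single-threaded simulation (an assumption of this sketch): in a real
    runtime the deque would be a work-stealing deque that idle processors
    take tasks from.
    """
    out = [None] * len(xs)
    tasks = deque([(0, len(xs))])  # pending half-open index ranges [lo, hi)
    splits = 0
    while tasks:
        lo, hi = tasks.pop()
        while lo < hi:
            # Split lazily: only when no other work is queued and the
            # remaining range is larger than one chunk.
            if not tasks and hi - lo > chunk:
                mid = lo + (hi - lo) // 2
                tasks.append((mid, hi))  # expose the upper half as new work
                hi = mid
                splits += 1
            # Process one chunk sequentially before checking again.
            end = min(lo + chunk, hi)
            for i in range(lo, end):
                out[i] = f(xs[i])
            lo = end
    return out, splits


squares, splits = lazy_binary_map(lambda x: x * x, list(range(32)))
```

In this simulation the number of splits stays well below the number of chunks, because a range is divided only when the queue has run dry. An eager scheme that always halves down to the chunk size would create a task for every chunk regardless of whether any processor needs the work.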