Lazy tree splitting

  • Authors:
  • Lars Bergstrom;Mike Rainey;John Reppy;Adam Shaw;Matthew Fluet

  • Affiliations:
  • University of Chicago, Chicago, IL, USA;University of Chicago, Chicago, IL, USA;University of Chicago, Chicago, IL, USA;University of Chicago, Chicago, IL, USA;Rochester Institute of Technology, Rochester, NY, USA

  • Venue:
  • Proceedings of the 15th ACM SIGPLAN international conference on Functional programming
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Nested data-parallelism (NDP) is a declarative style for programming irregular parallel applications. NDP languages provide language features favoring the NDP style, efficient compilation of NDP programs, and various common NDP operations like parallel maps, filters, and sum-like reductions. In this paper, we describe the implementation of NDP in Parallel ML (PML), part of the Manticore project. Managing the parallel decomposition of work is one of the main challenges of implementing NDP. If the decomposition creates too many small chunks of work, performance will be eroded by too much parallel overhead. If, on the other hand, there are too few large chunks of work, there will be too much sequential processing and processors will sit idle. Recently the technique of Lazy Binary Splitting was proposed for dynamic parallel decomposition of work on flat arrays, with promising results. We adapt Lazy Binary Splitting to parallel processing of binary trees, which we use to represent parallel arrays in PML. We call our technique Lazy Tree Splitting (LTS). One of its main advantages is its performance robustness: per-program tuning is not required to achieve good performance across varying platforms. We describe LTS-based implementations of standard NDP operations, and we present experimental data demonstrating the scalability of LTS across a range of benchmarks.