Oracle scheduling: controlling granularity in implicitly parallel languages

  • Authors:
  • Umut A. Acar;Arthur Charguéraud;Mike Rainey

  • Affiliations:
  • Max Planck Institute for Software Systems, Kaiserslautern, Germany;Max Planck Institute for Software Systems, Kaiserslautern, Germany;Max Planck Institute for Software Systems, Kaiserslautern, Germany

  • Venue:
  • Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

A classic problem in parallel computing is determining whether to execute a task in parallel or sequentially. If small tasks are executed in parallel, the task-creation overheads can be overwhelming. If large tasks are executed sequentially, processors may spin idle. This granularity problem, however well known, is not well understood: broadly applicable solutions remain elusive. We propose techniques for controlling granularity in implicitly parallel programming languages. Using a cost semantics for a general-purpose language in the style of the lambda calculus with support for parallelism, we show that task-creation overheads can indeed slow down parallel execution by a multiplicative factor. We then propose oracle scheduling, a technique for reducing these overheads, which bases granularity decisions on estimates of task-execution times. We prove that, for a class of computations, oracle scheduling can reduce task creation overheads to a small fraction of the work without adversely affecting available parallelism, thereby leading to efficient parallel executions. We realize oracle scheduling in practice by a combination of static and dynamic techniques. We require the programmer to provide the asymptotic complexity of every function and use run-time profiling to determine the implicit, architecture-specific constant factors. In our experiments, we were able to reduce overheads of parallelism down to between 3 and 13 percent, while achieving 6- to 10-fold speedups.