Data-Driven Tasks and Their Implementation

  • Authors:
  • Sagnak Tasirlar;Vivek Sarkar

  • Affiliations:
  • -;-

  • Venue:
  • ICPP '11 Proceedings of the 2011 International Conference on Parallel Processing
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Dynamic task parallelism has been identified as a prerequisite for improving productivity and performance on future many-core processors. In dynamic task parallelism, computations are created dynamically and the runtime scheduler is responsible for scheduling the computations across processor cores. The sets of task graphs that can be supported by a dynamic scheduler depend on the underlying task primitives in the parallel programming model, with various classes of fork-join structures used most often in practice. However, many researchers have advocated the benefits of more general task graph structures, and have shown that the use of these task graph structures can lead to improved performance. In this paper, we propose an extension to task parallelism called Data-Driven Tasks (DDTs) that can be used to create arbitrary task graph structures. Unlike a normal task that starts execution upon creation, a DDT specifies its input constraints in an await clause containing a list of Data-Driven Futures (DDFs). A DDF can be viewed as a container with a full/empty state that obeys a dynamic single-assignment rule. The runtime scheduler will then ensure that a task is only scheduled when all the DDFs in its await clause become available (full). There is no constraint on which task performs a put() operation on a DDF. We describe five scheduling algorithms (Coarse-Grain Blocking, Fine-Grain Blocking, Delayed Async, Rollback & Replay, Data-Driven) that can be used to implement DDTs, and include performance evaluations of these five algorithms on a variety of benchmark programs and multi-core platforms. Our results show that the Data-Driven scheduler is the best approach for implementing DDTs, both from the viewpoints of memory efficiency and scalable parallelism.