DAGuE: A Generic Distributed DAG Engine for High Performance Computing

Authors:
George Bosilca;Aurelien Bouteiller;Anthony Danalis;Thomas Herault;Pierre Lemarinier;Jack Dongarra
Venue:
IPDPSW '11 Proceedings of the 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum
Year:
2011

Abstract

The frenetic development of current architectures places a strain on state-of-the-art programming environments. Harnessing the full potential of such architectures has been a tremendous task for the whole scientific computing community. We present DAGuE, a generic framework for architecture-aware scheduling and management of micro-tasks on distributed many-core heterogeneous architectures. The applications we consider can be represented as a Directed Acyclic Graph (DAG) of tasks with labeled edges designating data dependencies. DAGs are represented in a compact, problem-size-independent format that can be queried on demand to discover data dependencies in a totally distributed fashion. DAGuE assigns computation threads to the cores, overlaps communications and computations, and uses a dynamic, fully distributed scheduler based on cache awareness, data locality, and task priority. We demonstrate the efficiency of our approach using several micro-benchmarks to analyze the performance of different components of the framework, and a linear algebra factorization as a use case.
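
The idea of a compact, problem-size-independent DAG that is queried on demand can be illustrated with a minimal sketch. It is not DAGuE's actual format or API: the wavefront dependency pattern, the function names `predecessors`, `successors`, and `execute`, and the single-process FIFO queue are all assumptions chosen for illustration. The point is only that the graph is never stored; the neighbors of a task are computed from its indices, so the description has the same size for any `n`.

```python
from collections import deque


def predecessors(task, n):
    """Tasks that (i, j) depends on in a hypothetical n x n wavefront."""
    i, j = task
    preds = []
    if i > 0:
        preds.append((i - 1, j))
    if j > 0:
        preds.append((i, j - 1))
    return preds


def successors(task, n):
    """Tasks that consume the output of (i, j); computed, never stored."""
    i, j = task
    succs = []
    if i + 1 < n:
        succs.append((i + 1, j))
    if j + 1 < n:
        succs.append((i, j + 1))
    return succs


def execute(n):
    """Return one valid execution order, discovering edges on demand.

    Assumes n >= 1. Only tasks with at least one finished predecessor
    are tracked, so the full DAG is never materialized.
    """
    remaining = {}            # task -> count of unfinished predecessors
    ready = deque([(0, 0)])   # (0, 0) is the only task with no inputs
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)    # stand-in for running the task body
        for succ in successors(task, n):
            if succ not in remaining:
                remaining[succ] = len(predecessors(succ, n))
            remaining[succ] -= 1
            if remaining[succ] == 0:
                del remaining[succ]
                ready.append(succ)
    return order
```

Calling `execute(4)` visits all 16 tasks of a 4 x 4 grid, each after its predecessors. In a distributed setting each process could evaluate the same two functions locally, which is the property the abstract describes; the scheduling policy here (plain FIFO) ignores the cache, locality, and priority criteria the abstract mentions.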