Many computational solutions can be expressed as directed acyclic graphs (DAGs) with weighted nodes. In parallel computing, scheduling such DAGs onto manycore processors remains a fundamental challenge, since synchronizing dozens of threads while preserving precedence constraints can dramatically degrade performance. To improve scheduling performance on manycore processors, we propose a hierarchical scheduling method with dynamic thread grouping, which schedules DAG structured computations at three levels. At the top level, a supermanager separates threads into groups, each consisting of a manager thread and several worker threads. The supermanager dynamically merges and partitions the groups to adapt the scheduler to the input task dependency graph. Through group merging and partitioning, the proposed scheduler can dynamically adjust to become a centralized scheduler, a distributed scheduler, or a hybrid of the two, depending on the input graph. At the group level, managers collaboratively schedule tasks for their workers. At the within-group level, workers perform self-scheduling within their respective groups and execute tasks. We evaluate the proposed scheduler on the Sun UltraSPARC T2 (Niagara 2) platform, which supports up to 64 hardware threads. Across a range of input task dependency graphs, the proposed scheduler outperforms various baseline methods, including typical centralized and distributed schedulers.
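To make the three-level structure concrete, the following is a minimal, hypothetical sketch (not the paper's implementation): a simulated run in which a supermanager partitions workers into fixed groups, group managers pick ready DAG tasks, and workers within a group take tasks round-robin. Function and variable names (`hierarchical_schedule`, `deps`, `group_size`) are illustrative; node weights and the dynamic group merging/partitioning described in the abstract are not modeled here.

```python
from collections import deque

def hierarchical_schedule(deps, num_workers=4, group_size=2):
    """Simulate three-level scheduling of a DAG.

    deps: dict mapping each task to the set of its predecessor tasks.
    Returns (order, assignment): a precedence-respecting execution
    order and a map task -> (group index, worker id).
    """
    # Top level: the supermanager partitions worker threads into groups
    # (static here; the paper's scheduler merges/partitions dynamically).
    workers = list(range(num_workers))
    groups = [workers[i:i + group_size]
              for i in range(0, num_workers, group_size)]

    # Standard topological bookkeeping over the task dependency graph.
    indegree = {t: len(p) for t, p in deps.items()}
    succs = {t: [] for t in deps}
    for t, preds in deps.items():
        for p in preds:
            succs[p].append(t)
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))

    order, assignment = [], {}
    g = w = 0  # round-robin cursors: group level, within-group level
    while ready:
        task = ready.popleft()          # group level: a manager claims a ready task
        group = groups[g % len(groups)]
        worker = group[w % len(group)]  # within-group level: a worker takes the task
        assignment[task] = (g % len(groups), worker)
        order.append(task)
        g, w = g + 1, w + 1
        for s in sorted(succs[task]):   # release successors whose precedences are met
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order, assignment

# Usage: a 4-node diamond DAG (a precedes b and c; both precede d).
deps = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
order, assignment = hierarchical_schedule(deps)
```

Round-robin is a placeholder policy; the point of the sketch is only the division of labor across the three levels, where group merging would correspond to concatenating entries of `groups` and partitioning to splitting them.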