Scheduling directives: Accelerating shared-memory many-core processor execution

Authors:
Oded Green;Yitzhak Birk
Venue:
Parallel Computing
Year:
2014

Abstract

We consider many-core processors with a task-graph-oriented programming model, in which scheduling constraints among tasks are decided offline and then enforced by the runtime system using dedicated hardware. In this setting, exposing and exploiting fine-grain data and control parallelism is increasingly important. High expressive power for stating such constraints (directives), along with the ability to implement them in fast, simple hardware, is therefore critical. In this paper, we focus on the relationships among different duplicable (multi-instance) tasks, which are used to express and exploit data parallelism. We extend the conventional Start-After-Complete (precedence) constraint so that it can also be applied between replicas of different such tasks rather than only between entire tasks, thereby increasing the exposable parallelism. Additionally, we propose the parameterized Start-After-Start constraint, which can be used to control the degree of "lockstep" among multiple such tasks, e.g., to improve cache performance when the tasks work on the same data. We also briefly describe several additional directives. Finally, we show that the directives can be supported efficiently in hardware. The discussion uses Hypercore, a very efficient CREW PRAM-like shared-cache architecture; it is a challenging target because its dispatching for basic constraints is extremely fast. However, the new directives have broader applicability. The demonstrated possibility of a simple implementation, together with indications of benefit, motivates further exploration of these directives and their implementation in hardware, as well as their support by programming tools.
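
To make the difference between the directives concrete, the following is a minimal Python sketch of how eligibility of successor-task instances might be decided under each constraint. It is an illustration only, not the paper's hardware mechanism or notation: the names `DuplicableTask`, `sac_task_level`, `sac_replica_level`, `make_sas`, and `eligible` are hypothetical, the successor is assumed to have the same number of instances as the predecessor, and the meaning given to the Start-After-Start parameter (instance i of the successor waits until instance i + lag of the predecessor has started) is an assumed interpretation, since the abstract does not define it.

```python
from dataclasses import dataclass, field


@dataclass
class DuplicableTask:
    """A multi-instance task; tracks which instance indices have started/completed."""
    name: str
    instances: int
    started: set = field(default_factory=set)
    completed: set = field(default_factory=set)


def sac_task_level(pred, i):
    """Conventional Start-After-Complete between entire tasks:
    every successor instance waits for ALL predecessor instances."""
    return len(pred.completed) == pred.instances


def sac_replica_level(pred, i):
    """Start-After-Complete between replicas:
    successor instance i waits only for predecessor instance i."""
    return i in pred.completed


def make_sas(lag):
    """Parameterized Start-After-Start (assumed semantics, see lead-in):
    successor instance i may start once predecessor instance i + lag
    has started; the index is clamped to the last predecessor instance."""
    def rule(pred, i):
        return min(i + lag, pred.instances - 1) in pred.started
    return rule


def eligible(pred, rule):
    """Indices of successor instances that the given rule allows to start now."""
    return [i for i in range(pred.instances) if rule(pred, i)]


# Predecessor A has 4 instances: 0 and 1 have started, only 0 has completed.
a = DuplicableTask("A", instances=4, started={0, 1}, completed={0})

print("task-level SAC:   ", eligible(a, sac_task_level))
print("replica-level SAC:", eligible(a, sac_replica_level))
print("SAS, lag 0:       ", eligible(a, make_sas(0)))
print("SAS, lag 1:       ", eligible(a, make_sas(1)))
```

In this snapshot the task-level constraint releases no successor instances, the replica-level constraint releases instance 0, and the Start-After-Start rule releases instances 0 and 1 with lag 0 but only instance 0 with lag 1. Under these assumptions, the replica-level rule exposes parallelism earlier than the task-level one, and the lag parameter tunes how closely the two tasks proceed in lockstep.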