Scheduling task graphs for execution in dynamic SMP clusters with bounded number of resources

Authors: Lukasz Masko
Affiliations: Institute of Computer Science of the Polish Academy of Sciences, Warsaw, Poland
Venue: PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
Year: 2005

Abstract

The paper presents an algorithm for scheduling parallel tasks in a parallel architecture based on multiple dynamic SMP clusters, in which processors can be switched between shared memory modules at runtime. Memory modules and processors are organized in computational System-on-Chip (SoC) modules of a fixed size and are interconnected by a local communication network implemented in Network-on-Chip (NoC) technology. Processors located in the same SoC module can communicate using data transfers on the fly. A number of such SoC modules can be connected by a global interconnection network to form a larger infrastructure. The presented algorithm schedules initial macro dataflow program graphs for such an architecture with a given number of SoC modules, assuming a fixed module size. First, it distributes program graph nodes among processors. Then it transforms and schedules computations and communication to use the processor switching and read-on-the-fly facilities. Finally, it divides the whole set of processors into subsets of a given size, which are then mapped to separate SoC modules.
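
The abstract does not say how the final step divides the processors into subsets, so the following is only a minimal sketch of one plausible approach, not the paper's algorithm. It assumes a greedy heuristic that places processors exchanging the most data in the same SoC module. The function names, the edge-weight dictionary, and the task-to-processor placement are all hypothetical and introduced purely for illustration.

```python
from collections import defaultdict


def comm_between_processors(edges, placement):
    """Sum edge weights between tasks placed on different processors.

    edges: {(src_task, dst_task): data_volume} (hypothetical graph encoding)
    placement: {task: processor_id}, assumed to come from an earlier phase
    """
    volume = defaultdict(int)
    for (src, dst), weight in edges.items():
        p, q = placement[src], placement[dst]
        if p != q:
            volume[frozenset((p, q))] += weight
    return volume


def group_into_modules(processors, volume, module_size, num_modules):
    """Greedily pack processors into groups of at most module_size.

    Assumption: each group corresponds to one SoC module, and a processor
    joins the group with which it exchanges the most data.
    """
    if len(processors) > module_size * num_modules:
        raise ValueError("not enough SoC modules for the given processors")
    unassigned = list(processors)
    modules = []
    while unassigned:
        module = [unassigned.pop(0)]  # seed a new module
        while unassigned and len(module) < module_size:
            # Pick the processor with the largest traffic to this module.
            best = max(
                unassigned,
                key=lambda c: sum(
                    volume.get(frozenset((c, m)), 0) for m in module
                ),
            )
            unassigned.remove(best)
            module.append(best)
        modules.append(module)
    return modules


# Toy macro dataflow graph: four tasks, one per processor.
edges = {("a", "b"): 5, ("b", "c"): 1, ("c", "d"): 7, ("a", "d"): 2}
placement = {"a": 0, "b": 1, "c": 2, "d": 3}
volume = comm_between_processors(edges, placement)
modules = group_into_modules([0, 1, 2, 3], volume, module_size=2, num_modules=2)
print(modules)
```

In this toy input, processors 0 and 1 share the heaviest edge from processor 0's point of view, so they end up in the same module, and processors 2 and 3 form the second one. A real scheduler for this architecture would also need to account for processor switching and reads on the fly, which this sketch ignores.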