The paper presents an algorithm for scheduling parallel tasks on a parallel architecture built from multiple dynamic SMP clusters, in which processors can be switched between shared memory modules at runtime. Memory modules and processors are organized into computational System-on-Chip (SoC) modules of a fixed size and are interconnected by a local communication network implemented in Network-on-Chip (NoC) technology. Processors located in the same SoC module can communicate using data transfers on the fly. A number of such SoC modules can be connected by a global interconnection network to form a larger system. The presented algorithm schedules initial macro dataflow program graphs for such an architecture with a given number of SoC modules, each of a fixed size. First, it distributes program graph nodes among processors. Then it transforms and schedules computations and communications to exploit processor switching and reads on the fly. Finally, it divides the whole set of processors into subsets of a given size, which are then mapped to separate SoC modules.
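The abstract does not give the heuristics used in each phase, so the following is only a minimal sketch of the first and last phases under stated assumptions. It substitutes a plain greedy list scheduler for the node distribution step and a greedy traffic-affinity grouping for the division of processors into fixed-size modules. The second phase (transformation for processor switching and reads on the fly) is not modeled. All names (`distribute_nodes`, `group_into_modules`, `weights`, `edges`) are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def distribute_nodes(weights, edges, num_procs):
    """Stand-in for phase 1: greedy list scheduling of a macro dataflow graph.

    weights: {node: computation time}
    edges:   {(u, v): communication time paid if u and v run on different processors}
    Returns (placement, finish) dictionaries keyed by node.
    """
    preds, succs = defaultdict(list), defaultdict(list)
    indeg = {n: 0 for n in weights}
    for (u, v) in edges:
        preds[v].append(u)
        succs[u].append(v)
        indeg[v] += 1

    ready = sorted(n for n in weights if indeg[n] == 0)
    proc_free = [0] * num_procs  # time at which each processor becomes idle
    placement, finish = {}, {}

    while ready:
        node = ready.pop(0)
        best = None
        for p in range(num_procs):
            # Data from a predecessor on another processor arrives after a comm delay.
            data_ready = max(
                (finish[u] + (0 if placement[u] == p else edges[(u, node)])
                 for u in preds[node]),
                default=0,
            )
            start = max(proc_free[p], data_ready)
            if best is None or start < best[0]:
                best = (start, p)
        start, p = best
        placement[node] = p
        finish[node] = start + weights[node]
        proc_free[p] = finish[node]
        for s in succs[node]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return placement, finish


def group_into_modules(placement, edges, num_procs, module_size):
    """Stand-in for phase 3: split processors into subsets of at most module_size,
    greedily keeping processors that exchange the most data in the same subset."""
    traffic = defaultdict(int)
    for (u, v), volume in edges.items():
        p, q = placement[u], placement[v]
        if p != q:
            traffic[frozenset((p, q))] += volume

    unassigned = list(range(num_procs))
    modules = []
    while unassigned:
        module = [unassigned.pop(0)]
        while len(module) < module_size and unassigned:
            # Pick the processor with the heaviest traffic to the current module.
            nxt = max(
                unassigned,
                key=lambda c: sum(traffic[frozenset((c, m))] for m in module),
            )
            unassigned.remove(nxt)
            module.append(nxt)
        modules.append(module)
    return modules


# Hypothetical diamond-shaped program graph used only for illustration.
weights = {"a": 2, "b": 3, "c": 3, "d": 1}
edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 1, ("c", "d"): 1}
placement, finish = distribute_nodes(weights, edges, num_procs=2)
modules = group_into_modules(placement, edges, num_procs=4, module_size=2)
```

In this sketch each subset returned by `group_into_modules` would correspond to one SoC module, so heavily communicating processors share a module where on-the-fly transfers are available, and only the remaining traffic crosses the global network.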