Partitioning and scheduling loops on NOWs

Authors:
S. Chen; J. Xue
Affiliations:
School of Computer Science and Engineering, The University of New South Wales, Sydney 2052, Australia; School of Computer Science and Engineering, The University of New South Wales, Sydney 2052, Australia
Venue:
Computer Communications
Year:
1999

Abstract

This paper addresses the problem of partitioning and scheduling loops for a network of heterogeneous workstations. By isolating the effects of send and receive and quantifying the impact of network contention on the overall communication cost, we present a simple yet accurate cost model for predicting the communication overhead between a pair of workstations. The processing capacities of all workstations in a network are modelled based on their CPU speeds and memory sizes. Based on these models, loop tiling is used extensively to partition and schedule loops across the workstations. By adjusting tile sizes, i.e. the granularities of tasks, the impact of the heterogeneity arising from programs, processors and the network is minimised. Experimental results on an Ethernet of seven DEC workstations demonstrate the effectiveness of our models and parallelisation strategies.
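The idea of matching task granularity to processing capacity can be illustrated with a minimal sketch. It is not the paper's cost model: the abstract does not give its formulas, so the sketch assumes each workstation is summarised by a single hypothetical relative capacity number and ignores communication cost entirely. The function name `partition_iterations` and its parameters are illustrative only.

```python
def partition_iterations(n_iters, capacities):
    """Split range(n_iters) into contiguous blocks sized in proportion
    to the given capacities (one positive number per workstation).

    Assumption: 'capacities' is a hypothetical stand-in for a processing
    capacity model; it is not the model from the paper.
    """
    total = sum(capacities)
    # Ideal (fractional) share of iterations for each workstation.
    ideal = [n_iters * c / total for c in capacities]
    # Round every share down first.
    sizes = [int(x) for x in ideal]
    # Hand out the iterations lost to rounding, largest fractional part first.
    leftover = n_iters - sum(sizes)
    by_fraction = sorted(
        range(len(capacities)),
        key=lambda i: ideal[i] - sizes[i],
        reverse=True,
    )
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    # Convert block sizes into half-open (start, stop) iteration ranges.
    blocks, start = [], 0
    for size in sizes:
        blocks.append((start, start + size))
        start += size
    return blocks


if __name__ == "__main__":
    # Three workstations; the middle one is assumed to be twice as fast.
    print(partition_iterations(100, [1.0, 2.0, 1.0]))
```

With these assumed capacities the faster workstation receives a block twice as large as the others. A fuller treatment along the lines the abstract describes would also weigh memory sizes and the send, receive and contention costs when choosing the block (tile) sizes.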