Possibilities of Optimal Execution of Parallel Programs Containing Simple and Iterated Loops on Heterogeneous Parallel Computational Systems with Distributed Memory

Authors:
A. I. Avetisyan; S. S. Gaisaryan; O. I. Samovarov
Affiliations:
Institute of System Programming, Russian Academy of Sciences, ul. Bol'shaya Kommunisticheskaya 25, Moscow, 109004 Russia; arut@ispras.ru
Institute of System Programming, Russian Academy of Sciences, ul. Bol'shaya Kommunisticheskaya 25, Moscow, 109004 Russia; ssg@ispras.ru
Institute of System Programming, Russian Academy of Sciences, ul. Bol'shaya Kommunisticheskaya 25, Moscow, 109004 Russia; samov@ispras.ru
Venue:
Programming and Computer Software
Year:
2002

Abstract

Load balancing for parallel programs executed on distributed-memory computational systems is currently of great interest. In its most general form, the problem concerns a single parallel loop: the execution of a heterogeneous loop on a heterogeneous computational system. Stated this way, the problem is NP-complete even for two nodes, and no acceptable heuristics for solving it have been found. Since developing such heuristics is a rather complicated task, we decided to examine the problem by elementary methods in order to refine (and, possibly, simplify) the original problem statement. This paper discusses the results of that study. Estimates of the efficiency of parallel loop execution are obtained as functions of the number of nodes in homogeneous and heterogeneous parallel computational systems. These estimates show that the use of heterogeneous parallel systems reduces efficiency even when their communication subsystems are scalable (see the definition in Section 4). Using local networks (heterogeneous parallel computational systems with nonscalable communication subsystems) for parallel computations with heavy data exchange is not advantageous and is feasible only for a small number of nodes (about five). An algorithm for the optimal distribution of data among the nodes of a homogeneous or heterogeneous computational system is proposed. Results of numerical experiments support these conclusions.
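
The abstract does not spell out the distribution algorithm, so the following is only an illustrative sketch of the underlying idea and not the authors' method. It assumes a loop whose iterations all cost the same, ignores communication entirely, and describes each node by a hypothetical relative speed (iterations per unit time). Under those assumptions, giving each node a share of iterations proportional to its speed balances the finishing times. The names `distribute_iterations` and `estimated_time` are invented for this example.

```python
def distribute_iterations(total_iterations, node_speeds):
    """Split loop iterations across nodes in proportion to relative speed.

    Assumptions (not from the paper): every iteration has the same cost,
    communication is free, and node_speeds[i] is the number of iterations
    node i completes per unit time.
    """
    if total_iterations < 0:
        raise ValueError("total_iterations must be non-negative")
    if not node_speeds or any(s <= 0 for s in node_speeds):
        raise ValueError("node_speeds must be a non-empty list of positive numbers")

    total_speed = sum(node_speeds)
    # Ideal (possibly fractional) share of each node.
    ideal = [total_iterations * s / total_speed for s in node_speeds]
    # Start from the integer part of each share.
    counts = [int(x) for x in ideal]
    # Hand the remaining iterations to the nodes with the largest
    # fractional parts (largest-remainder rounding).
    leftover = total_iterations - sum(counts)
    by_fraction = sorted(range(len(ideal)),
                         key=lambda i: ideal[i] - counts[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


def estimated_time(counts, node_speeds):
    """Loop finishing time: the slowest node determines the total."""
    return max(c / s for c, s in zip(counts, node_speeds))


if __name__ == "__main__":
    speeds = [1, 1, 2]  # hypothetical: the third node is twice as fast
    balanced = distribute_iterations(100, speeds)
    equal = [34, 33, 33]  # naive equal split, for comparison
    print(balanced, estimated_time(balanced, speeds))
    print(equal, estimated_time(equal, speeds))
```

On a homogeneous system (all speeds equal) the same function reduces to an even split. A heterogeneous loop, where iteration costs differ, is the harder case the abstract describes as NP-complete, and this sketch does not address it.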