Load balancing for parallel programs running on distributed-memory computational systems is currently of great interest. In its most general form, stated for a single parallel loop, the problem is the execution of a heterogeneous loop on a heterogeneous computational system. Stated in this way, the problem is NP-complete even for two nodes, and no acceptable heuristics for solving it have been found. Since developing such heuristics is a rather complicated task, we decided to examine the problem by elementary methods in order to refine (and, possibly, simplify) the original problem statement. This paper discusses the results of that study. We obtain estimates of the efficiency of parallel loop execution as functions of the number of nodes for homogeneous and heterogeneous parallel computational systems. These estimates show that using heterogeneous parallel systems reduces efficiency even when their communication subsystems are scalable (see the definition in Section 4). Using local networks (heterogeneous parallel computational systems with nonscalable communication subsystems) for parallel computations with heavy data exchange is not advantageous and is feasible only for a small number of nodes (about five). We propose an algorithm for the optimal distribution of data among the nodes of a homogeneous or heterogeneous computational system. Results of numerical experiments support these conclusions.
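The abstract does not spell out the distribution algorithm, so the following is only a minimal illustrative sketch of one common static approach, not the authors' method. It assumes a homogeneous loop (every iteration costs the same), ignores communication cost entirely, and assumes each node's relative speed is known in advance. The names `distribute_iterations`, `efficiency`, and `speeds` are hypothetical. Iterations are split in proportion to node speed, with the rounding leftovers assigned by the largest-remainder rule.

```python
from fractions import Fraction


def distribute_iterations(total, speeds):
    """Split `total` loop iterations among nodes in proportion to `speeds`.

    Assumption (not from the paper): all iterations cost the same and
    communication is free, so a node's share should match its relative speed.
    """
    if total < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need total >= 0 and a non-empty list of positive speeds")

    # Exact rational arithmetic avoids floating-point rounding surprises.
    speed_sum = sum(Fraction(s) for s in speeds)
    ideal = [Fraction(s) * total / speed_sum for s in speeds]

    # Start from the integer part of each ideal share.
    shares = [int(x) for x in ideal]
    leftover = total - sum(shares)

    # Largest-remainder rule: hand the leftover iterations to the nodes
    # whose ideal share was truncated the most.
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def efficiency(shares, speeds):
    """Ratio of the ideal finish time to the actual finish time.

    The loop finishes when the slowest-to-finish node is done; the ideal
    time assumes the work is spread perfectly over the aggregate speed.
    """
    finish = max(n / s for n, s in zip(shares, speeds))
    if finish == 0:
        return 1.0  # no work assigned, so nothing is wasted
    return (sum(shares) / sum(speeds)) / finish


if __name__ == "__main__":
    speeds = [1, 1, 2]  # hypothetical relative node speeds
    shares = distribute_iterations(100, speeds)
    print(shares, efficiency(shares, speeds))
```

For a homogeneous system, all entries of `speeds` are equal and the sketch reduces to an even split. Comparing `efficiency` for an even split against the proportional split on unequal speeds gives a rough feel for why heterogeneity costs efficiency even before communication is taken into account.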