Distributed dynamic load balancing for pipelined computations on heterogeneous systems

Authors:
Ioannis Riakiotakis;Florina M. Ciorba;Theodore Andronikos;George Papakonstantinou
Affiliations:
Computing Systems Laboratory, Department of Electrical & Computer Engineering, National Technical University of Athens, Greece;Center for Information Services and High Performance Computing, Technische Universität, Dresden, Germany;Department of Informatics, Ionian University, Corfu, Greece;Computing Systems Laboratory, Department of Electrical & Computer Engineering, National Technical University of Athens, Greece
Venue:
Parallel Computing
Year:
2011


Abstract

One of the most significant causes of performance degradation of scientific and engineering applications on high performance computing systems is the uneven distribution of computational work across the resources of the system. This effect, known as load imbalance, is even more noticeable for irregular applications and heterogeneous distributed systems. This has motivated the parallel and distributed computing research community to focus on methods that provide good load balancing for scientific and engineering applications running on (heterogeneous) distributed systems. Efficient load balancing and scheduling methods are employed for scientific applications from various fields, such as mechanics, materials, physics, chemistry, biology, and applied mathematics. Such applications typically employ a large number of computational methods to simulate complex phenomena on very large scales of time and magnitude. These simulations consist of routines that perform repetitive computations (in the form of DO/FOR loops) over very large data sets and, if not properly implemented and executed, may suffer from poor performance. The number of repetitive computations in the simulation codes is not always constant. Moreover, the computations themselves may be irregular, so that some take unpredictably more time than others. For successful and timely results, large scale simulations require large scale computing systems, which are often widely distributed and highly heterogeneous. In addition, such systems are usually shared among multiple users, which makes the quality and quantity of the available resources highly unpredictable. There are numerous load balancing methods in the literature for different parallel architectures.
The most recent of these methods typically follow the master-worker paradigm, in which a single coordinator (the master) makes all the scheduling decisions based on information provided by the workers. Depending on the application requirements, the scheduling policy, and the computational environment, this paradigm has two potential limitations: (1) its efficiency may not scale as the number of processors increases, and (2) scheduling decisions are likely to be based on outdated information, especially on systems where the workload changes rapidly. To address these limitations, we propose a distributed (master-less) load balancing scheme in which the workers make the scheduling decisions in a distributed fashion. We implemented this method along with two master-worker schemes (a previously existing one and a recently modified one) for three different scientific computational kernels. To validate the usefulness and efficiency of the proposed scheme, we conducted a series of comparative performance tests against the two master-worker schemes for each computational kernel. The target system is an SMP cluster, on which we simulated three different patterns of system load fluctuation. The experiments strongly support the belief that the distributed approach offers greater performance and better scalability on such systems, showing an overall improvement ranging from 13% to 24% over the master-worker approaches.
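
The abstract does not specify how workers make their scheduling decisions, so the following is only a minimal sketch of the general master-less idea, not the authors' algorithm. It assumes a shared loop counter, a hypothetical per-worker weight standing in for relative processing power, and a weighted, factoring-style chunk rule chosen purely for illustration. It also ignores the data dependencies that a pipelined computation would impose. The names `run_masterless` and `weights` are hypothetical.

```python
import threading


def run_masterless(total_iters, weights):
    """Workers claim loop chunks themselves; no master assigns work.

    total_iters: number of loop iterations to distribute.
    weights: positive relative speeds, one per worker (an assumption
             made for this sketch, not taken from the paper).
    Returns, per worker, the list of (start, end) ranges it claimed.
    """
    lock = threading.Lock()
    state = {"next": 0}  # first iteration not yet claimed
    claimed = [[] for _ in weights]
    weight_sum = sum(weights)

    def worker(wid):
        while True:
            with lock:
                remaining = total_iters - state["next"]
                if remaining <= 0:
                    return
                # Each worker computes its own chunk size locally:
                # half of its weighted share of the remaining work.
                size = int(remaining * weights[wid] / (2 * weight_sum))
                size = min(max(1, size), remaining)
                start = state["next"]
                state["next"] = start + size
            claimed[wid].append((start, start + size))
            # The loop body for iterations [start, start + size)
            # would execute here, outside the lock.

    threads = [
        threading.Thread(target=worker, args=(i,))
        for i in range(len(weights))
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return claimed
```

Calling `run_masterless(1000, [1.0, 2.0, 1.0])` returns three lists of half-open ranges that together cover iterations 0 to 999 exactly once. Which worker claims which range depends on thread timing. In a real cluster the shared counter would be replaced by some form of distributed coordination, which this sketch does not attempt to model.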