Dynamic load balancing with group communication

Authors:
Shlomi Dolev;Roberto Segala;Alexander Shvartsman
Affiliations:
Department of Computer Science, Ben-Gurion University, Beer-Sheva, Israel;Dipartimento di Informatica, Università di Verona, Verona, Italy;Department of Computer Science and Engineering, University of Connecticut, Storrs, CT and MIT Computer Science and Artificial Intelligence Laboratory, The Stata Center, Cambridge, MA
Venue:
Theoretical Computer Science
Year:
2006

Abstract

This work considers the problem of efficiently performing a set of tasks using a network of processors in the setting where the network is subject to dynamic reconfigurations, including partitions and merges. A key challenge in this setting is implementing dynamic load balancing that reduces the number of tasks performed redundantly because of the reconfigurations. We explore new approaches to load balancing in dynamic networks that can be employed by applications using a group communication service (GCS). The GCS that we consider includes a membership service (establishing new groups to reflect dynamic changes) but does not include maintenance of a primary component. For the n-processor, n-task load balancing problem defined in this work, the following specific results are obtained. For the case of fully dynamic changes, including fragmentation and merges, we show that the termination time of any on-line task assignment algorithm is greater than the termination time of an off-line task assignment algorithm by a factor greater than n/12. We present a load balancing algorithm that guarantees completion of all tasks in all fragments caused by partitions with work O(n + f · n) in the presence of f fragmentation failures. We develop an effective scheduling strategy for minimizing task execution redundancy, and we prove that our strategy provides each of the n processors with a schedule of Θ(n^(1/3)) tasks such that at most one task is performed redundantly by any two processors.
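
The redundancy property stated at the end of the abstract (any two processors share at most one task in their schedules) can be checked mechanically. The sketch below is only an illustration of that property, not the scheduling strategy from the paper: the function name max_pairwise_overlap, the processor labels, and the toy schedules are all hypothetical, and the toy instance is far too small to say anything about the Θ(n^(1/3)) bound.

```python
from itertools import combinations


def max_pairwise_overlap(schedules):
    """Return the largest number of tasks shared by any two schedules.

    `schedules` maps a processor label to the list of task ids it plans
    to perform while disconnected from the other processors.
    """
    return max(
        (len(set(a) & set(b)) for a, b in combinations(schedules.values(), 2)),
        default=0,  # fewer than two processors: nothing can overlap
    )


# Hypothetical toy instance: 4 processors, tasks numbered 0..5.
# Every pair of schedules has exactly one task in common.
toy = {
    "p0": [0, 1, 2],
    "p1": [0, 3, 4],
    "p2": [1, 3, 5],
    "p3": [2, 4, 5],
}

# The property from the abstract holds when the overlap is at most 1.
assert max_pairwise_overlap(toy) <= 1
```

A checker like this is useful mainly as a sanity test for any candidate schedule construction: if two processors that were partitioned apart later merge, the value it returns bounds how many tasks that pair could have executed twice.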