This work considers the problem of efficiently performing a set of tasks using a network of processors in the setting where the network is subject to dynamic reconfigurations, including partitions and merges. A key challenge in this setting is implementing dynamic load balancing that reduces the number of tasks performed redundantly because of the reconfigurations. We explore new approaches to load balancing in dynamic networks that can be employed by applications using a group communication service (GCS). The GCS that we consider includes a membership service (establishing new groups to reflect dynamic changes) but does not include maintenance of a primary component. For the n-processor, n-task load balancing problem defined in this work, the following specific results are obtained. For the case of fully dynamic changes, including fragmentation and merges, we show that the termination time of any on-line task assignment algorithm is greater than the termination time of an off-line task assignment algorithm by a factor greater than n/12. We present a load balancing algorithm that guarantees completion of all tasks in all fragments caused by partitions with work O(n + f · n) in the presence of f fragmentation failures. We develop an effective scheduling strategy for minimizing task execution redundancy, and we prove that our strategy provides each of the n processors with a schedule of Θ(n^(1/3)) tasks such that at most one task is performed redundantly by any two processors.
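The pairwise redundancy property stated above (any two processors share at most one task in their schedules) can be checked mechanically. The following is a minimal sketch, not the construction from this work: the function name `max_pairwise_overlap` and the toy instance `example_schedules` are hypothetical, and the toy instance is hand-made for illustration rather than sized to match the Θ(n^(1/3)) bound.

```python
from itertools import combinations


def max_pairwise_overlap(schedules):
    """Return the largest number of tasks shared by any two schedules.

    `schedules` is a list with one entry per processor; each entry is the
    list of task identifiers that processor plans to perform while it is
    disconnected from the others.
    """
    task_sets = [set(s) for s in schedules]
    return max(
        (len(a & b) for a, b in combinations(task_sets, 2)),
        default=0,  # fewer than two schedules means no redundancy
    )


# Hypothetical toy instance (an assumption for illustration only):
# 4 processors, tasks numbered 0..5, 3 tasks per schedule.
example_schedules = [
    [0, 1, 2],
    [0, 3, 4],
    [1, 3, 5],
    [2, 4, 5],
]

if __name__ == "__main__":
    # A result of at most 1 means the property holds for this instance.
    print(max_pairwise_overlap(example_schedules))
```

A result of at most 1 means that, if any two processors run their schedules in isolation and later merge, they have duplicated at most one task between them.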