Task Allocation for Maximizing Reliability of Distributed Computer Systems
IEEE Transactions on Computers
Parallel Computing in Networks of Workstations with Paralex
IEEE Transactions on Parallel and Distributed Systems
Process Allocation for Load Distribution in Fault-Tolerant Multicomputers
FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Optimization of distributed, object-oriented systems
OOPSLA '00 Addendum to the 2000 proceedings of the conference on Object-oriented programming, systems, languages, and applications (Addendum)
Optimizing systems by work schedules: (a stochastic approach)
WOSP '02 Proceedings of the 3rd international workshop on Software and performance
Journal of Parallel and Distributed Computing
Satisfaction-based query replication
Distributed and Parallel Databases
We consider a load-balancing process allocation method for fault-tolerant multicomputer systems that balances the load both before and after faults begin to degrade system performance. To tolerate a single fault, each process (the primary process) is duplicated, i.e., has a backup process. The backup executes on a different processor from its primary, checkpointing the primary process and recovering the process if the primary fails. We formalize the load-balancing process allocation problem, propose a new process allocation method, and analyze its performance. Simulations compare the proposed method with a process allocation method that does not take into account the different load characteristics of primary and backup processes. Both methods perform well before a fault occurs, but only the proposed method maintains a balanced load after a fault.
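To make the before-fault and after-fault distinction concrete, the following is a minimal sketch of how per-processor load might be tallied under a primary/backup scheme. It is not the allocation method proposed in the paper. The load model is an assumption made only for illustration: a backup costs a small checkpointing load while its primary is alive, and takes over the primary's full load once the primary's processor fails. The names `loads_before_fault`, `loads_after_fault`, and the `load` / `backup_load` fields are hypothetical.

```python
def loads_before_fault(procs, n_nodes):
    """Per-node load with every node up (assumed model, not the paper's)."""
    load = [0] * n_nodes
    for p in procs:
        load[p["primary"]] += p["load"]         # primary runs at full load
        load[p["backup"]] += p["backup_load"]   # backup only checkpoints
    return load


def loads_after_fault(procs, n_nodes, failed):
    """Per-node load after node `failed` goes down (single-fault assumption)."""
    load = [0] * n_nodes
    for p in procs:
        if p["primary"] == failed:
            # Backup takes over and now carries the full process load.
            load[p["backup"]] += p["load"]
        elif p["backup"] == failed:
            # Primary keeps running; its backup is lost with the failed node.
            load[p["primary"]] += p["load"]
        else:
            load[p["primary"]] += p["load"]
            load[p["backup"]] += p["backup_load"]
    return load


# Hypothetical example: 3 nodes, 3 processes, backups placed on the next node.
procs = [
    {"primary": 0, "backup": 1, "load": 10, "backup_load": 1},
    {"primary": 1, "backup": 2, "load": 10, "backup_load": 1},
    {"primary": 2, "backup": 0, "load": 10, "backup_load": 1},
]
before = loads_before_fault(procs, 3)
after = loads_after_fault(procs, 3, failed=0)
```

In this toy placement the load is even while all nodes are up, but after node 0 fails, node 1 carries two full processes while node 2 carries one plus a checkpointing load. This is the kind of post-fault imbalance that an allocation aware of the differing primary and backup loads would aim to avoid.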