Distributed managed clusters have emerged in recent years, and compute-intensive scientific problems demand clusters of large scale. However, many traditional heartbeat mechanisms are poorly suited to large-scale distributed managed clusters. In this paper, we propose a switch-based heartbeat mechanism, named heartbeat ring, that is designed for such clusters. Its main advantages are simplicity, scalability, and adaptability. Finally, we present an experimental evaluation based on a prototype implemented on Linux.
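The abstract does not spell out the protocol, so the following is only a minimal sketch of a generic ring-style heartbeat, not the paper's switch-based design. It assumes each node watches only its predecessor in the ring and suspects it after a fixed period of silence. The class name `RingMonitor`, its methods, the node names, and the timeout value are all hypothetical, and time is passed in explicitly so the example stays deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class RingMonitor:
    """Toy ring heartbeat (illustrative assumption, not the paper's protocol).

    Each node watches only its predecessor, so every node monitors one peer
    regardless of how many nodes the ring contains.
    """
    nodes: list      # node names in ring order (hypothetical)
    timeout: float   # seconds of silence before a node is suspected
    last_seen: dict = field(default_factory=dict)

    def predecessor(self, node):
        """Return the node that `node` is responsible for watching."""
        i = self.nodes.index(node)
        return self.nodes[i - 1]  # index -1 wraps around to close the ring

    def record_heartbeat(self, sender, now):
        """Note that a heartbeat from `sender` arrived at time `now`."""
        self.last_seen[sender] = now

    def suspects(self, now):
        """Return (watcher, suspected) pairs for predecessors silent too long."""
        result = []
        for node in self.nodes:
            pred = self.predecessor(node)
            seen = self.last_seen.get(pred)
            if seen is None or now - seen > self.timeout:
                result.append((node, pred))
        return result


def demo():
    """Four nodes beat at t=0; node "c" then goes silent."""
    ring = RingMonitor(nodes=["a", "b", "c", "d"], timeout=3.0)
    for name in ring.nodes:
        ring.record_heartbeat(name, now=0.0)
    for name in ["a", "b", "d"]:
        ring.record_heartbeat(name, now=2.0)
    return ring.suspects(now=4.0)
```

In this sketch, calling `demo()` reports that node "d", whose predecessor is "c", suspects "c" once its silence exceeds the timeout. A real deployment would also have to repair the ring after a failure, which the sketch leaves out.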