Euro-Par'13 Proceedings of the 19th international conference on Parallel Processing
High-performance computing systems are growing in both complexity and component count. This trend exposes weaknesses in the underlying clustering infrastructure needed for continuous availability, maximal utilization, and efficient administration of such systems. To mitigate the problem, we present a highly scalable clustering infrastructure, based on peer-to-peer technologies, that supports resiliency-aware applications as well as efficient monitoring and load balancing. Supported services include membership, publish-subscribe messaging, convergecast, attribute replication, and a distributed hash table (DHT). We present a preliminary evaluation on an IBM BlueGene/P, demonstrating scalability up to approximately 256K nodes.