Datacenters are complex environments consisting of thousands of failure-prone commodity components connected by fast, high-capacity interconnects. Software running in such datacenters typically relies on multicast communication patterns with multiple senders. We examine the problem of time-critical multicast in these settings and propose Slingshot, a protocol that uses receiver-based forward error correction (FEC) to recover lost packets quickly. Slingshot offers probabilistic timeliness guarantees by having receivers exchange FEC packets in an initial phase, and optionally provides complete reliability for packets not recovered in that first phase. We evaluate an implementation of Slingshot against SRM (Scalable Reliable Multicast), a well-known multicast protocol, and show that Slingshot recovers lost packets two orders of magnitude faster in datacenter settings.
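To make the receiver-based FEC idea concrete, the following is a minimal sketch rather than Slingshot's actual encoding, which the abstract does not specify. It assumes XOR-based repair packets over fixed-size payloads. A receiver that has a set of data packets XORs them into one repair packet and sends it to its peers. A peer missing exactly one of the covered packets can rebuild it by XORing the repair with the packets it does have. The helper names `xor_bytes`, `make_repair`, and `try_recover` are hypothetical.

```python
from __future__ import annotations

from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Assumption: all payloads have the same fixed size.
    if len(a) != len(b):
        raise ValueError("payloads must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


def make_repair(packets: dict[int, bytes]) -> tuple[frozenset[int], bytes]:
    """Build a repair packet from data packets this receiver already holds.

    The repair carries the covered sequence numbers and the XOR of their payloads.
    """
    return frozenset(packets), reduce(xor_bytes, packets.values())


def try_recover(
    repair: tuple[frozenset[int], bytes], received: dict[int, bytes]
) -> tuple[int, bytes] | None:
    """Recover one lost packet if exactly one covered packet is missing."""
    covered, payload = repair
    missing = [seq for seq in covered if seq not in received]
    if len(missing) != 1:
        # Nothing lost, or too many losses for a single XOR repair to fix.
        return None
    for seq in covered:
        if seq != missing[0]:
            # XOR out every packet we have; what remains is the missing payload.
            payload = xor_bytes(payload, received[seq])
    return missing[0], payload
```

For example, if one receiver holds packets 1, 2, and 3 and sends `make_repair({1: b"AAAA", 2: b"BBBB", 3: b"CCCC"})`, a peer that received only packets 1 and 3 gets `(2, b"BBBB")` back from `try_recover`. A peer missing two of the covered packets cannot recover from this repair alone. This limitation is one reason such a first phase can give only probabilistic guarantees, with a separate reliable phase handling the remaining losses.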