Slingshot: Time-Critical Multicast for Clustered Applications

  • Authors:
  • Mahesh Balakrishnan;Stefan Pleisch;Ken Birman

  • Affiliations:
  • Department of Computer Science Cornell University, Ithaca, NY;Department of Computer Science Cornell University, Ithaca, NY;Department of Computer Science Cornell University, Ithaca, NY

  • Venue:
  • NCA '05 Proceedings of the Fourth IEEE International Symposium on Network Computing and Applications
  • Year:
  • 2005

Abstract

Datacenters are complex environments consisting of thousands of failure-prone commodity components connected by fast, high-capacity interconnects. The software running on such datacenters typically uses multicast communication patterns involving multiple senders. We examine the problem of time-critical multicast in such settings, and propose Slingshot, a protocol that uses receiver-based FEC to recover lost packets quickly. Slingshot offers probabilistic guarantees on timeliness by having receivers exchange FEC packets in an initial phase, and optional complete reliability on packets not recovered in this first phase. We evaluate an implementation of Slingshot against SRM, a well-known multicast protocol, and show that it achieves two orders of magnitude faster recovery in datacenter settings.
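
To illustrate the receiver-based FEC idea mentioned in the abstract, here is a minimal sketch of XOR-based repair between two receivers. The abstract does not specify Slingshot's encoding, packet selection, or wire format, so everything here is an assumption made for illustration: the function names (xor_bytes, make_repair, recover), the integer packet ids, and the choice of a plain XOR code. A receiver combines several data packets it already holds into one repair packet and sends it to a peer; a peer missing exactly one of those packets can rebuild it.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, zero-padding the shorter one on the right."""
    n = max(len(a), len(b))
    a = a.ljust(n, b"\x00")
    b = b.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))


def make_repair(packets: dict[int, bytes], ids: list[int]) -> tuple[list[int], bytes]:
    """Build a repair packet: the listed ids plus the XOR of their payloads.

    Hypothetical helper; the real protocol's repair format is not given
    in the abstract.
    """
    payload = reduce(xor_bytes, (packets[i] for i in ids))
    return list(ids), payload


def recover(received: dict[int, bytes], repair: tuple[list[int], bytes]):
    """Recover one missing packet from a repair packet, if possible.

    Returns (packet_id, payload) when exactly one covered packet is
    missing, otherwise None. If payload lengths differ, the recovered
    payload may carry trailing zero padding.
    """
    ids, payload = repair
    missing = [i for i in ids if i not in received]
    if len(missing) != 1:
        return None  # nothing to repair, or too many losses for one XOR
    for i in ids:
        if i in received:
            payload = xor_bytes(payload, received[i])
    return missing[0], payload


# Receiver A holds packets 1-3; receiver B lost packet 2.
receiver_a = {1: b"aaaa", 2: b"bbbb", 3: b"cccc"}
receiver_b = {1: b"aaaa", 3: b"cccc"}
repair = make_repair(receiver_a, [1, 2, 3])
result = recover(receiver_b, repair)
```

In this sketch a single XOR repair packet can fix only one loss among the packets it covers; a receiver with two or more losses in that set would have to wait for another repair packet or fall back to a separate reliability mechanism, which loosely mirrors the two phases the abstract describes.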