Sparrow: distributed, low latency scheduling

  • Authors:
  • Kay Ousterhout, Patrick Wendell, Matei Zaharia, Ion Stoica

  • Affiliations:
  • University of California, Berkeley (all four authors)

  • Venue:
  • Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
  • Year:
  • 2013

Abstract

Large-scale data analytics frameworks are shifting towards shorter task durations and larger degrees of parallelism to provide low latency. Scheduling highly parallel jobs that complete in hundreds of milliseconds poses a major challenge for task schedulers, which will need to schedule millions of tasks per second on appropriate machines while offering millisecond-level latency and high availability. We demonstrate that a decentralized, randomized sampling approach provides near-optimal performance while avoiding the throughput and availability limitations of a centralized design. We implement and deploy our scheduler, Sparrow, on a 110-machine cluster and demonstrate that Sparrow performs within 12% of an ideal scheduler.
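
To make the idea of randomized sampling concrete, here is a minimal sketch of one well-known form of it, often called the "power of d choices": for each task, probe a few workers chosen at random and place the task on the one with the shortest queue. This is an illustration only and is not taken from the abstract, which does not specify Sparrow's actual algorithm. The names `place_task`, `max_queue_after`, and `probes` are hypothetical, and queue length is assumed to be the load signal.

```python
import random


def place_task(queue_lengths, probes=2, rng=random):
    """Place one task using random sampling (illustrative assumption).

    Probes `probes` distinct workers chosen uniformly at random and
    enqueues the task on the probed worker with the shortest queue.
    Returns the index of the chosen worker.
    """
    candidates = rng.sample(range(len(queue_lengths)), probes)
    chosen = min(candidates, key=lambda w: queue_lengths[w])
    queue_lengths[chosen] += 1
    return chosen


def max_queue_after(num_workers, num_tasks, probes, seed=0):
    """Place `num_tasks` tasks one by one and return the longest queue."""
    rng = random.Random(seed)  # seeded so runs are repeatable
    queues = [0] * num_workers
    for _ in range(num_tasks):
        place_task(queues, probes, rng)
    return max(queues)


if __name__ == "__main__":
    # probes=1 is purely random placement; probes=2 samples two workers.
    for d in (1, 2):
        print(d, max_queue_after(num_workers=100, num_tasks=100, probes=d))
```

No central component tracks the state of every worker here: each placement decision needs only `probes` queue-length lookups, which is why a scheduler of this kind can in principle run as many independent instances. Setting `probes` equal to the number of workers reduces the sketch to always picking a least-loaded worker.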