True elasticity in multi-tenant data-intensive compute clusters

  • Authors:
  • Ganesh Ananthanarayanan;Christopher Douglas;Raghu Ramakrishnan;Sriram Rao;Ion Stoica

  • Affiliations:
  • University of California, Berkeley;Microsoft Corp.;Microsoft Corp.;Microsoft Corp.;University of California, Berkeley

  • Venue:
  • Proceedings of the Third ACM Symposium on Cloud Computing
  • Year:
  • 2012

Abstract

Data-intensive computing (DISC) frameworks scale by partitioning a job across a set of fault-tolerant tasks, then diffusing those tasks across large clusters. Multi-tenant clusters must accommodate service-level objectives (SLOs) in their resource model, often expressed as a maximum latency for allocating the desired set of resources to every job. When jobs are partitioned into tasks statically, a cluster cannot meet its SLOs while maintaining both high utilization and efficiency. Ideally, we want to give resources to jobs when they are free, yet reclaim them instantaneously when new jobs arrive, without losing work. DISC frameworks do not support such elasticity because interrupting running tasks incurs high overheads. Amoeba enables lightweight elasticity in DISC frameworks by identifying points at which running tasks of over-provisioned jobs can be safely exited, committing their outputs, and spawning new tasks for the remaining work. Effectively, tasks of DISC jobs are now sized dynamically in response to global resource scarcity or abundance. Simulation and deployment of our prototype show that Amoeba speeds up jobs by 32% without compromising utilization or efficiency.
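
The exit-commit-respawn idea described in the abstract can be illustrated with a minimal Python sketch. This is not Amoeba's implementation: the `Task` type, the `run_task` function, the `should_yield` callback, and the choice of record boundaries as safe exit points are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Task:
    """Hypothetical task covering the record range [start, end)."""
    start: int
    end: int


def run_task(
    task: Task,
    records: List[str],
    should_yield: Callable[[int], bool],
    committed: List[Tuple[int, int, List[str]]],
) -> Optional[Task]:
    """Process a task, checking for a reclaim request at each record boundary.

    Record boundaries stand in for "safe points" (an assumption). If a yield
    is requested, the output produced so far is committed and a new task
    covering the remaining records is returned. Returns None when finished.
    """
    output: List[str] = []
    for i in range(task.start, task.end):
        # Only yield after at least one record, so every task makes progress.
        if i > task.start and should_yield(i):
            committed.append((task.start, i, output))
            return Task(i, task.end)
        output.append(records[i].upper())  # placeholder per-record work
    committed.append((task.start, task.end, output))
    return None


records = ["a", "b", "c", "d", "e"]
committed: List[Tuple[int, int, List[str]]] = []

# The scheduler asks for the slot back when the task reaches record 2.
remainder = run_task(Task(0, 5), records, lambda i: i == 2, committed)

# Later, a new task picks up the remaining work without redoing records 0-1.
final = run_task(remainder, records, lambda i: False, committed)
```

In this sketch no work is lost on exit: the records processed before the yield are committed once, and the respawned task starts exactly where the first one stopped.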