Nectar: automatic management of data and computation in datacenters

  • Authors:
  • Pradeep Kumar Gunda;Lenin Ravindranath;Chandramohan A. Thekkath;Yuan Yu;Li Zhuang

  • Affiliations:
  • Microsoft Research Silicon Valley;Microsoft Research Silicon Valley and Massachusetts Institute of Technology;Microsoft Research Silicon Valley;Microsoft Research Silicon Valley;Microsoft Research Silicon Valley

  • Venue:
  • OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Managing data and computation is at the heart of datacenter computing. Manual management of data can lead to data loss, wasteful consumption of storage, and laborious bookkeeping. Lack of proper management of computation can result in lost opportunities to share common computations across multiple jobs or to compute results incrementally. Nectar is a system designed to address the aforementioned problems. It automates and unifies the management of data and computation within a datacenter. In Nectar, data and computation are treated interchangeably by associating data with its computation. Derived datasets, which are the results of computations, are uniquely identified by the programs that produce them, and together with their programs, are automatically managed by a datacenter wide caching service. Any derived dataset can be transparently regenerated by reexecuting its program, and any computation can be transparently avoided by using previously cached results. This enables us to greatly improve datacenter management and resource utilization: obsolete or infrequently used derived datasets are automatically garbage collected, and shared common computations are computed only once and reused by others. This paper describes the design and implementation of Nectar, and reports on our evaluation of the system using analytic studies of logs from several production clusters and an actual deployment on a 240-node cluster.