Disco: a computing platform for large-scale data analytics

  • Authors:
  • Prashanth Mundkur; Ville Tuulos; Jared Flatow

  • Affiliations:
  • Nokia Research Center, Palo Alto, CA, USA (all authors)

  • Venue:
  • Proceedings of the 10th ACM SIGPLAN Workshop on Erlang
  • Year:
  • 2011

Abstract

We describe the design and implementation of Disco, a distributed computing platform for MapReduce-style computations on large-scale data. Disco is designed to run on clusters of commodity server machines, and provides both a fault-tolerant scheduling and execution layer and a distributed, replicated storage layer. Disco is implemented in Erlang and Python: Erlang is used for the core aspects of cluster monitoring, job management, task scheduling, and the distributed filesystem, while Python is used to implement the standard Disco library. Disco has been used in production at Nokia for several years, analyzing tens of terabytes of data daily on a cluster of over 100 nodes. With a small but very functional codebase, it provides a free, proven, and effective component of a full-fledged data analytics stack.
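
For readers unfamiliar with the term, the shape of a MapReduce-style computation can be illustrated with a minimal single-process sketch in plain Python. This is not the Disco API and is not taken from the paper; the function names (map_words, shuffle, reduce_counts, word_count) are hypothetical, and the scheduling, fault tolerance, and replicated storage that a platform such as Disco provides are omitted entirely.

```python
from collections import defaultdict
from itertools import chain


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: sum the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence in a single process."""
    mapped = chain.from_iterable(map_words(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


# Hypothetical input, for illustration only.
result = word_count(["disco runs map tasks", "disco runs reduce tasks"])
```

In a distributed setting, each call to map_words or reduce_counts would be an independent task that can be placed on any node and re-run after a failure, which is what makes the model suitable for clusters of commodity machines.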