Boom analytics: exploring data-centric, declarative programming for the cloud

  • Authors:
  • Peter Alvaro;Tyson Condie;Neil Conway;Khaled Elmeleegy;Joseph M. Hellerstein;Russell Sears

  • Affiliations:
  • University of California, Berkeley, Berkeley, CA, USA;University of California, Berkeley, Berkeley, CA, USA;University of California, Berkeley, Berkeley, CA, USA;Yahoo! Research, Sunnyvale, CA, USA;University of California, Berkeley, Berkeley, CA, USA;University of California, Berkeley, Berkeley, CA, USA

  • Venue:
  • Proceedings of the 5th European conference on Computer systems
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Building and debugging distributed software remains extremely difficult. We conjecture that by adopting a data-centric approach to system design and by employing declarative programming languages, a broad range of distributed software can be recast naturally in a data-parallel programming model. Our hope is that this model can significantly raise the level of abstraction for programmers, improving code simplicity, speed of development, ease of software evolution, and program correctness. This paper presents our experience with an initial large-scale experiment in this direction. First, we used the Overlog language to implement a "Big Data" analytics stack that is API-compatible with Hadoop and HDFS and provides comparable performance. Second, we extended the system with complex distributed features not yet available in Hadoop, including high availability, scalability, and unique monitoring and debugging facilities. We present both quantitative and anecdotal results from our experience, providing some concrete evidence that both data-centric design and declarative languages can substantially simplify distributed systems programming.