Streaming big data with self-adjusting computation

  • Authors:
  • Umut A. Acar; Yan Chen

  • Affiliations:
  • Carnegie Mellon University, Pittsburgh, PA, USA; Max Planck Institute for Software Systems, Kaiserslautern, Germany

  • Venue:
  • DDFP '13: Proceedings of the 2013 Workshop on Data Driven Functional Programming
  • Year:
  • 2013

Abstract

Many big data computations involve processing data that changes incrementally or dynamically over time. Using existing techniques, such computations quickly become impractical. For example, computing the frequency of words in the first ten thousand paragraphs of a publicly available Wikipedia data set in a streaming fashion using MapReduce can take as much as a full day. In this paper, we propose an approach based on self-adjusting computation that can dramatically improve the efficiency of such computations. As an example, we can perform the aforementioned streaming computation in just a couple of minutes.
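To make the contrast in the abstract concrete, the following sketch compares recomputing word frequencies from scratch each time a paragraph arrives with folding in only the paragraph that changed. It is a hand-written incremental update, not the paper's self-adjusting computation, which derives such updates automatically by tracking dependencies and propagating changes. The names `word_counts_from_scratch` and `IncrementalWordCount` are hypothetical, and lowercase whitespace tokenization is an assumption made for brevity.

```python
from collections import Counter


def word_counts_from_scratch(paragraphs):
    """Recompute word frequencies over every paragraph seen so far."""
    counts = Counter()
    for paragraph in paragraphs:
        # Assumption: words are lowercase, whitespace-separated tokens.
        counts.update(paragraph.lower().split())
    return counts


class IncrementalWordCount:
    """Hypothetical helper that keeps running counts and updates them
    using only the paragraph that was added or removed."""

    def __init__(self):
        self.counts = Counter()

    def add_paragraph(self, paragraph):
        # Work is proportional to the size of the new paragraph only.
        self.counts.update(paragraph.lower().split())
        return self.counts

    def remove_paragraph(self, paragraph):
        self.counts.subtract(paragraph.lower().split())
        # Unary + keeps only positive counts, dropping words that vanished.
        self.counts = +self.counts
        return self.counts


if __name__ == "__main__":
    stream = ["the cat sat", "the dog ran", "a cat ran"]
    incremental = IncrementalWordCount()
    seen = []
    for paragraph in stream:
        seen.append(paragraph)
        # Both strategies agree after every arrival.
        assert incremental.add_paragraph(paragraph) == word_counts_from_scratch(seen)
    print(incremental.counts)
```

Recomputing after each of n arriving paragraphs rereads every word seen so far, so total work grows quadratically with the length of the stream, whereas the incremental version processes each paragraph's words once per insertion or removal.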