Streaming big data with self-adjusting computation

Authors:
Umut A. Acar; Yan Chen
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA; Max Planck Institute for Software Systems, Kaiserslautern, Germany
Venue:
DDFP '13 Proceedings of the 2013 workshop on Data driven functional programming
Year:
2013

Abstract

Many big data computations involve processing data that changes incrementally or dynamically over time. Using existing techniques, such computations quickly become impractical. For example, computing the frequency of words in the first ten thousand paragraphs of a publicly available Wikipedia data set in a streaming fashion using MapReduce can take as much as a full day. In this paper, we propose an approach based on self-adjusting computation that can dramatically improve the efficiency of such computations. As an example, we can perform the aforementioned streaming computation in just a couple of minutes.
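
To make the word-frequency example concrete, the following minimal sketch contrasts recounting every paragraph seen so far with updating the counts as each paragraph arrives. It is a hand-written illustration, not the approach proposed in the paper: self-adjusting computation aims to derive this kind of update automatically, whereas here the incremental step is coded by hand. The names `count_from_scratch` and `IncrementalWordCount` and the three-paragraph stream are assumptions made for illustration only.

```python
from collections import Counter


def count_from_scratch(paragraphs):
    """Batch baseline: recount every word in every paragraph seen so far."""
    counts = Counter()
    for paragraph in paragraphs:
        counts.update(paragraph.lower().split())
    return counts


class IncrementalWordCount:
    """Hypothetical helper: keeps the counts up to date as paragraphs arrive."""

    def __init__(self):
        self.counts = Counter()

    def add_paragraph(self, paragraph):
        # Work is proportional to the new paragraph only, not to the whole prefix.
        self.counts.update(paragraph.lower().split())
        return self.counts


# Made-up stream of paragraphs, used only to exercise the two strategies.
stream = ["the cat sat", "the dog sat down", "a cat and a dog"]
incremental = IncrementalWordCount()
for i, paragraph in enumerate(stream, start=1):
    incremental.add_paragraph(paragraph)
    # Both strategies agree after every prefix of the stream.
    assert incremental.counts == count_from_scratch(stream[:i])
```

Rerunning the batch version after every new paragraph repeats all earlier work, so the total cost grows roughly quadratically with the length of the stream, while the incremental version touches each paragraph once.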