Large-scale incremental data processing with change propagation

Authors:
Pramod Bhatotia; Alexander Wieder; İstemi Ekin Akkuş; Rodrigo Rodrigues; Umut A. Acar
Affiliations:
Max Planck Institute for Software Systems (all authors)
Venue:
HotCloud'11 Proceedings of the 3rd USENIX conference on Hot topics in cloud computing
Year:
2011

Abstract

Incremental processing of large-scale data is an increasingly important problem, given that many processing jobs run repeatedly with similar inputs, and that the de facto standard programming model (MapReduce) was not designed to process small updates efficiently. As a result, new systems specifically targeting this problem (e.g., Google Percolator or Yahoo! CBP) have been proposed. Unfortunately, these approaches require the adoption of a new programming model, which breaks compatibility with existing programs and increases the burden on the programmer, who must now devise an incremental update mechanism. We claim that automatic incremental processing of large-scale data is possible by leveraging previous results from the algorithms and programming languages communities. As an example, we describe how MapReduce can be improved to handle small input changes efficiently by automatically incrementalizing existing MapReduce computations, without breaking backward compatibility or requiring programmers to adopt a new programming approach.
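
The abstract does not spell out the mechanism, so the following is only a minimal sketch of one way a framework could reuse work across runs while leaving the user's map and reduce functions untouched. It assumes memoization of map outputs keyed by a content hash of each input split. The class `IncrementalRunner`, its `map_executions` counter, and the word-count functions are hypothetical names introduced for illustration; they are not taken from the paper. As a simplification, the reduce phase is recomputed in full on every run.

```python
import hashlib
from collections import defaultdict


def word_count_map(split):
    """Ordinary, unmodified map function: emit (word, 1) pairs for one split."""
    return [(word, 1) for word in split.split()]


def word_count_reduce(key, values):
    """Ordinary, unmodified reduce function: sum the counts for one key."""
    return sum(values)


class IncrementalRunner:
    """Hypothetical driver that memoizes map outputs per input split."""

    def __init__(self, map_fn, reduce_fn):
        self.map_fn = map_fn
        self.reduce_fn = reduce_fn
        self.map_cache = {}      # split digest -> list of (key, value) pairs
        self.map_executions = 0  # how many times map_fn actually ran

    def run(self, splits):
        grouped = defaultdict(list)
        for split in splits:
            digest = hashlib.sha256(split.encode("utf-8")).hexdigest()
            if digest not in self.map_cache:
                # Only splits whose content is new since earlier runs are mapped.
                self.map_cache[digest] = self.map_fn(split)
                self.map_executions += 1
            for key, value in self.map_cache[digest]:
                grouped[key].append(value)
        # Simplification: the reduce phase is recomputed from scratch each run.
        return {key: self.reduce_fn(key, values) for key, values in grouped.items()}


runner = IncrementalRunner(word_count_map, word_count_reduce)
first = runner.run(["a b a", "c d", "e f"])     # all three splits are mapped
second = runner.run(["a b a", "c d d", "e f"])  # only the changed split is mapped
```

In this sketch the second run invokes the map function once instead of three times, because two of the three splits are unchanged. The user-supplied functions are ordinary MapReduce code, which is the backward-compatibility property the abstract emphasizes.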