Incremental recomputations in MapReduce

Authors:
Thomas Jörg; Roya Parvizi; Hu Yong; Stefan Dessloch
Affiliations:
University of Kaiserslautern, Kaiserslautern, Germany (all authors)
Venue:
Proceedings of the Third International Workshop on Cloud Data Management
Year:
2011

Abstract

This paper explores the application of view maintenance techniques in a MapReduce environment. Abstractly, a MapReduce program can be seen as a view definition and its computed result as a materialized view. So far, MapReduce programs have to be re-executed to obtain up-to-date results after the base data has changed; that is, the view is recomputed from scratch. We present a case study based on typical MapReduce programs mentioned in Google's original MapReduce paper. By adapting view maintenance techniques, we were able to recompute results incrementally and considerably more efficiently than by full re-execution. Based on the case study, we develop a general solution for the incremental maintenance of the class of MapReduce programs that compute self-maintainable aggregates.
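
The idea of treating a MapReduce result as a materialized view can be illustrated with word count, one of the typical programs in the original MapReduce paper. The following is a minimal in-memory sketch, not the authors' implementation: the function names (`map_words`, `reduce_counts`, `full_recompute`, `incremental_update`) and the representation of changes as lists of inserted and deleted documents are assumptions made for illustration. It relies on SUM and COUNT being self-maintainable, meaning the new aggregate can be derived from the old aggregate and the changes alone, without rereading the base data.

```python
from collections import defaultdict


def map_words(doc):
    """Map step: emit one (word, 1) pair per word occurrence."""
    return [(word, 1) for word in doc.split()]


def reduce_counts(pairs):
    """Reduce step: sum the values for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def full_recompute(docs):
    """Baseline: re-run map and reduce over the entire base data."""
    pairs = [pair for doc in docs for pair in map_words(doc)]
    return reduce_counts(pairs)


def incremental_update(view, inserted, deleted):
    """Maintain the materialized view using only the changed documents.

    Inserted documents contribute positive counts and deleted documents
    contribute negated counts; the aggregated delta is then merged into
    the existing view (hypothetical delta format, for illustration only).
    """
    delta_pairs = [(w, c) for doc in inserted for (w, c) in map_words(doc)]
    delta_pairs += [(w, -c) for doc in deleted for (w, c) in map_words(doc)]
    delta = reduce_counts(delta_pairs)

    updated = dict(view)
    for word, change in delta.items():
        updated[word] = updated.get(word, 0) + change
        if updated[word] == 0:
            del updated[word]  # drop words that no longer occur
    return updated


base = ["a b a", "b c"]
view = full_recompute(base)

# Change set: one document is deleted and one is inserted.
new_view = incremental_update(view, inserted=["c d"], deleted=["a b a"])
print(new_view)
```

In this sketch the incremental path touches only the inserted and deleted documents, whereas `full_recompute` scans all base data. Aggregates such as MIN or MAX are not self-maintainable under deletions in this way, because removing the current extreme value requires consulting the remaining base data.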