C-MR: continuously executing MapReduce workflows on multi-core processors

Authors:
Nathan Backman;Karthik Pattabiraman;Rodrigo Fonseca;Ugur Cetintemel
Affiliations:
Brown University, Providence, RI, USA;Brown University, Providence, RI, USA;Brown University, Providence, RI, USA;Brown University, Providence, RI, USA
Venue:
Proceedings of third international workshop on MapReduce and its Applications Date
Year:
2012

Citing 7
Cited 1

Exploiting Punctuation Semantics in Continuous Data Streams

IEEE Transactions on Knowledge and Data Engineering
No pane, no gain: efficient evaluation of sliding-window aggregates over data streams

ACM SIGMOD Record
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
DEDUCE: at the intersection of MapReduce and stream processing

Proceedings of the 13th International Conference on Extending Database Technology
MapReduce online

NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Phoenix++: modular MapReduce for shared-memory systems

Proceedings of the second international workshop on MapReduce and its applications
In-situ MapReduce for log processing

USENIXATC'11 Proceedings of the 2011 USENIX conference on USENIX annual technical conference

StreamHub: a massively parallel architecture for high-performance content-based publish/subscribe

Proceedings of the 7th ACM international conference on Distributed event-based systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

The widespread appeal of MapReduce is due, in part, to its simple programming model. Programmers provide only application logic while the MapReduce framework handles the logistics of data distribution and parallel task management. We present the Continuous-MapReduce (C-MR) framework which implements a modified MapReduce processing model to continuously execute workflows of MapReduce jobs on unbounded data streams. In keeping with the philosophy of MapReduce, C-MR abstracts away the complexities of parallel stream processing and workflow scheduling while providing the simple and familiar MapReduce programming interface with the addition of stream window semantics. Modifying the MapReduce processing model allowed us to: (1) maintain correct stream order and execution semantics in the presence of parallel and asynchronous processing elements; (2) implement an operator scheduler framework to facilitate latency-oriented scheduling policies for executing complex workflows of MapReduce jobs; and (3) leverage much of the work that has gone into the last decade of stream processing research including: pipelined parallelism, incremental processing for both Map and Reduce operations, minimizing redundant computations, sharing of sub-queries, and adaptive query processing. C-MR was developed for use on a multiprocessor architecture, where we demonstrate its effectiveness at supporting high-performance stream processing even in the presence of load spikes and external workloads.