The widespread appeal of MapReduce is due, in part, to its simple programming model. Programmers provide only application logic, while the MapReduce framework handles the logistics of data distribution and parallel task management. We present the Continuous-MapReduce (C-MR) framework, which implements a modified MapReduce processing model to continuously execute workflows of MapReduce jobs on unbounded data streams. In keeping with the philosophy of MapReduce, C-MR abstracts away the complexities of parallel stream processing and workflow scheduling while providing the simple and familiar MapReduce programming interface, extended with stream window semantics. Modifying the MapReduce processing model allowed us to: (1) maintain correct stream order and execution semantics in the presence of parallel and asynchronous processing elements; (2) implement an operator scheduler framework to facilitate latency-oriented scheduling policies for executing complex workflows of MapReduce jobs; and (3) leverage much of the last decade of stream processing research, including pipelined parallelism, incremental processing for both Map and Reduce operations, minimizing redundant computations, sharing of sub-queries, and adaptive query processing. C-MR was developed for use on a multiprocessor architecture, on which we demonstrate its effectiveness at supporting high-performance stream processing even in the presence of load spikes and external workloads.
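To make the idea of a MapReduce interface with stream window semantics concrete, the following is a minimal single-threaded sketch in Python. It is not C-MR's API, which the abstract does not show: the function `windowed_map_reduce`, its parameters `window_size` and `slide`, and the count-based windowing are all assumptions chosen for illustration. The user supplies only a Map function and a Reduce function, and the driver applies Reduce to each window of the stream in arrival order.

```python
from collections import defaultdict, deque


def windowed_map_reduce(stream, map_fn, reduce_fn, window_size, slide):
    """Yield one dict of reduced results per count-based sliding window.

    Hypothetical illustration only; names and windowing scheme are assumed.
    map_fn(item) returns an iterable of (key, value) pairs.
    reduce_fn(key, values) returns the reduced value for that key.
    """
    # Each input item is mapped exactly once; the mapped pairs are cached so
    # that overlapping windows do not repeat the Map work.
    buffer = deque()
    seen = 0
    for item in stream:
        buffer.append(list(map_fn(item)))
        if len(buffer) > window_size:
            buffer.popleft()  # evict the oldest item from the window
        seen += 1
        # Emit once the first window is full, then every `slide` items.
        if seen >= window_size and (seen - window_size) % slide == 0:
            groups = defaultdict(list)
            for pairs in buffer:  # buffer preserves stream order
                for key, value in pairs:
                    groups[key].append(value)
            yield {key: reduce_fn(key, values) for key, values in groups.items()}


def split_words(line):
    """Map: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in line.split()]


def sum_counts(key, values):
    """Reduce: total the counts for one word."""
    return sum(values)


if __name__ == "__main__":
    lines = ["a b", "b c", "a a", "c"]
    for result in windowed_map_reduce(lines, split_words, sum_counts, 2, 1):
        print(result)
```

With a window of two lines and a slide of one, the four input lines produce three overlapping windows. Caching the mapped pairs is one simple way to avoid redundant Map work across overlapping windows; the sketch still re-runs Reduce over each full window and omits the parallelism, incremental Reduce, and scheduling that the abstract attributes to C-MR.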