DEDUCE: at the intersection of MapReduce and stream processing

Authors:
Vibhore Kumar; Henrique Andrade; Buğra Gedik; Kun-Lung Wu
Affiliations:
IBM Thomas J. Watson Research Center, Hawthorne, NY (all four authors)
Venue:
Proceedings of the 13th International Conference on Extending Database Technology
Year:
2010

Abstract

MapReduce and stream processing are two emerging but distinct paradigms for analyzing, processing, and making sense of the large volumes of data generated today. MapReduce can analyze several terabytes of stored data, whereas stream processing systems can handle as many as a few million updates per second. However, a growing number of applications need a solution that effectively and efficiently combines the benefits of both paradigms. For example, in automated stock trading, applications typically run a periodic MapReduce analysis over large amounts of stored data to generate a model, which a stream processing system then uses to process a stream of incoming updates. This paper presents Deduce, which extends IBM's System S stream processing middleware with support for MapReduce by providing (1) language and runtime support for easily specifying and embedding MapReduce jobs as elements of a larger data flow, (2) the capability to describe reusable modules that can serve as map and reduce tasks, and (3) configuration parameters that can be tuned to control and manage how the MapReduce and stream processing components use shared resources. We describe the motivation for Deduce and the design and implementation of the MapReduce extensions for System S, and then present experimental results.
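The hybrid pattern in the stock-trading example, where a periodic batch MapReduce job produces a model that a streaming component then applies to each incoming update, can be illustrated with a small single-process Python sketch. This is not Deduce's or System S's API and not the model used in the paper: the (symbol, price) record layout, the mean-price "model", the relative-deviation threshold, and every function name below are hypothetical assumptions chosen only to show how the two phases connect.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Hypothetical record layout: (symbol, price).
Trade = Tuple[str, float]


def map_phase(trades: Iterable[Trade]) -> Iterator[Tuple[str, Tuple[float, int]]]:
    # Emit (key, (partial_sum, count)) so a reducer can later compute a mean.
    for symbol, price in trades:
        yield symbol, (price, 1)


def shuffle(pairs: Iterable[Tuple[str, Tuple[float, int]]]) -> Dict[str, List[Tuple[float, int]]]:
    # Group intermediate values by key, as a MapReduce runtime would.
    groups: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: Dict[str, List[Tuple[float, int]]]) -> Dict[str, float]:
    # Collapse each key's (sum, count) pairs into a mean price per symbol.
    model: Dict[str, float] = {}
    for symbol, values in groups.items():
        total = sum(s for s, _ in values)
        count = sum(c for _, c in values)
        model[symbol] = total / count
    return model


def build_model(stored_trades: Iterable[Trade]) -> Dict[str, float]:
    # Batch phase: run map -> shuffle -> reduce over the stored history.
    return reduce_phase(shuffle(map_phase(stored_trades)))


def score_stream(
    updates: Iterable[Trade], model: Dict[str, float], threshold: float = 0.05
) -> Iterator[Tuple[str, float, float]]:
    # Streaming phase: apply the batch-built model to each incoming update
    # and emit updates whose relative deviation from the mean exceeds threshold.
    for symbol, price in updates:
        mean: Optional[float] = model.get(symbol)
        if mean is None or mean == 0.0:
            continue  # no usable history for this symbol; skip the update
        deviation = (price - mean) / mean
        if abs(deviation) > threshold:
            yield symbol, price, deviation


if __name__ == "__main__":
    model = build_model([("IBM", 100.0), ("IBM", 102.0), ("XYZ", 10.0)])
    print(model)  # {'IBM': 101.0, 'XYZ': 10.0}
    for alert in score_stream([("IBM", 101.5), ("IBM", 110.0), ("XYZ", 9.0)], model):
        print(alert)
```

Carrying (sum, count) pairs instead of raw prices keeps the reducer's work associative, which is what lets a MapReduce runtime pre-aggregate values before the shuffle. In a long-running deployment the batch phase would presumably be rerun on a schedule and the streaming operator would switch to the refreshed model; how Deduce coordinates that hand-off and shares resources between the two phases is described in the paper, not in this sketch.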