An overview of data warehousing and OLAP technology
ACM SIGMOD Record
STREAM: the stanford stream data manager (demonstration description)
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
TelegraphCQ: continuous dataflow processing
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
Dynamic Load Distribution in the Borealis Stream Processor
ICDE '05 Proceedings of the 21st International Conference on Data Engineering
Intrusion Detection based on Clustering a Data Stream
SERA '05 Proceedings of the Third ACIS Int'l Conference on Software Engineering Research, Management and Applications
Interpreting the data: Parallel analysis with Sawzall
Scientific Programming - Dynamic Grids and Worldwide Computing
Map-reduce-merge: simplified relational data processing on large clusters
Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
SPC: a distributed, scalable platform for data mining
Proceedings of the 4th international workshop on Data mining standards, services and platforms
Fault-tolerance in the borealis distributed stream processing system
ACM Transactions on Database Systems (TODS)
Pig latin: a not-so-foreign language for data processing
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
SPADE: the system s declarative stream processing engine
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Implementing Diverse Messaging Models with Self-Managing Properties using IFLOW
ICAC '06 Proceedings of the 2006 IEEE International Conference on Autonomic Computing
C-MR: continuously executing MapReduce workflows on multi-core processors
Proceedings of third international workshop on MapReduce and its Applications Date
SAMOA: a platform for mining big data streams
Proceedings of the 22nd international conference on World Wide Web companion
Distributed data management using MapReduce
ACM Computing Surveys (CSUR)
The family of mapreduce and large-scale data processing systems
ACM Computing Surveys (CSUR)
CRUCIBLE: towards unified secure on- and off-line analytics at scale
DISCS-2013 Proceedings of the 2013 International Workshop on Data-Intensive Scalable Computing Systems
Hi-index | 0.00 |
MapReduce and stream processing are two emerging, but different, paradigms for analyzing, processing and making sense of large volumes of modern day data. While MapReduce offers the capability to analyze several terabytes of stored data, stream processing solutions offer the ability to process, possibly, a few million updates every second. However, there is an increasing number of data processing applications which need a solution that effectively and efficiently combines the benefits of MapReduce and stream processing to address their data processing needs. For example, in the automated stock trading domain, applications usually require periodic analysis of large amounts of stored data to generate a model using MapReduce, which is then used to process a stream of incident updates using a stream processing system. This paper presents Deduce, which extends IBM's System S stream processing middleware with support for MapReduce by providing (1) language and runtime support for easily specifying and embedding MapReduce jobs as elements of a larger data-flow, (2) capability to describe reusable modules that can be used as map and reduce tasks, and (3) configuration parameters that can be tweaked to control and manage the usage of shared resources by the MapReduce and stream processing components. We describe the motivation for Deduce and the design and implementation of the MapReduce extensions for System S, and then present experimental results.