Continuous MapReduce for In-DB stream analytics

Authors:
Qiming Chen;Meichun Hsu
Affiliations:
HP Labs, Palo Alto, California and Hewlett Packard Co.;HP Labs, Palo Alto, California and Hewlett Packard Co.
Venue:
OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems
Year:
2010

Abstract

Scaling out data-intensive analytics generally relies on parallel computation to gain CPU bandwidth and on incremental computation to balance workload. Combining these two mechanisms is the key to supporting large-scale stream analytics. Map-Reduce (M-R) is a programming model for parallel computation over vast amounts of data on large clusters of commodity machines. Through a simple interface with two functions, map and reduce, this model facilitates parallel implementation of data-intensive applications. In-DB M-R allows these functions to be embedded within standard queries to exploit the expressive power of SQL, and allows them to be executed by the query engine with fast data access and reduced data movement. However, when the data form infinite streams, the semantics and scale-out capability of M-R are challenged. To solve this problem, we propose to integrate M-R with the continuous query model characterized by Cut-Rewind (C-R), i.e., cut a query execution based on some granule of the stream data and then rewind the state of the query without shutting it down, in order to process the next chunk of stream data. This approach allows an M-R query with full SQL expressive power to be applied to dynamic stream data chunk by chunk for continuous, window-based stream analytics. Our experience shows that integrating M-R and C-R provides a powerful combination for parallelized and granulized stream processing. This combination enables us to scale out stream analytics "horizontally" based on the M-R model and "vertically" based on the C-R model. The proposed approach has been prototyped on a commercial, proprietary parallel database engine. Our preliminary experiments reveal the merit of using a query engine for near-real-time parallel and incremental stream analytics.
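
The chunk-by-chunk behaviour described in the abstract can be illustrated with a minimal, single-process Python sketch. It is not the authors' in-database implementation: the record layout `(timestamp_sec, key, value)`, the functions `map_fn`, `reduce_fn`, and `cut_rewind_mapreduce`, and the use of a fixed time granule are all assumptions made for illustration, and the input is assumed to arrive in non-decreasing timestamp order. The loop keeps running across chunks (no shutdown), emits one map-reduce result per granule (the "cut"), and clears its per-chunk state before continuing (the "rewind").

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record layout: (timestamp in seconds, grouping key, numeric value).
Record = Tuple[int, str, float]


def map_fn(record: Record) -> Tuple[str, float]:
    """Map step: emit a (key, value) pair for one stream record."""
    _, key, value = record
    return key, value


def reduce_fn(values: List[float]) -> float:
    """Reduce step: aggregate all values that share a key within one chunk."""
    return sum(values)


def cut_rewind_mapreduce(
    stream: Iterable[Record], granule_sec: int
) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Apply map/reduce to the stream one time granule at a time.

    Assumes records arrive in non-decreasing timestamp order.
    Yields (chunk_index, {key: reduced_value}) for each completed chunk.
    """
    groups: Dict[str, List[float]] = defaultdict(list)
    current_chunk = None

    for record in stream:
        chunk = record[0] // granule_sec
        if current_chunk is not None and chunk != current_chunk:
            # "Cut": the granule boundary was crossed, so emit this chunk's result.
            yield current_chunk, {k: reduce_fn(v) for k, v in groups.items()}
            # "Rewind": reset per-chunk state; the loop itself keeps running.
            groups.clear()
        current_chunk = chunk
        key, value = map_fn(record)
        groups[key].append(value)

    # Flush the last (possibly partial) chunk when a finite input ends.
    if current_chunk is not None:
        yield current_chunk, {k: reduce_fn(v) for k, v in groups.items()}
```

With a 60-second granule, records stamped 0, 30, and 59 would fall into chunk 0 and a record stamped 60 would start chunk 1. In a parallel setting, the map and reduce steps inside each chunk would presumably be distributed across nodes, which this sketch does not attempt to model.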