Exploiting the power of relational databases for efficient stream processing

Authors:
Erietta Liarou;Romulo Goncalves;Stratos Idreos
Affiliations:
CWI Amsterdam, The Netherlands;CWI Amsterdam, The Netherlands;CWI Amsterdam, The Netherlands
Venue:
Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
Year:
2009

Citing 13
Cited 16

NiagaraCQ: a scalable continuous query system for Internet databases

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Petri Nets

ACM Computing Surveys (CSUR)
Continuously adaptive continuous queries over streams

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Continuous queries over data streams

ACM SIGMOD Record
Alert: An Architecture for Transforming a Passive DBMS into an Active DBMS

VLDB '91 Proceedings of the 17th International Conference on Very Large Data Bases
Gigascope: a stream database for network applications

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Operator scheduling in data stream systems

The VLDB Journal — The International Journal on Very Large Data Bases
Retrospective on Aurora

The VLDB Journal — The International Journal on Very Large Data Bases
QPipe: a simultaneously pipelined relational query engine

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Customizable parallel execution of scientific stream queries

VLDB '05 Proceedings of the 31st international conference on Very large data bases
Continuous query processing in data streams using duality of data and queries

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Design, implementation, and evaluation of the linear road benchmark on the stream processing core

Proceedings of the 2006 ACM SIGMOD international conference on Management of data
Linear road: a stream data management benchmark

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Database architecture evolution: mammals flourished long before dinosaurs became extinct

Proceedings of the VLDB Endowment
Maintaining consistent results of continuous queries under diverse window specifications

Information Systems
Experience in extending query engine for continuous analytics

DaWaK'10 Proceedings of the 12th international conference on Data warehousing and knowledge discovery
Scale out parallel and distributed CDR stream analytics

Globe'10 Proceedings of the Third international conference on Data management in grid and peer-to-peer systems
Data stream analytics as cloud service for mobile applications

OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems: Part II
Continuous mapreduce for In-DB stream analytics

OTM'10 Proceedings of the 2010 international conference on On the move to meaningful internet systems
Experience in Continuous analytics as a Service (CaaaS)

Proceedings of the 14th International Conference on Extending Database Technology
Query engine grid for executing SQL streaming process

Globe'11 Proceedings of the 4th international conference on Data management in grid and peer-to-peer systems
Continuous access to cloud event services with event pipe queries

OTM'11 Proceedings of the 2011th Confederated international conference on On the move to meaningful internet systems - Volume Part II
Extend core UDF framework for GPU-enabled analytical query evaluation

Proceedings of the 15th Symposium on International Database Engineering & Applications
The database architectures research group at CWI

ACM SIGMOD Record
Stream-join revisited in the context of epoch-based SQL continuous query

Proceedings of the 16th International Database Engineering & Applications Symposium
MonetDB/DataCell: online analytics in a streaming column-store

Proceedings of the VLDB Endowment
A new paradigm for collaborating distributed query engines

DaWaK'12 Proceedings of the 14th international conference on Data Warehousing and Knowledge Discovery
Enhanced stream processing in a DBMS kernel

Proceedings of the 16th International Conference on Extending Database Technology
Database support for processing complex aggregate queries over data streams

Proceedings of the Joint EDBT/ICDT 2013 Workshops

Abstract

Stream applications have gained significant popularity in recent years, which has led to the development of specialized stream engines. These systems are designed from scratch with a different philosophy than today's database engines in order to cope with the requirements of stream applications. However, this means that they lack the power and sophisticated techniques of a full-fledged database system that exploits techniques and algorithms accumulated over many years of database research. In this paper, we take the opposite route and design a stream engine directly on top of a database kernel. Incoming tuples are stored directly upon arrival in a new kind of system table, called a basket. A continuous query can then be evaluated over its relevant baskets as a typical one-time query, exploiting the power of the relational engine. Once a tuple has been seen by all relevant queries/operators, it is dropped from its basket. A basket can be the input to one or more similar query plans. Furthermore, a query plan can be split into multiple parts, each with its own input/output baskets, allowing for flexible load-sharing query scheduling. Unlike traditional stream engines, which process one tuple at a time, this model allows batch processing of tuples, e.g., querying a basket only after x tuples arrive or after a time threshold has passed. Furthermore, we are not restricted to processing tuples in the order they arrive. Instead, we can selectively pick tuples from a basket based on the query requirements, exploiting a novel query component, the basket expression. We investigate the opportunities and challenges that arise with such a direction and show that it carries significant advantages. We propose a complete architecture, the DataCell, which we implemented on top of an open-source column-oriented DBMS. A detailed analysis and experimental evaluation of the core algorithms, using both micro-benchmarks and the standard Linear Road benchmark, demonstrate the potential of this new approach.
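
The basket model described in the abstract can be illustrated with a small in-memory sketch. This is not the DataCell implementation: the names `Basket`, `ContinuousQuery`, `maybe_fire`, and `demo` are hypothetical, Python lists stand in for system tables, and only the count-based trigger ("after x tuples arrive") is modeled. Time thresholds and basket expressions are left out.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


@dataclass
class Basket:
    """Hypothetical stand-in for a basket: a buffer of incoming tuples."""
    rows: List[dict] = field(default_factory=list)
    seen_by: List[Set[str]] = field(default_factory=list)  # parallel to rows
    readers: Set[str] = field(default_factory=set)         # registered queries

    def append(self, row: dict) -> None:
        # Store the tuple on arrival; no query has seen it yet.
        self.rows.append(row)
        self.seen_by.append(set())

    def unseen(self, query: str) -> List[int]:
        # Positions of tuples that this query has not consumed yet.
        return [i for i, s in enumerate(self.seen_by) if query not in s]

    def mark_seen(self, query: str, positions: List[int]) -> None:
        for i in positions:
            self.seen_by[i].add(query)
        # Drop every tuple that all registered queries have now seen.
        keep = [i for i, s in enumerate(self.seen_by) if not self.readers <= s]
        self.rows = [self.rows[i] for i in keep]
        self.seen_by = [self.seen_by[i] for i in keep]


@dataclass
class ContinuousQuery:
    """A continuous query run as repeated one-time queries over batches."""
    name: str
    basket: Basket
    batch_size: int                             # the "x tuples" threshold
    plan: Callable[[List[dict]], List[dict]]    # one-time query over a batch

    def __post_init__(self) -> None:
        self.basket.readers.add(self.name)

    def maybe_fire(self) -> Optional[List[dict]]:
        pending = self.basket.unseen(self.name)
        if len(pending) < self.batch_size:
            return None  # not enough tuples yet, so wait
        batch = [self.basket.rows[i] for i in pending]
        result = self.plan(batch)
        self.basket.mark_seen(self.name, pending)
        return result


def demo():
    basket = Basket()
    fast = ContinuousQuery("fast", basket, 2,
                           lambda b: [r for r in b if r["speed"] > 100])
    avg = ContinuousQuery("avg", basket, 4,
                          lambda b: [{"avg_speed": sum(r["speed"] for r in b) / len(b)}])
    outputs = []
    for speed in [90, 120, 130, 80]:
        basket.append({"speed": speed})
        for query in (fast, avg):
            out = query.maybe_fire()
            if out is not None:
                outputs.append((query.name, out))
    return outputs, len(basket.rows)
```

In `demo()`, two queries share one basket with different batch sizes, and a tuple leaves the basket only once both have consumed it. A real engine would run each plan as a relational query inside the DBMS kernel rather than as a Python function.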