MJoin: a metadata-aware stream join operator

Authors:
Luping Ding;Elke A. Rundensteiner;George T. Heineman
Affiliations:
Worcester Polytechnic Institute, Worcester, MA;Worcester Polytechnic Institute, Worcester, MA;Worcester Polytechnic Institute, Worcester, MA
Venue:
Proceedings of the 2nd international workshop on Distributed event-based systems
Year:
2003

Abstract

Join algorithms must be redesigned when processing stream data instead of persistently stored data. Data streams are potentially infinite, and the query result is expected to be generated incrementally rather than all at once. Data arrival patterns are often unpredictable, and the statistics of the data and other relevant metadata are often known only at runtime. In some cases this metadata is supplied interleaved with the actual data in the form of stream markers. Recently, stream join algorithms such as Symmetric Hash Join and XJoin have been designed to run in a pipelined fashion to cope with the delayed delivery of data. However, none of them to date takes metadata, especially runtime metadata, into consideration. Hence, join execution logic that is defined statically before runtime may not be well suited to varying dynamic runtime scenarios. In addition, the join operator must maintain potentially unbounded state to guarantee the precision of the result. In this paper, we propose a metadata-aware stream join operator called MJoin, which exploits metadata to (1) detect and purge useless materialized data to save computation resources and (2) optimize the execution logic to target different optimization goals. We have implemented the MJoin operator. The experimental results validate our metadata-driven join optimization strategies.
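
To make the purging idea concrete, here is a minimal sketch of a symmetric hash join that reacts to stream markers. It is not the authors' MJoin implementation. The class name `SymmetricHashJoin`, the methods `on_tuple` and `on_marker`, and the marker semantics ("this input will send no more tuples with this join key") are all assumptions made for illustration, since the abstract does not specify the form of the metadata.

```python
from collections import defaultdict


class SymmetricHashJoin:
    """Toy two-input equi-join with marker-driven state purging (hypothetical API)."""

    def __init__(self):
        # One hash table per input: join key -> list of stored values.
        self.state = {"left": defaultdict(list), "right": defaultdict(list)}
        # Keys that each input has promised never to send again (assumed marker meaning).
        self.closed = {"left": set(), "right": set()}

    @staticmethod
    def _other(side):
        return "right" if side == "left" else "left"

    def on_tuple(self, side, key, value):
        """Probe the opposite table, emit matches immediately, then maybe store."""
        other = self._other(side)
        matches = self.state[other].get(key, [])
        if side == "left":
            results = [(key, value, m) for m in matches]
        else:
            results = [(key, m, value) for m in matches]
        # Store the tuple only if the other input may still send this key.
        if key not in self.closed[other]:
            self.state[side][key].append(value)
        return results

    def on_marker(self, side, key):
        """`side` will send no more tuples with `key`: purge the other side's tuples."""
        self.closed[side].add(key)
        self.state[self._other(side)].pop(key, None)

    def stored(self):
        """Number of values currently materialized across both tables."""
        return sum(len(vals) for table in self.state.values() for vals in table.values())


join = SymmetricHashJoin()
join.on_tuple("left", 1, "a")
print(join.on_tuple("right", 1, "x"))   # joins with the stored left tuple
join.on_marker("right", 1)              # left tuples with key 1 are now useless
print(join.stored())
```

In this sketch, a marker on one input removes the matching entries from the other input's table, and later tuples whose key has been closed on the opposite input are probed but not stored. The `closed` sets themselves still grow with the number of distinct keys, so a real operator would need a more compact representation of the metadata.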