Distributed stream join query processing with semijoins

Authors:
Tri Minh Tran;Byung Suk Lee
Affiliations:
Department of Computer Science, University of Vermont, Burlington, USA 05405;Department of Computer Science, University of Vermont, Burlington, USA 05405
Venue:
Distributed and Parallel Databases
Year:
2010



Abstract

This paper addresses the distributed stream processing of window-based multi-way join queries, with the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites must be shipped to the processing site for query execution, which typically incurs high communication overhead. Our observation is that the semijoin, which is effective in reducing communication overhead in distributed database query processing, can also be effective in distributed stream query processing. The challenge, however, lies in the streaming nature of the tuples: it requires continuous and incremental processing of an unbounded sequence of tuples instead of one-time processing of a set of stored tuples. This paper describes our comprehensive work to address this challenge. Specifically, we first propose a distributed stream join processing model that handles the network delays introduced by the shipment of data streams and allows for efficient batch processing. Then, based on the model, we propose join algorithms for the multi-way join case: first, one-way join algorithms for different combinations of join placement and join method, and then multi-way join algorithms assuming linear join ordering. Regarding the join method, two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site, and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different semijoin strategies, which incur different communication overheads. We present a complete set of join algorithms considering all possible semijoin strategies, and propose an optimization algorithm. The join algorithms are executed continuously and incrementally as tuples arrive, and never ship tuples redundantly.
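The difference between the two join methods can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a single batch of tuples in each window, models a tuple as a hypothetical `(key, payload)` pair, and uses one particular semijoin strategy (the processing site ships its distinct join-key values to the remote site, which returns only the matching tuples). The function names `simple_join`, `semijoin_join`, and `local_join` are illustrative.

```python
def local_join(remote_tuples, local_window):
    """Hash-join already-shipped remote tuples with the local window on the key."""
    index = {}
    for key, payload in local_window:
        index.setdefault(key, []).append(payload)
    return [(k, rp, lp) for k, rp in remote_tuples for lp in index.get(k, [])]


def simple_join(remote_window, local_window):
    """Simple join: every remote tuple is shipped in full to the processing site."""
    shipped = list(remote_window)  # all full tuples cross the network
    return local_join(shipped, local_window), len(shipped)


def semijoin_join(remote_window, local_window):
    """Semijoin-based join (one assumed strategy among several possible ones)."""
    # Step 1: ship only the distinct join-key values (partial tuples) to the remote site.
    keys = {k for k, _ in local_window}
    # Step 2: the remote site reduces its window to the tuples that can join.
    reduced = [(k, p) for k, p in remote_window if k in keys]
    # Step 3: only the reduced set of full tuples is shipped back and joined.
    return local_join(reduced, local_window), len(keys), len(reduced)


if __name__ == "__main__":
    remote = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
    local = [(2, "x"), (2, "y"), (5, "z")]
    print(simple_join(remote, local))
    print(semijoin_join(remote, local))
```

On this toy input both methods produce the same join result, but the simple join ships 4 full tuples, while the semijoin-based variant ships 2 key values and 1 full tuple. Whether that trade pays off depends on tuple width and join selectivity, which is why different semijoin strategies incur different communication overheads.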
The optimization algorithm constructs an efficient multi-way join plan using a greedy heuristic that, in each step, adds to the plan the stream with the minimum join execution cost. Through extensive experiments, we compare the performance of the proposed one-way join algorithms, and compare the efficiency of the plans generated by the greedy optimization algorithm with that of plans found by exhaustive search.
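A greedy construction of a linear join order, set against exhaustive search, might look like the following sketch. The cost model is entirely hypothetical: `PAIR_COST` and `toy_cost` are invented for illustration and charge each step only by the last stream in the plan and the candidate stream, whereas the paper's cost of a join step is not specified in this abstract. The fixed starting stream is also an assumption.

```python
from itertools import permutations

# Hypothetical pairwise join costs between streams (illustrative numbers only).
PAIR_COST = {
    frozenset("AB"): 5, frozenset("AC"): 9, frozenset("AD"): 7,
    frozenset("BC"): 2, frozenset("BD"): 8, frozenset("CD"): 3,
}


def toy_cost(plan, stream):
    """Assumed cost of joining `stream` to the current linear plan."""
    return PAIR_COST[frozenset((plan[-1], stream))]


def greedy_join_order(streams, start, cost):
    """At each step, append the remaining stream with the minimum join cost."""
    plan, total = [start], 0
    remaining = set(streams) - {start}
    while remaining:
        # sorted() makes tie-breaking deterministic.
        best = min(sorted(remaining), key=lambda s: cost(plan, s))
        total += cost(plan, best)
        plan.append(best)
        remaining.remove(best)
    return plan, total


def plan_cost(order, cost):
    """Total cost of a complete linear join order."""
    return sum(cost(order[:i], order[i]) for i in range(1, len(order)))


def exhaustive_join_order(streams, start, cost):
    """Enumerate every linear order beginning at `start` and keep the cheapest."""
    rest = [s for s in streams if s != start]
    candidates = [[start, *perm] for perm in permutations(rest)]
    best = min(candidates, key=lambda order: plan_cost(order, cost))
    return best, plan_cost(best, cost)


if __name__ == "__main__":
    streams = ["A", "B", "C", "D"]
    print(greedy_join_order(streams, "A", toy_cost))
    print(exhaustive_join_order(streams, "A", toy_cost))
```

In this toy instance the greedy plan and the exhaustive optimum coincide (order A, B, C, D with total cost 10). In general a greedy heuristic is not guaranteed to find the optimal plan; it evaluates a number of candidates quadratic in the number of streams, while exhaustive search enumerates a factorial number of orders.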