This paper addresses distributed stream processing of window-based multi-way join queries, treating the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites must be shipped to the processing site for query execution, which typically introduces high communication overhead. Our observation is that the semijoin, which is effective in reducing communication overhead in distributed database query processing, can also be effective in distributed stream query processing. The challenge, however, lies in the streaming nature of the tuples: it requires continuous, incremental processing of an unbounded sequence of tuples rather than one-time processing of a set of stored tuples. This paper describes our comprehensive work to address this challenge. Specifically, we first propose a distributed stream join processing model that handles the network delays introduced by shipping data streams and allows for efficient batch processing. Then, based on the model, we propose join algorithms for the multi-way join case: first, one-way join algorithms for different combinations of join placement and join method, and then multi-way join algorithms assuming linear join ordering. Regarding the join method, two distributed join methods are introduced: (1) simple join, in which full tuples are forwarded to the query processing site, and (2) semijoin-based join, in which partial tuples are forwarded. A semijoin-based join can be executed with different semijoin strategies, which incur different communication overheads. We present a complete set of join algorithms covering all possible semijoin strategies, and propose an optimization algorithm. The join algorithms are executed continuously and incrementally as tuples arrive, and never ship tuples redundantly.
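The contrast between the two join methods can be illustrated with a minimal one-shot sketch. It is not the paper's algorithm: it ignores windows, batching, and incremental execution, and the tuple layout, the function names, the three-step semijoin strategy, and the cost measure (number of fields shipped) are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (join key, payload): a hypothetical tuple layout


def _index(rows: List[Row]) -> Dict[int, List[str]]:
    """Group payloads by join key."""
    by_key: Dict[int, List[str]] = {}
    for key, payload in rows:
        by_key.setdefault(key, []).append(payload)
    return by_key


def simple_join(remote: List[Row], local: List[Row]):
    """Ship every full remote tuple, then join at the processing site."""
    shipped_fields = 2 * len(remote)  # key + payload for each remote tuple
    local_by_key = _index(local)
    result = [(k, rp, lp) for k, rp in remote for lp in local_by_key.get(k, [])]
    return result, shipped_fields


def semijoin_based_join(remote: List[Row], local: List[Row]):
    """One assumed semijoin strategy: keys out, matches back, reduced tuples out."""
    # Step 1: the remote site ships only its distinct join keys (partial tuples).
    remote_keys = {k for k, _ in remote}
    shipped_fields = len(remote_keys)
    # Step 2: the processing site replies with the keys that have a match.
    local_by_key = _index(local)
    matching = remote_keys & local_by_key.keys()
    shipped_fields += len(matching)
    # Step 3: the remote site ships full tuples only for matching keys.
    reduced = [(k, p) for k, p in remote if k in matching]
    shipped_fields += 2 * len(reduced)
    result = [(k, rp, lp) for k, rp in reduced for lp in local_by_key[k]]
    return result, shipped_fields


remote_rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (6, "e"), (7, "f")]
local_rows = [(2, "x"), (2, "y"), (5, "z")]
simple_result, simple_cost = simple_join(remote_rows, local_rows)
semi_result, semi_cost = semijoin_based_join(remote_rows, local_rows)
print(simple_cost, semi_cost)
```

Both functions return the same join result; in this toy input the semijoin-based variant ships fewer fields because only one remote key has a match. With highly selective-in-reverse inputs (most remote tuples matching), the extra key exchange can make the semijoin-based variant the more expensive one, which is why the choice of strategy matters.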
The optimization algorithm constructs an efficient multi-way join plan using a greedy heuristic that, in each step, adds to the plan the one stream with the minimum join execution cost. Through extensive experiments, we compare the performance of the proposed one-way join algorithms, and we compare the efficiency of the plans generated by the greedy optimization algorithm against those found by exhaustive search.
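The greedy plan construction can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's optimizer: the starting stream, the function name `greedy_join_order`, and the toy cost model (arrival rate of the candidate stream times its selectivity with the last stream in the plan) are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Sequence


def greedy_join_order(
    streams: Sequence[str],
    start: str,
    cost: Callable[[List[str], str], float],
) -> List[str]:
    """Build a linear join order by repeatedly adding the cheapest stream."""
    plan = [start]
    remaining = [s for s in streams if s != start]
    while remaining:
        # Pick the stream whose join with the current plan costs the least.
        best = min(remaining, key=lambda s: cost(plan, s))
        plan.append(best)
        remaining.remove(best)
    return plan


# Hypothetical statistics, invented for this example.
rates: Dict[str, float] = {"A": 20.0, "B": 100.0, "C": 10.0, "D": 50.0}
selectivity: Dict[FrozenSet[str], float] = {
    frozenset({"A", "B"}): 0.1,
    frozenset({"A", "C"}): 0.5,
    frozenset({"A", "D"}): 0.2,
    frozenset({"B", "C"}): 0.1,
    frozenset({"B", "D"}): 0.3,
    frozenset({"C", "D"}): 0.4,
}


def toy_cost(plan: List[str], candidate: str) -> float:
    """Assumed cost: candidate's rate times its selectivity with the last stream."""
    return rates[candidate] * selectivity[frozenset({plan[-1], candidate})]


order = greedy_join_order(["A", "B", "C", "D"], start="A", cost=toy_cost)
print(order)
```

A greedy search of this kind evaluates on the order of n^2 candidate steps for n streams, whereas exhaustive search over linear orderings evaluates n! plans; the trade-off is that the greedy plan is not guaranteed to be optimal.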