An adaptive query execution system for data integration
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Eddies: continuously adaptive query processing
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
NiagaraCQ: a scalable continuous query system for Internet databases
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Dataflow query execution in a parallel main-memory environment
PDIS '91 Proceedings of the first international conference on Parallel and distributed information systems
Optimization of parallel query execution plans in XPRS
PDIS '91 Proceedings of the first international conference on Parallel and distributed information systems
Processing complex aggregate queries over data streams
Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Maintaining variance and k-medians over data stream windows
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Characterizing memory requirements for queries over continuous data streams
ACM Transactions on Database Systems (TODS)
Hash-Merge Join: A Non-blocking Join Algorithm for Producing Fast and Early Join Results
ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Early hash join: a configurable algorithm for the efficient and early production of join results
VLDB '05 Proceedings of the 31st international conference on Very large data bases
The Long Tail: Why the Future of Business Is Selling Less of More
The Long Tail: Why the Future of Business Is Selling Less of More
Streaming queries over streaming data
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Maximizing the output rate of multi-way join queries over streaming information sources
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Meshing Streaming Updates with Persistent Data in an Active Data Warehouse
IEEE Transactions on Knowledge and Data Engineering
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
R-MESHJOIN for near-real-time data warehousing
DOLAP '10 Proceedings of the ACM 13th international workshop on Data warehousing and OLAP
X-HYBRIDJOIN for near-real-time data warehousing
BNCOD'11 Proceedings of the 28th British national conference on Advances in databases
HYBRIDJOIN for Near-Real-Time Data Warehousing
International Journal of Data Warehousing and Mining
Stream-based join algorithms are needed in modern near-real-time data warehouses. One class of these algorithms, with MESHJOIN as a typical example, computes the join between a stream and a disk-based relation. We recently presented a new algorithm in that class, X-HYBRIDJOIN (Extended Hybrid Join), which outperforms earlier algorithms by pinning frequently accessed data from the disk-based relation in main memory. Apart from holding it in main memory, X-HYBRIDJOIN treats this frequently accessed data no differently from the rest of the disk-based relation. In this paper we investigate whether performance can be improved by treating the frequently accessed data differently. We present a new algorithm, Optimised X-HYBRIDJOIN, which consists of two phases. The stream-probing phase deals with the frequently accessed part of the disk-based relation; the disk-probing phase deals with the remaining part. In our experiments, Optimised X-HYBRIDJOIN performed significantly better than X-HYBRIDJOIN. We derive a cost model for our algorithm, which allows us to tune the components of Optimised X-HYBRIDJOIN, and we validate the cost model against the results of an experimental study.
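The two-phase idea can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function name `two_phase_join`, the `hot_keys` set that marks the frequently accessed part, the `partition_size` parameter, and the use of an in-memory dictionary to stand in for the disk-based relation are all assumptions made for illustration. The sketch also runs the two phases one after the other over a finite stream, whereas a real stream join would presumably alternate between them as tuples arrive.

```python
from collections import defaultdict


def two_phase_join(stream, relation, hot_keys, partition_size=2):
    """Join (key, payload) stream tuples with a key -> attributes relation.

    Illustrative only: `relation` is a dict standing in for the disk-based
    relation, and `hot_keys` (an assumption) marks its frequently accessed part.
    """
    # Frequently accessed part of the relation, pinned in memory as a hash table.
    hot = {k: v for k, v in relation.items() if k in hot_keys}
    # Remaining part, treated as disk-resident and read one partition at a time.
    cold = sorted((k, v) for k, v in relation.items() if k not in hot_keys)

    pending = defaultdict(list)  # stream tuples waiting for the disk-probing phase
    output = []

    # Stream-probing phase: match each stream tuple against the pinned part.
    for key, payload in stream:
        if key in hot:
            output.append((key, payload, hot[key]))
        else:
            pending[key].append(payload)

    # Disk-probing phase: load each partition of the remaining part and
    # probe it against the buffered stream tuples.
    for start in range(0, len(cold), partition_size):
        for key, attrs in cold[start:start + partition_size]:
            for payload in pending.pop(key, []):
                output.append((key, payload, attrs))

    return output


if __name__ == "__main__":
    relation = {1: "a", 2: "b", 3: "c", 4: "d"}
    stream = [(1, "s1"), (3, "s2"), (1, "s3"), (5, "s4"), (4, "s5")]
    for row in two_phase_join(stream, relation, hot_keys={1, 2}):
        print(row)
```

In this sketch, stream tuples whose keys fall in the pinned part are joined immediately without touching the rest of the relation, while the others wait until the matching partition is loaded. Stream tuples with no matching key (key 5 in the usage example) produce no output.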