Optimised X-HYBRIDJOIN for near-real-time data warehousing

Authors:
M. Asif Naeem;Gillian Dobbie;Gerald Weber
Affiliations:
The University of Auckland, Auckland, New Zealand;The University of Auckland, Auckland, New Zealand;The University of Auckland, Auckland, New Zealand
Venue:
ADC '12 Proceedings of the Twenty-Third Australasian Database Conference - Volume 124
Year:
2012

Citing 18
Cited 0

An adaptive query execution system for data integration

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Eddies: continuously adaptive query processing

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
NiagaraCQ: a scalable continuous query system for Internet databases

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
Dataflow query execution in a parallel main-memory environment

PDIS '91 Proceedings of the first international conference on Parallel and distributed information systems
Optimization of parallel query execution plans in XPRS

PDIS '91 Proceedings of the first international conference on Parallel and distributed information systems
Processing complex aggregate queries over data streams

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Maintaining variance and k-medians over data stream windows

Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Characterizing memory requirements for queries over continuous data streams

ACM Transactions on Database Systems (TODS)
Hash-Merge Join: A Non-blocking Join Algorithm for Producing Fast and Early Join Results

ICDE '04 Proceedings of the 20th International Conference on Data Engineering
Early hash join: a configurable algorithm for the efficient and early production of join results

VLDB '05 Proceedings of the 31st international conference on Very large data bases
The Long Tail: Why the Future of Business Is Selling Less of More

The Long Tail: Why the Future of Business Is Selling Less of More
Streaming queries over streaming data

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases
Maximizing the output rate of multi-way join queries over streaming information sources

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
Meshing Streaming Updates with Persistent Data in an Active Data Warehouse

IEEE Transactions on Knowledge and Data Engineering
A partition-based approach to support streaming updates over persistent data in an active datawarehouse

IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
R-MESHJOIN for near-real-time data warehousing

DOLAP '10 Proceedings of the ACM 13th international workshop on Data warehousing and OLAP
X-HYBRIDJOIN for near-real-time data warehousing

BNCOD'11 Proceedings of the 28th British national conference on Advances in databases
HYBRIDJOIN for Near-Real-Time Data Warehousing

International Journal of Data Warehousing and Mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

Stream-based join algorithms are needed in modern near-real-time data warehouses. A particular class of stream-based join algorithms, with MESHJOIN as a typical example, computes the join between a stream and a disk-based relation. Recently we have presented a new algorithm X-HYBRIDJOIN (Extended Hybrid Join) in that class. X-HYBRIDJOIN achieves better performance compared to earlier algorithms by pinning frequently accessed data from the disk-based relation in main memory. Apart from being held in main memory, X-HYBRIDJOIN treats this frequently accessed data no differently than other data from the disk-based relation. In this paper we investigate whether performance can be improved by treating the frequently accessed data differently. We present a new algorithm called Optimised X-HYBRIDJOIN, which consists of two phases. One phase, called the stream-probing phase, deals with the frequently accessed part of the disk-based relation. The other one is called the disk-probing phase and deals with the other part of the disk-based relation. In experiments we found that the performance of Optimised X-HYBRIDJOIN is significantly better than the performance of X-HYBRIDJOIN. We derive the cost model for our algorithm, which allows us to tune the components of Optimised X-HYBRIDJOIN. We performed an experimental study and we validate the cost model against the experimental results.