Optimised X-HYBRIDJOIN for near-real-time data warehousing

  • Authors:
  • M. Asif Naeem;Gillian Dobbie;Gerald Weber

  • Affiliations:
  • The University of Auckland, Auckland, New Zealand;The University of Auckland, Auckland, New Zealand;The University of Auckland, Auckland, New Zealand

  • Venue:
  • ADC '12 Proceedings of the Twenty-Third Australasian Database Conference - Volume 124
  • Year:
  • 2012

Abstract

Stream-based join algorithms are needed in modern near-real-time data warehouses. A particular class of stream-based join algorithms, with MESHJOIN as a typical example, computes the join between a stream and a disk-based relation. We recently presented a new algorithm in that class, X-HYBRIDJOIN (Extended Hybrid Join). X-HYBRIDJOIN achieves better performance than earlier algorithms by pinning frequently accessed data from the disk-based relation in main memory. Apart from holding it in main memory, X-HYBRIDJOIN treats this frequently accessed data no differently from other data in the disk-based relation. In this paper we investigate whether performance can be improved by treating the frequently accessed data differently. We present a new algorithm called Optimised X-HYBRIDJOIN, which consists of two phases. The first, called the stream-probing phase, deals with the frequently accessed part of the disk-based relation. The second, called the disk-probing phase, deals with the remaining part of the disk-based relation. In experiments we found that the performance of Optimised X-HYBRIDJOIN is significantly better than that of X-HYBRIDJOIN. We derive a cost model for our algorithm, which allows us to tune the components of Optimised X-HYBRIDJOIN, and we validate the cost model against the results of an experimental study.
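
The two-phase idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no data structures or scheduling details, so the function name `two_phase_join`, the `hot_part` dictionary, the `disk_partitions` list, the `batch_size` parameter, and the full scan of all partitions per batch are all assumptions made for illustration only.

```python
from collections import defaultdict


def two_phase_join(stream, hot_part, disk_partitions, batch_size=4):
    """Join (key, payload) stream tuples with a relation split in two parts.

    hot_part:        dict key -> row, assumed to be pinned in main memory.
    disk_partitions: list of dicts key -> row, standing in for disk pages.
    All names and the batching policy are hypothetical.
    """
    output = []
    pending = defaultdict(list)  # key -> payloads waiting for the disk phase
    pending_count = 0

    def disk_probe():
        # Disk-probing phase (simplified): read every partition once and
        # match the queued stream tuples against its rows.
        for partition in disk_partitions:
            for key, row in partition.items():
                for payload in pending.pop(key, []):
                    output.append((key, payload, row))
        pending.clear()  # tuples without any match are dropped

    for key, payload in stream:
        if key in hot_part:
            # Stream-probing phase: the frequently accessed part is probed
            # directly in memory, with no disk access.
            output.append((key, payload, hot_part[key]))
        else:
            pending[key].append(payload)
            pending_count += 1
            if pending_count >= batch_size:
                disk_probe()
                pending_count = 0

    if pending_count:
        disk_probe()  # flush the remaining queued tuples
    return output


hot = {1: "A"}
disk = [{2: "B"}, {3: "C"}]
stream = [(1, "s1"), (2, "s2"), (3, "s3"), (1, "s4"), (9, "s5")]
result = two_phase_join(stream, hot, disk, batch_size=2)
```

In this sketch, tuples whose keys fall in the memory-resident part are joined immediately, while the rest are batched so that one pass over the disk partitions serves several stream tuples. How the real algorithm selects which disk partition to load, and how it sizes its memory components, is determined by the paper's cost model and is not reproduced here.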