R-MESHJOIN for near-real-time data warehousing

Authors:
M. Asif Naeem; Gillian Dobbie; Gerald Weber; Shafiq Alam
Affiliations:
The University of Auckland, Auckland, New Zealand; The University of Auckland, Auckland, New Zealand; The University of Auckland, Auckland, New Zealand; The University of Auckland, Auckland, New Zealand
Venue:
DOLAP '10 Proceedings of the ACM 13th international workshop on Data warehousing and OLAP
Year:
2010

Abstract

To meet the growing business demand for up-to-date information, current data integration approaches are moving towards real-time updates. One important element of real-time data integration is the join of a continuously arriving data stream with a disk-based relation. In this paper we investigate a stream-based join algorithm called mesh join (MESHJOIN) and propose an improved version called reduced MESHJOIN (R-MESHJOIN). Both algorithms tune their memory use by allocating portions of the available memory to their key components. In MESHJOIN there is a dependency between the size of the partitions in an internal queue for the stream data and the number of iterations required to bring the disk-based relation into memory. This dependency hampers the optimal distribution of memory among the join components. In particular, the size of the disk buffer varies with the size of the disk-based relation, which is unnecessary. R-MESHJOIN removes this dependency, which enables an optimal distribution of the available memory among the join components: a change in the size of the disk-based relation does not affect the size of the disk buffer. An experimental study is conducted to validate these arguments.
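
The dependency described in the abstract is easier to see in a toy version of a mesh-style join. The following is a minimal in-memory sketch, not the authors' implementation: the function name `mesh_style_join`, the tuple layout, and the use of Python lists in place of disk pages are all assumptions made for illustration. It reflects the general idea that the relation is read cyclically one partition at a time while each stream batch stays in a hash table and a queue until it has met every partition. Because the queue holds exactly as many batches as there are relation partitions, the two quantities are tied together, which is the kind of coupling the abstract says R-MESHJOIN removes. How R-MESHJOIN removes it is not shown here.

```python
from collections import defaultdict, deque


def mesh_style_join(stream_batches, relation_partitions, key=lambda t: t[0]):
    """Toy mesh-style join of a stream with a partitioned relation.

    stream_batches: iterable of lists of stream tuples, one list per iteration.
    relation_partitions: list of lists of relation tuples; each inner list is a
        hypothetical stand-in for one disk-buffer load.
    """
    k = len(relation_partitions)
    if k == 0:
        return []  # assumption: an empty relation yields no join results

    table = defaultdict(list)  # hash table over stream tuples held in memory
    queue = deque()            # stream batches in arrival order, at most k
    results = []
    part = 0                   # index of the next relation partition to load

    def step(batch):
        nonlocal part
        # The oldest batch has now met all k partitions, so it can expire.
        if len(queue) == k:
            for s in queue.popleft():
                table[key(s)].remove(s)
        # Admit the new batch into the queue and the hash table.
        queue.append(batch)
        for s in batch:
            table[key(s)].append(s)
        # "Load" the next partition and probe the hash table with its tuples.
        for r in relation_partitions[part]:
            for s in table.get(key(r), ()):
                results.append((s, r))
        part = (part + 1) % k

    for batch in stream_batches:
        step(batch)
    # Drain with empty batches so the last tuples see the remaining partitions.
    for _ in range(k - 1):
        step([])
    return results
```

In this sketch the number of queued batches is fixed by the number of relation partitions, so growing the relation (at a fixed partition size) forces more stream tuples to be held in memory at once.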