HYBRIDJOIN for Near-Real-Time Data Warehousing

Authors:
Gillian Dobbie; M. Asif Naeem; Gerald Weber
Affiliations:
The University of Auckland, New Zealand; The University of Auckland, New Zealand; The University of Auckland, New Zealand
Venue:
International Journal of Data Warehousing and Mining
Year:
2011

Abstract

An important component of near-real-time data warehouses is the near-real-time integration layer. One important element of near-real-time data integration is the join of a continuous input data stream with a disk-based relation. For high-throughput streams, stream-based algorithms such as Mesh Join (MESHJOIN) can be used. However, the performance of MESHJOIN is inversely proportional to the size of the disk-based relation. The Index Nested Loop Join (INLJ) can be adapted to process stream input and can cope with intermittent update streams, but its throughput is low. This paper introduces a robust stream-based join algorithm called Hybrid Join (HYBRIDJOIN), which combines the two approaches. A theoretical result shows that HYBRIDJOIN is asymptotically as fast as the faster of the two algorithms. The authors also present performance measurements of their implementation. In experiments using synthetic data based on a Zipfian distribution, HYBRIDJOIN performs significantly better than the other two algorithms for typical parameters of the Zipfian distribution and in general performs in accordance with the theoretical model, whereas each of the other two algorithms is unacceptably slow under some settings.
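One plausible reading of "combines the two approaches" can be shown with a small simulation. Like INLJ, the loop uses an index to fetch only the part of the disk-based relation R that the oldest pending stream tuple needs. Like MESHJOIN, it amortises each disk read over all buffered stream tuples, which pays off under Zipfian skew because popular keys accumulate many pending tuples. This is a minimal Python sketch under assumptions of our own, not the authors' implementation. R is an in-memory list sorted by a unique join key, a dictionary of row positions stands in for the disk index, one slice of R stands in for one disk read, and stream tuples whose key is absent from R are dropped (inner-join semantics). The names `hybrid_join`, `zipf_keys`, `partition_size` and `capacity` are hypothetical.

```python
import random
from collections import OrderedDict


def zipf_keys(n_keys, n_tuples, s=1.0, seed=0):
    """Draw join keys from 1..n_keys with P(k) proportional to 1 / k**s."""
    rng = random.Random(seed)
    keys = list(range(1, n_keys + 1))
    weights = [1.0 / k ** s for k in keys]
    return rng.choices(keys, weights=weights, k=n_tuples)


def hybrid_join(stream, relation, partition_size, capacity):
    """Simplified HYBRIDJOIN-style join of a stream with a sorted relation.

    stream:         iterable of (key, payload) stream tuples
    relation:       list of (key, value) rows sorted by a unique key (assumption)
    partition_size: rows fetched per simulated disk read
    capacity:       maximum number of stream tuples held in memory
    Returns (joined rows, number of simulated partition loads).
    """
    stream = iter(stream)
    # Position of each key in R; a stand-in for a disk-based index.
    index = {key: pos for pos, (key, _) in enumerate(relation)}
    # Hash table and arrival queue in one structure: key -> payloads, ordered
    # by the arrival of the oldest still-pending tuple with that key.
    pending = OrderedDict()
    in_memory = 0
    output, loads = [], 0

    def fill():
        # Read new stream tuples until memory is full or the stream ends.
        nonlocal in_memory
        while in_memory < capacity:
            nxt = next(stream, None)
            if nxt is None:
                return
            key, payload = nxt
            pending.setdefault(key, []).append(payload)
            in_memory += 1

    fill()
    while pending:
        oldest = next(iter(pending))
        start = index.get(oldest)
        if start is None:
            # Key not in R: discard its tuples (assumed inner-join semantics).
            in_memory -= len(pending.pop(oldest))
        else:
            loads += 1
            # Load one partition of R starting at the oldest pending key and
            # probe the buffered stream tuples with every row in it.
            for key, value in relation[start:start + partition_size]:
                matched = pending.pop(key, None)
                if matched is not None:
                    in_memory -= len(matched)
                    output.extend((key, p, value) for p in matched)
        # Refill the space freed by matched or discarded tuples.
        fill()
    return output, loads


if __name__ == "__main__":
    relation = [(k, f"customer-{k}") for k in range(1, 1001)]
    keys = zipf_keys(n_keys=1000, n_tuples=5000, s=1.0)
    stream = [(k, i) for i, k in enumerate(keys)]
    joined, loads = hybrid_join(stream, relation, partition_size=50, capacity=500)
    print(len(joined), "joined rows from", loads, "partition loads")
```

Because every partition load starts at the oldest pending key, each iteration retires at least one stream tuple, so the loop always makes progress. The single `OrderedDict` is only a compact stand-in. A real implementation would more likely keep a separate hash table and a doubly linked queue so that arbitrary entries can be removed in constant time.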