To meet the growing business demand for up-to-date information, data integration approaches are moving towards real-time updates. One important element of real-time data integration is the join of a continuously arriving data stream with a disk-based relation. In this paper we investigate a stream-based join algorithm called mesh join (MESHJOIN) and propose an improved version called reduced MESHJOIN (R-MESHJOIN). Both algorithms tune their memory usage by allocating portions of the available memory to the key join components. In MESHJOIN there is a dependency between the size of the partitions in an internal queue that holds the stream data and the number of iterations required to bring the whole disk-based relation into memory. This dependency hampers the optimal distribution of memory among the join components. In particular, the size of the disk buffer varies with the size of the disk-based relation, which is unnecessary. R-MESHJOIN removes this dependency, which enables an optimal distribution of the available memory among the join components: a change in the size of the disk-based relation does not affect the size of the disk buffer. An experimental study is conducted to validate these arguments.
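The coupling described for MESHJOIN can be illustrated with a small sketch. This is a simplified, in-memory illustration and not the authors' implementation: it assumes that tuples are `(key, payload)` pairs, that the stream arrives in pre-formed batches (one batch per queue partition), and that the relation is scanned cyclically one disk-buffer load at a time. The names `mesh_join`, `stream_batches`, and `disk_buffer_size` are hypothetical. Under these assumptions, the number of partitions the queue must hold equals the number of iterations `k` needed to scan the relation, so a larger relation (with a fixed disk buffer) forces a longer queue. The sketch does not model memory budgets or R-MESHJOIN.

```python
from collections import defaultdict, deque
from itertools import chain, repeat


def mesh_join(stream_batches, relation, disk_buffer_size):
    """Simplified MESHJOIN-style join of stream batches with a relation.

    Tuples are assumed to be (key, payload) pairs (an illustrative choice).
    """
    # Split the disk-based relation into loads that fit the disk buffer.
    pages = [relation[i:i + disk_buffer_size]
             for i in range(0, len(relation), disk_buffer_size)]
    k = len(pages)  # iterations needed to cycle through the whole relation
    if k == 0:
        return []

    queue = deque()            # stream partitions currently held in memory
    table = defaultdict(list)  # join key -> stream tuples currently in memory
    results = []

    # k - 1 trailing empty batches let the last real batch meet every page.
    padded = chain(stream_batches, repeat([], k - 1))
    for step, batch in enumerate(padded):
        # Admit the new stream partition into the queue and the hash table.
        queue.append(batch)
        for s in batch:
            table[s[0]].append(s)

        # Load the next part of the relation and probe the hash table.
        for r in pages[step % k]:
            for s in table.get(r[0], ()):
                results.append((s, r))

        # A partition that has stayed for k iterations has met every page,
        # so it can be expired. The queue therefore never exceeds k entries.
        if len(queue) == k:
            for s in queue.popleft():
                table[s[0]].remove(s)
    return results
```

In this sketch, halving `disk_buffer_size` doubles `k` and therefore doubles the number of stream partitions that must be kept in memory at once, which is the kind of dependency the abstract says R-MESHJOIN removes.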