An adaptive query execution system for data integration
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Dataflow query execution in a parallel main-memory environment
PDIS '91 Proceedings of the first international conference on Parallel and distributed information systems
TelegraphCQ: continuous dataflow processing
Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Early hash join: a configurable algorithm for the efficient and early production of join results
VLDB '05 Proceedings of the 31st international conference on Very large data bases
High-performance complex event processing over streams
Proceedings of the 2006 ACM SIGMOD international conference on Management of data
The Long Tail: Why the Future of Business Is Selling Less of More
The Long Tail: Why the Future of Business Is Selling Less of More
Meshing Streaming Updates with Persistent Data in an Active Data Warehouse
IEEE Transactions on Knowledge and Data Engineering
An Event-Based Near Real-Time Data Integration Architecture
EDOCW '08 Proceedings of the 2008 12th Enterprise Distributed Object Computing Conference Workshops
Scheduling Updates in a Real-Time Stream Warehouse
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Stream warehousing with DataDepot
Proceedings of the 2009 ACM SIGMOD International Conference on Management of data
Scheduling to minimize staleness and stretch in real-time data warehouses
Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
R-MESHJOIN for near-real-time data warehousing
DOLAP '10 Proceedings of the ACM 13th international workshop on Data warehousing and OLAP
Semi-Streamed Index Join for near-real time execution of ETL transformations
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Hi-index | 0.00 |
Recently, a number of semi-stream join algorithms have been published. The typical system setup for these consists of one fast stream input that has to be joined with a disk-based relation R. These semi-stream join approaches typically perform the join with a limited main memory partition assigned to them, which is generally not large enough to hold the whole relation R. We propose a caching approach that can be used as a front-stage for different semi-stream join algorithms, resulting in significant performance gains for common applications. We analyze our approach in the context of a seminal semi-stream join, MESHJOIN (Mesh Join), and provide a cost model for the resulting semi-stream join algorithm, which we call CMESHJOIN (Cached Mesh Join). The algorithm takes advantage of skewed distributions; this article presents results for Zipfian distributions of the type that appears in many applications.