A generic front-stage for semi-stream processing

Authors:
M. Asif Naeem;Gerald Weber;Gillian Dobbie;Christof Lutteroth
Affiliations:
Auckland University of Technology, Auckland, New Zealand;The University of Auckland, Auckland, New Zealand;The University of Auckland, Auckland, New Zealand;The University of Auckland, Auckland, New Zealand
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

Several semi-stream join algorithms have been published recently. The typical system setup for these consists of one fast input stream that has to be joined with a disk-based relation R. These approaches typically perform the join within a limited main-memory partition, which is generally not large enough to hold the whole relation R. We propose a caching approach that can be used as a front-stage for different semi-stream join algorithms, resulting in significant performance gains for common applications. We analyze our approach in the context of a seminal semi-stream join, MESHJOIN (Mesh Join), and provide a cost model for the resulting semi-stream join algorithm, which we call CMESHJOIN (Cached Mesh Join). The algorithm takes advantage of skewed distributions; this article presents results for Zipfian distributions of the kind that appear in many applications.
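The front-stage idea can be illustrated with a minimal Python sketch. It is not the CMESHJOIN implementation: the abstract does not specify the cache policy, so the LRU eviction, the admit-on-every-miss rule, and all names here (`CachedFrontStage`, `admit`, `process`, `simulate`) are assumptions made for illustration. The downstream join is replaced by a plain dictionary lookup standing in for MESHJOIN or any other semi-stream join, and the stream keys follow a Zipfian distribution with exponent 1.

```python
import random
from collections import OrderedDict


class CachedFrontStage:
    """Hypothetical front-stage: join hot keys from memory, forward the rest."""

    def __init__(self, capacity, forward):
        self.capacity = capacity      # number of R tuples the cache may hold
        self.forward = forward        # hands a cache miss to the downstream join
        self.cache = OrderedDict()    # key -> R tuple, ordered oldest to newest
        self.hits = 0
        self.misses = 0

    def admit(self, key, r_tuple):
        """Store an R tuple; evict the least recently used one if full (assumed policy)."""
        self.cache[key] = r_tuple
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)

    def process(self, key, s_tuple):
        """Return the joined pair on a hit; forward the stream tuple on a miss."""
        if key in self.cache:
            self.cache.move_to_end(key)
            self.hits += 1
            return (s_tuple, self.cache[key])
        self.misses += 1
        self.forward(key, s_tuple)
        return None


def simulate(n_keys=1000, capacity=100, n_tuples=20000, seed=0):
    """Feed a Zipfian stream through the front-stage and return the hit rate."""
    rng = random.Random(seed)
    relation = {k: f"r{k}" for k in range(n_keys)}          # stand-in for disk-based R
    weights = [1.0 / (rank + 1) for rank in range(n_keys)]  # Zipf weights, exponent 1

    def downstream(key, s_tuple):
        # Stand-in for the real semi-stream join: look up R, then admit to the cache.
        stage.admit(key, relation[key])

    stage = CachedFrontStage(capacity, downstream)
    keys = rng.choices(range(n_keys), weights=weights, k=n_tuples)
    for i, key in enumerate(keys):
        stage.process(key, f"s{i}")
    return stage.hits / n_tuples


if __name__ == "__main__":
    print(f"hit rate with a cache of 10% of R: {simulate():.2f}")
```

With a skewed key distribution, a cache holding a small fraction of R answers a large share of stream tuples without touching the disk-based join, which is the effect the abstract attributes to the front-stage. Under a uniform key distribution the same cache would serve only about `capacity / n_keys` of the stream.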