State-slice: new paradigm of multi-query optimization of window-based stream queries

  • Authors:
  • Song Wang;Elke Rundensteiner;Samrat Ganguly;Sudeept Bhatnagar

  • Affiliations:
  • Worcester Polytechnic Institute, Worcester, MA;Worcester Polytechnic Institute, Worcester, MA;NEC Laboratories America Inc., Princeton, NJ;NEC Laboratories America Inc., Princeton, NJ

  • Venue:
  • VLDB '06 Proceedings of the 32nd international conference on Very large data bases
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Modern stream applications such as sensor monitoring systems and publish/subscription services necessitate the handling of large numbers of continuous queries specified over high volume data streams. Efficient sharing of computations among multiple continuous queries, especially for the memory- and CPU-intensive window-based operations, is critical. A novel challenge in this scenario is to allow resource sharing among similar queries, even if they employ windows of different lengths. This paper first reviews the existing sharing methods in the literature, and then illustrates the significant performance shortcomings of these methods.This paper then presents a novel paradigm for the sharing of window join queries. Namely we slice window states of a join operator into fine-grained window slices and form a chain of sliced window joins. By using an elaborate pipelining methodology, the number of joins after state slicing is reduced from quadratic to linear. This novel sharing paradigm enables us to push selections down into the chain and flexibly select subsequences of such sliced window joins for computation sharing among queries with different window sizes. Based on the state-slice sharing paradigm, two algorithms are proposed for the chain buildup. One minimizes the memory consumption while the other minimizes the CPU usage. The algorithms are proven to find the optimal chain with respect to memory or CPU usage for a given query workload. We have implemented the slice-share paradigm within the data stream management system CAPE. The experimental results show that our strategy provides the best performance over a diverse range of workload settings among all alternate solutions in the literature.