GrubJoin: An Adaptive, Multi-Way, Windowed Stream Join with Time Correlation-Aware CPU Load Shedding

  • Authors:
  • Bugra Gedik;Kun-Lung Wu;Philip S. Yu;Ling Liu

  • Affiliations:
  • IEEE;IEEE;IEEE;IEEE

  • Venue:
  • IEEE Transactions on Knowledge and Data Engineering
  • Year:
  • 2007

Abstract

Tuple dropping, though commonly used for load shedding in most data stream operations, is generally inadequate for multi-way, windowed stream joins. The join output rate can be unnecessarily reduced because tuple dropping fails to exploit the time correlations likely to exist among interrelated streams. In this paper, we introduce GrubJoin - an adaptive, multi-way, windowed stream join that effectively performs time correlation-aware CPU load shedding. GrubJoin maximizes the output rate by achieving near-optimal window harvesting, which picks only the most profitable segments of individual windows for the join. Due mainly to the combinatorial explosion of possible multi-way join sequences involving different window segments, GrubJoin faces unique challenges that do not exist for binary joins, such as determining the optimal window harvesting configuration in a time-efficient manner and learning the time correlations among the streams without introducing overhead. To tackle these challenges, we formalize window harvesting as an optimization problem, develop greedy heuristics to determine near-optimal window harvesting configurations, and use approximation techniques to capture the time correlations. Our experimental results show that GrubJoin is vastly superior to tuple dropping when time correlations exist and is equally effective when time correlations are nonexistent.
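
The idea of window harvesting, as the abstract describes it, can be illustrated with a toy sketch. This is not the GrubJoin algorithm: the paper optimizes over multi-way join sequences, whereas the sketch below handles a single window only. The function name `harvest_segments`, the per-segment `scores`, and the `fraction` budget are all hypothetical and introduced purely for illustration.

```python
from typing import List, Sequence


def harvest_segments(scores: Sequence[float], fraction: float) -> List[int]:
    """Pick the highest-scoring segments of one window under a budget.

    scores[i] is an assumed estimate of how profitable segment i is
    (for example, how often it yields join matches). fraction in [0, 1]
    is the assumed share of segments the CPU budget lets us scan.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    # Number of segments we can afford to scan (rounded down).
    budget = int(fraction * len(scores))
    # Rank segment indices from most to least profitable.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the top ones, returned in window order.
    return sorted(ranked[:budget])


# Toy window of 6 segments whose matches cluster in segments 1 to 3.
example_scores = [0.05, 0.30, 0.35, 0.20, 0.05, 0.05]
selected = harvest_segments(example_scores, 0.5)
```

With half the budget available, the sketch scans only the three segments where matches concentrate. Plain tuple dropping would instead thin every segment uniformly and ignore where the time correlation places the matches.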