Adaptive load diffusion for stream joins

  • Authors:
  • Xiaohui Gu;Philip S. Yu

  • Affiliations:
  • IBM T. J. Watson Research Center, Hawthorne, NY;IBM T. J. Watson Research Center, Hawthorne, NY

  • Venue:
  • Proceedings of the ACM/IFIP/USENIX 2005 International Conference on Middleware
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Data stream processing has become increasingly important as many emerging applications call for sophisticated realtime processing over data streams, such as stock trading surveillance, network traffic monitoring, and sensor data analysis. Stream joins are among the most important stream processing operations, which can be used to detect linkages and correlations between different data streams. One major challenge in processing stream joins is to handle continuous, high-volume, and time-varying data streams under resource constraints. In this paper, we present a novel load diffusion system to enable scalable execution of resource-intensive stream joins using an ensemble of server hosts. The load diffusion is achieved by a simple correlation-aware stream partition algorithm. Different from previous work, the load diffusion system can (1) achieve fine-grained load sharing in the distributed stream processing system; and (2) produce exact query answers without missing any join results or generate duplicate join results. Our experimental results show that the load diffusion scheme can greatly improve the system throughput and achieve more balanced load distribution.