Multiple aggregations over data streams

  • Authors:
  • Rui Zhang;Nick Koudas;Beng Chin Ooi;Divesh Srivastava

  • Affiliations:
  • National Univ. of Singapore;Univ. of Toronto;National Univ. of Singapore;AT&T Labs-Research

  • Venue:
  • Proceedings of the 2005 ACM SIGMOD international conference on Management of data
  • Year:
  • 2005

Abstract

Monitoring aggregates on IP traffic data streams is a compelling application for data stream management systems. The need for exploratory IP traffic data analysis naturally leads to posing related aggregation queries on data streams that differ only in the choice of grouping attributes. In this paper, we address this problem of efficiently computing multiple aggregations over high-speed data streams, based on a two-level LFTA/HFTA DSMS architecture, inspired by Gigascope. Our first contribution is the insight that in such a scenario, additionally computing and maintaining fine-granularity aggregation queries (phantoms) at the LFTA has the benefit of supporting shared computation. Our second contribution is an investigation into the problem of identifying beneficial LFTA configurations of phantoms and user-queries. We formulate this problem as a cost optimization problem, which consists of two sub-optimization problems: how to choose phantoms and how to allocate space for them in the LFTA. We formally show the hardness of determining the optimal configuration, and propose cost greedy heuristics for these independent sub-problems based on detailed analyses. Our final contribution is a thorough experimental study, based on real IP traffic data, as well as synthetic data, to demonstrate the effectiveness of our techniques for identifying beneficial configurations.
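
The phantom idea can be illustrated with a small sketch. This is not the paper's implementation: it assumes a simplified LFTA made of fixed-size, direct-mapped hash tables that evict on collision, two count queries grouped by (A, B) and (B, C), and a single phantom grouped by (A, B, C). The names `LftaTable`, `run_with_phantom`, and `run_direct` are hypothetical. Each stream tuple touches only the phantom table; the user-query tables are updated only when a phantom entry is evicted or flushed.

```python
from collections import Counter


class LftaTable:
    """Fixed-size, direct-mapped hash table (a simplified stand-in for an LFTA table)."""

    def __init__(self, num_buckets, on_evict):
        self.slots = [None] * num_buckets  # each slot holds [key, count] or None
        self.on_evict = on_evict           # called with (key, count) on eviction
        self.probes = 0                    # number of updates this table received

    def add(self, key, count=1):
        self.probes += 1
        i = hash(key) % len(self.slots)
        slot = self.slots[i]
        if slot is not None and slot[0] == key:
            slot[1] += count               # same group: aggregate in place
            return
        if slot is not None:
            self.on_evict(slot[0], slot[1])  # collision: push old entry downstream
        self.slots[i] = [key, count]

    def flush(self):
        """End of epoch: push every remaining entry downstream."""
        for i, slot in enumerate(self.slots):
            if slot is not None:
                self.on_evict(slot[0], slot[1])
                self.slots[i] = None


def run_with_phantom(stream, num_buckets=8):
    """Answer count queries on (A, B) and (B, C) through a phantom on (A, B, C)."""
    hfta_ab, hfta_bc = Counter(), Counter()  # final results, as kept at the HFTA
    ab = LftaTable(num_buckets, lambda k, c: hfta_ab.update({k: c}))
    bc = LftaTable(num_buckets, lambda k, c: hfta_bc.update({k: c}))

    def feed_queries(key, count):
        a, b, c = key
        ab.add((a, b), count)  # project the phantom group onto each query
        bc.add((b, c), count)

    phantom = LftaTable(num_buckets, feed_queries)
    for t in stream:
        phantom.add(t)         # one table update per tuple instead of two
    phantom.flush()
    ab.flush()
    bc.flush()
    return hfta_ab, hfta_bc, phantom.probes, ab.probes + bc.probes


def run_direct(stream):
    """Reference answer: update both query aggregates for every tuple."""
    ab, bc = Counter(), Counter()
    for a, b, c in stream:
        ab[(a, b)] += 1
        bc[(b, c)] += 1
    return ab, bc


if __name__ == "__main__":
    stream = [(1, 2, 3), (1, 2, 3), (1, 2, 4), (5, 2, 3), (1, 2, 3)]
    got_ab, got_bc, phantom_probes, query_probes = run_with_phantom(stream)
    want_ab, want_bc = run_direct(stream)
    print(got_ab == want_ab, got_bc == want_bc)
    print("phantom updates:", phantom_probes, "query-table updates:", query_probes)
```

Whether the phantom pays off in this toy model depends on how often it evicts, which in turn depends on how much space each table gets. That trade-off is the configuration problem the abstract describes; the sketch does not attempt to solve it.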