Memory-constrained aggregate computation over data streams

  • Authors:
  • K. V. M. Naidu;Rajeev Rastogi;Scott Satkin;Anand Srinivasan

  • Affiliations:
  • Yahoo! Labs Bangalore, India;Yahoo! Labs Bangalore, India;CMU, USA;Google Inc., India

  • Venue:
  • ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
  • Year:
  • 2011

Abstract

In this paper, we study the problem of efficiently computing multiple aggregation queries over a data stream. In order to share computation, prior proposals have suggested instantiating certain intermediate aggregates, which are then used to generate the final answers for input queries. In this work, we make a number of important contributions aimed at improving the execution and generation of query plans containing intermediate aggregates. These include: (1) a different hashing model, which has low eviction rates and also allows us to accurately estimate the number of evictions, (2) a comprehensive query execution cost model based on these estimates, (3) an efficient greedy heuristic for constructing good low-cost query plans, (4) provably near-optimal and optimal algorithms for allocating the available memory to aggregates in the query plan when the input data distribution is Zipf-like and uniform, respectively, and (5) a detailed performance study with real-life IP flow data sets, which shows that our multiple aggregates computation techniques consistently outperform the best-known approach.
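The general idea of sharing computation through an intermediate aggregate can be illustrated with a small sketch. This is not the hashing model, cost model, or memory allocation algorithm proposed in the paper, whose details the abstract does not give. It assumes a plain fixed-size hash table with one entry per slot, where a collision evicts the resident entry and pushes its partial count down to the aggregates that depend on it. The class name `BoundedAggregate`, the attribute names `src` and `dst`, and the slot counts are all hypothetical.

```python
from collections import Counter


class BoundedAggregate:
    """COUNT grouped by `attrs`, held in a fixed number of hash slots (assumed model)."""

    def __init__(self, attrs, num_slots, children=()):
        self.attrs = tuple(attrs)          # attributes this aggregate groups by
        self.slots = [None] * num_slots    # each slot is None or (key, partial_count)
        self.children = list(children)     # coarser aggregates fed by this one
        self.output = Counter()            # partial counts emitted so far, merged by key
        self.evictions = 0                 # collisions that forced an entry out

    def add(self, record, count=1):
        key = tuple(record[a] for a in self.attrs)
        i = hash(key) % len(self.slots)
        entry = self.slots[i]
        if entry is not None and entry[0] == key:
            # Same group already resident: update in place, no eviction.
            self.slots[i] = (key, entry[1] + count)
            return
        if entry is not None:
            # Collision with a different group: evict the resident entry.
            self.evictions += 1
            self._emit(entry)
        self.slots[i] = (key, count)

    def _emit(self, entry):
        # Record the partial count and forward it to dependent aggregates.
        key, count = entry
        self.output[key] += count
        record = dict(zip(self.attrs, key))
        for child in self.children:
            child.add(record, count)

    def flush(self):
        # End of stream (or epoch): push out everything still resident.
        for i, entry in enumerate(self.slots):
            if entry is not None:
                self._emit(entry)
                self.slots[i] = None
        for child in self.children:
            child.flush()


# Two queries (count by src, count by dst) share one intermediate (src, dst) aggregate.
by_src = BoundedAggregate(("src",), num_slots=2)
by_dst = BoundedAggregate(("dst",), num_slots=2)
by_pair = BoundedAggregate(("src", "dst"), num_slots=2, children=[by_src, by_dst])

stream = [
    {"src": "a", "dst": "x"},
    {"src": "a", "dst": "y"},
    {"src": "b", "dst": "x"},
    {"src": "a", "dst": "x"},
    {"src": "b", "dst": "y"},
    {"src": "a", "dst": "x"},
]
for rec in stream:
    by_pair.add(rec)
by_pair.flush()

print(sorted(by_src.output.items()))  # [(('a',), 4), (('b',), 2)]
print(sorted(by_dst.output.items()))  # [(('x',), 4), (('y',), 2)]
```

In this sketch each stream record touches only the intermediate table, and the two query tables are updated only on evictions and at flush time. The final counts are the same however many collisions occur, but the eviction count, and so the work pushed downstream, depends on how many slots each table is given. That dependence is the kind of trade-off a cost model and memory allocation scheme would need to capture.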