Streaming multiple aggregations using phantoms

  • Authors:
  • Rui Zhang;Nick Koudas;Beng Chin Ooi;Divesh Srivastava;Pu Zhou

  • Affiliations:
  • University of Melbourne, Parkville, Australia;University of Toronto, Toronto, Canada;National University of Singapore, Singapore, Singapore;AT&T Labs---Research, Middletown, USA;University of Melbourne, Parkville, Australia

  • Venue:
  • The VLDB Journal — The International Journal on Very Large Data Bases
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Data streams characterize the high speed and large volume input of a new class of applications such as network monitoring, web content analysis and sensor networks. Among these applications, network monitoring may be the most compelling one--the backbone of a large internet service provider can generate 1 petabyte of data per day. For many network monitoring tasks such as traffic analysis and statistics collection, aggregation is a primitive operation. Various analytical and statistical needs naturally lead to related aggregate queries. In this article, we address the problem of efficiently computing multiple aggregations over high-speed data streams based on the two-level query processing architecture of GS, a real data stream management system deployed in AT & T. We discern that additionally computing and maintaining fine-granularity aggregations (called phantoms) has the benefit of supporting shared computation. Based on a thorough analysis, we propose algorithms to identify the best set of phantoms to maintain and determine allocation of resources (particularly, space) to compute the aggregations. Experiments show that our algorithm achieves near-optimal computation costs, which outperforms the best adapted algorithm by more than an order of magnitude.