Range-Efficient Computation of F" over Massive Data Streams

  • Authors:
  • A. Pavan;Srikanta Tirthapura

  • Affiliations:
  • Iowa State University;Iowa State University

  • Venue:
  • ICDE '05 Proceedings of the 21st International Conference on Data Engineering
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Efficient one-pass computation of F驴, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider the problem of efficiently estimating F驴 of a data stream where each element of the stream is an interval of integers. We present a randomized algorithm which gives an (\varepsilon ,\delta ) approximation of F驴, with the following time complexity (n is the size of the universe of the items): (1) The amortized processing time per interval is 0(\log \frac{1}{\delta }Log\frac{n}{\varepsilon }). (2) The time to answer a query for F驴 is 0(\log {1 \mathord{\left/ {\vphantom {1 {\delta )}}} \right. \kern-\nulldelimiterspace} {\delta )}}. The workspace used is 0(\frac{1}{{\varepsilon ^2 }}\log \frac{1}{\delta }\log n) bits. Our algorithm improves upon a previous algorithm by Bar-Yossef, Kumar and Sivakumar [5], which requires 0(\frac{1}{{\varepsilon ^5 }}\log \frac{1}{\delta }\log ^5 n) processing time per item. Our algorithm can be used to compute the max-dominance norm of a stream of multiple signals, and significantly improves upon the current best bounds due to Cormode and Muthukrishnan [11]. This also provides efficient and novel solutions for data aggregation problems in sensor networks studied by Nath and Gibbons [22] and Considine et. al. [8].