Space-efficient Relative Error Order Sketch over Data Streams

  • Authors:
  • Ying Zhang;Xuemin Lin;Jian Xu;Flip Korn;Wei Wang

  • Affiliations:
  • University of New South Wales;University of New South Wales;University of New South Wales;AT&T Labs-Research;AT&T Labs-Research

  • Venue:
  • ICDE '06 Proceedings of the 22nd International Conference on Data Engineering
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

We consider the problem of continuously maintaining order sketches over data streams with a relative rank error guarantee ∊. Novel space-efficient and one-scan randomised techniques are developed. Our first randomised algorithm can guarantee such a relative error precision ∊ with confidence 1 - \deltausing O( 1\_ \in \frac{1} {2}2 log 1d log ∊^2N) space, where N is the number of data elements seen so far in a data stream. Then, a new one-scan space compression technique is developed. Combined with the first randomised algorithm, the one-scan space compression technique yields another one-scan randomised algorithm that guarantees the space requirement is O( 1\frac{1} { \in } log(1\frac{1}{ \in } log 1\begin{gathered} \frac{1}{\delta } \hfill \\ \hfill \\ \end{gathered})\frac{{\log ^{2 + \alpha }\in N}} {{1 - 1/2^\alpha}}(for\alpha \gt 0) on average while the worst case space remains O( \frac{1}{{ \in ^2 }}\log \frac{1} {\delta }\log\in ^2 N). These results are immediately applicable to approximately computing quantiles over data streams with a relative error guarantee\inand significantly improve the previous best space bound O( \frac{1} {{ \in ^3 }}\log \frac{1}{\delta }\log N). Our extensive experiment results demonstrate that both techniques can support an on-line computation against high speed data streams.