Maintaining variance and k-medians over data stream windows

  • Authors:
  • Brian Babcock;Mayur Datar;Rajeev Motwani;Liadan O'Callaghan

  • Affiliations:
  • Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA;Stanford University, Stanford, CA

  • Venue:
  • Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
  • Year:
  • 2003

Abstract

The sliding window model is useful for discounting stale data in data stream applications. In this model, data elements arrive continually and only the most recent N elements are used when answering queries. We present a novel technique for solving two important and related problems in the sliding window model: maintaining variance and maintaining a k-median clustering. Our solution to the problem of maintaining variance provides a continually updated estimate of the variance of the last N values in a data stream with relative error of at most ε using O((1/ε²) log N) memory. We present a constant-factor approximation algorithm which maintains an approximate k-median solution for the last N data points using O((k/τ⁴) N^(2τ) log² N) memory, where τ < 1/2 is a parameter which trades off the space bound with the approximation factor of O(2^(O(1/τ))).
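To make the two quantities concrete, the following is a minimal Python sketch of an exact baseline, not the algorithm from the paper. It stores the entire window, so it uses O(N) memory, which is precisely the cost the paper's approximate technique is designed to avoid. The names `ExactSlidingWindow`, `variance`, and `k_median_cost` are hypothetical, and the sketch assumes one-dimensional numeric data, variance normalized by the window size, and absolute difference as the distance; the paper may define these differently.

```python
from collections import deque


class ExactSlidingWindow:
    """Keeps the most recent n values of a stream (exact baseline, O(n) memory)."""

    def __init__(self, n):
        if n <= 0:
            raise ValueError("window size must be positive")
        # A deque with maxlen discards the oldest value automatically.
        self.values = deque(maxlen=n)

    def add(self, x):
        """Append a newly arrived stream element."""
        self.values.append(x)

    def variance(self):
        """Mean squared deviation of the values currently in the window.

        Assumption: normalized by the window size (population variance).
        """
        m = len(self.values)
        if m == 0:
            return 0.0
        mean = sum(self.values) / m
        return sum((x - mean) ** 2 for x in self.values) / m

    def k_median_cost(self, medians):
        """Sum over window values of the distance to the nearest median.

        Assumption: 1-D points with absolute difference as the metric.
        """
        if not medians:
            raise ValueError("need at least one median")
        return sum(min(abs(x - c) for c in medians) for x in self.values)
```

A caller would feed each arriving element to `add` and query `variance` or `k_median_cost` at any time; both answers reflect only the last N elements. An ε-approximate structure in the sense of the abstract would return a variance within relative error ε of this exact value while keeping far less than the full window.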