Fast computation of approximate biased histograms on sliding windows over data streams

  • Authors:
  • Hamid Mousavi;Carlo Zaniolo

  • Affiliations:
  • UCLA, Los Angeles;UCLA, Los Angeles

  • Venue:
  • Proceedings of the 25th International Conference on Scientific and Statistical Database Management
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Histograms provide effective synopses of large data sets, and are thus used in a wide variety of applications, including query optimization, approximate query answering, distribution fitting, parallel database partitioning, and data mining. Moreover, very fast approximate algorithms are needed to compute accurate histograms on fast-arriving data streams, whereby online queries can be supported within the given memory and computing resources. Many real-life applications require that the data distribution in certain regions must be modeled with greater accuracy, and Biased Histograms are designed to address this need. In this paper, we define biased histograms over data streams and sliding windows on data streams, and propose the Bar Splitting Biased Histogram (BSBH) algorithm to construct them efficiently and accurately. We prove that BSBH generates expected ∈-approximate biased histograms for data streams with stationary distributions, and our experiments show that BSBH also achieves good approximation in the presence of concept shifts, even major ones. Additionally, BSBH employs a new biased sampling technique which outperforms uniform sampling in terms of accuracy, while using about the same amount of time and memory. Therefore, BSBH outperforms previously proposed algorithms for computing biased histograms over the whole data stream, and it is the first algorithm that supports windows.