Exploring Data Streams with Nonparametric Estimators

  • Authors:
  • Christoph Heinz; Bernhard Seeger

  • Affiliations:
  • Philipps University Marburg, Germany; Philipps University Marburg, Germany

  • Venue:
  • SSDBM '06 Proceedings of the 18th International Conference on Scientific and Statistical Database Management
  • Year:
  • 2006

Abstract

Many real-world applications require meaningful online analysis of transient data streams. An important building block of many analysis tasks is the characterization of the underlying data distribution. Techniques from nonparametric statistics provide well-defined estimates of continuous data distributions. Data stream analysis could benefit from these techniques; however, the rigid processing requirements of streams make a direct application impossible. In our work, we adapt nonparametric techniques to streaming data. We concentrate on density estimation because it provides a convenient basis for exploring an unknown continuous data distribution. Specifically, we have developed kernel- and wavelet-based density estimators for data streams that comply with these processing requirements. Both techniques are incorporated into PIPES, our Java library for advanced data stream processing and analysis. In the demonstration, we will present our nonparametric density estimators over data streams and show their performance on a variety of heterogeneous data streams from different real-world application scenarios. We will also present, by means of illustrative use cases, how further analysis tasks can be implemented on top of our estimators.
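
The abstract does not describe how the estimators work internally, so the following is only a minimal sketch of the general idea of kernel density estimation under a fixed memory budget. It is not the authors' technique and not the PIPES API. The class name `WindowedKde`, the sliding-window strategy, the Gaussian kernel, and the normal-reference bandwidth rule are all assumptions chosen for illustration.

```java
// Hypothetical illustration only: not the PIPES API and not the authors' estimator.
// Keeps the most recent values in a fixed-size ring buffer and evaluates a
// Gaussian kernel density estimate over that window.
final class WindowedKde {
    private final double[] window; // ring buffer holding the most recent values
    private int count = 0;         // number of valid entries (at most window.length)
    private int next = 0;          // slot that the next value overwrites

    WindowedKde(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        window = new double[capacity];
    }

    /** Consumes one stream element in O(1) time; memory use stays fixed. */
    void add(double x) {
        window[next] = x;
        next = (next + 1) % window.length;
        if (count < window.length) {
            count++;
        }
    }

    /** Normal-reference rule of thumb: h = 1.06 * sigma * n^(-1/5). */
    double bandwidth() {
        if (count < 2) {
            return 1.0; // assumed fallback when no spread can be estimated
        }
        double mean = 0.0;
        for (int i = 0; i < count; i++) {
            mean += window[i];
        }
        mean /= count;
        double squares = 0.0;
        for (int i = 0; i < count; i++) {
            double d = window[i] - mean;
            squares += d * d;
        }
        double sigma = Math.sqrt(squares / (count - 1));
        if (sigma == 0.0) {
            return 1.0; // assumed fallback when all values in the window are equal
        }
        return 1.06 * sigma * Math.pow(count, -0.2);
    }

    /** Density estimate at x: the average of Gaussian kernels centered on the window values. */
    double density(double x) {
        if (count == 0) {
            return 0.0;
        }
        double h = bandwidth();
        double sum = 0.0;
        for (int i = 0; i < count; i++) {
            double u = (x - window[i]) / h;
            sum += Math.exp(-0.5 * u * u);
        }
        return sum / (count * h * Math.sqrt(2.0 * Math.PI));
    }
}
```

In this sketch, each arriving element would be passed to `add`, and `density` could be queried at any time. Evaluation costs time linear in the window size, and values that leave the window are forgotten entirely, so this is a simple stand-in for a stream-compliant estimator rather than a refined one.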