A framework for estimating complex probability density structures in data streams

Authors:
Arnold P. Boedihardjo;Chang-Tien Lu;Feng Chen
Affiliations:
Virginia Tech, Falls Church, VA, USA;Virginia Tech, Falls Church, VA, USA;Virginia Tech, Falls Church, VA, USA
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008


Abstract

Probability density function estimation is a fundamental component in several stream mining tasks such as outlier detection and classification. The nonparametric adaptive kernel density estimate (AKDE) provides a robust and asymptotically consistent estimate for an arbitrary distribution. However, its extensive computational requirements make it difficult to apply this technique to the stream environment. This paper tackles the issue of developing efficient and asymptotically consistent AKDE over data streams while heeding the stringent constraints imposed by the stream environment. We propose the concept of local regions to effectively synopsize local density features, design a suite of algorithms to maintain the AKDE under a time-based sliding window, and analyze the estimates' asymptotic consistency and computational costs. In addition, extensive experiments were conducted with real-world and synthetic data sets to demonstrate the effectiveness and efficiency of our approach.
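
The abstract does not spell out the estimator, so the following is only a minimal sketch of a textbook one-dimensional adaptive kernel density estimate (Gaussian kernel, per-point bandwidths scaled by a pilot estimate) kept over a time-based sliding window. It is not the paper's local-region synopsis: it stores every point in the window and recomputes the pilot densities on each query, which is exactly the cost the paper aims to avoid. The class name `SlidingWindowAKDE`, its parameters (`window`, `base_bandwidth`, `alpha`), and the sensitivity value 0.5 are assumptions made for illustration.

```python
import math
from collections import deque


def gaussian(u):
    """Standard normal kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


class SlidingWindowAKDE:
    """Naive 1-D adaptive KDE over a time-based sliding window.

    Hypothetical illustration only: every point in the window is stored,
    so memory and query cost grow with the window size.
    """

    def __init__(self, window, base_bandwidth, alpha=0.5):
        self.window = window        # window length in time units
        self.h = base_bandwidth     # global (pilot) bandwidth
        self.alpha = alpha          # sensitivity of local bandwidths
        self.points = deque()       # (timestamp, value), oldest first

    def insert(self, timestamp, value):
        """Add a point and expire points that fell out of the window."""
        self.points.append((timestamp, value))
        while self.points and self.points[0][0] <= timestamp - self.window:
            self.points.popleft()

    def _pilot(self, x, values):
        """Fixed-bandwidth KDE used to gauge local density."""
        return sum(gaussian((x - v) / self.h) for v in values) / (len(values) * self.h)

    def density(self, x):
        """Adaptive estimate at x: wider kernels where the pilot density is low."""
        values = [v for _, v in self.points]
        if not values:
            return 0.0
        pilots = [self._pilot(v, values) for v in values]
        # Geometric mean of the pilot densities normalizes the local factors.
        g = math.exp(sum(math.log(p) for p in pilots) / len(pilots))
        total = 0.0
        for v, p in zip(values, pilots):
            h_i = self.h * (p / g) ** (-self.alpha)
            total += gaussian((x - v) / h_i) / h_i
        return total / len(values)
```

A caller would feed timestamped values through `insert` and query `density(x)` at any point; each query costs quadratic time in the number of windowed points, which illustrates why a compact synopsis is needed in a stream setting.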