Granularity adaptive density estimation and on demand clustering of concept-drifting data streams

Authors:
Weiheng Zhu;Jian Pei;Jian Yin;Yihuang Xie
Affiliations:
Zhongshan University, China;Simon Fraser University, Canada;Zhongshan University, China;Zhongshan University, China
Venue:
DaWaK'06 Proceedings of the 8th international conference on Data Warehousing and Knowledge Discovery
Year:
2006

References:

Data clustering: a review
ACM Computing Surveys (CSUR)

Maintaining variance and k-medians over data stream windows
Proceedings of the twenty-second ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Bursty and hierarchical structure in streams
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

Clustering data streams
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science

A framework for diagnosing changes in evolving data streams
Proceedings of the 2003 ACM SIGMOD international conference on Management of data

Efficient elastic burst detection in data streams
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Finding recent frequent itemsets adaptively over online data streams
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Online novelty detection on temporal sequences
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Clustering moving objects for spatio-temporal selectivity estimation
ADC '04 Proceedings of the 15th Australasian database conference - Volume 27

Clustering moving objects
Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Clustering on Demand for Multiple Data Streams
ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining

A framework for clustering evolving data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

Load shedding in a data stream manager
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29

A framework for projected clustering of high dimensional data streams
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Abstract

Clustering data streams has found a few important applications. While many previous studies focus on clustering the objects arriving in a data stream, in this paper we consider the novel problem of on-demand clustering of concept-drifting data streams. To characterize concept-drifting data streams, we propose an effective method for estimating the densities of data streams. One unique feature of our method is that the granularity of its estimation adapts to the available computation resources, which is critical for processing data streams with unpredictable input rates. Moreover, any clustering method can be applied to cluster data streams on demand using their density estimates. A performance study on synthetic data sets is reported to verify our design; it shows that our method obtains results comparable to CluStream [3] when clustering a single stream, and much better results than COD [8] when clustering multiple streams.
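
The abstract does not spell out the estimator itself, so the following is only a minimal sketch of the general idea of density estimation with adjustable granularity, not the authors' algorithm. It assumes a one-dimensional stream, a decayed grid histogram, and a coarsening step that halves the resolution when less computation is available. The class name AdaptiveGridDensity and its parameters cell_width and decay are hypothetical.

```python
import math
from collections import defaultdict


class AdaptiveGridDensity:
    """Decayed grid histogram over 1-D points (illustrative assumption,
    not the estimator from the paper)."""

    def __init__(self, cell_width=1.0, decay=0.99):
        self.cell_width = cell_width        # current granularity
        self.decay = decay                  # fading factor for older points
        self.weights = defaultdict(float)   # cell index -> decayed count
        self.total = 0.0                    # sum of all decayed counts

    def add(self, x):
        # Fade existing mass so that recent points dominate, which lets
        # the estimate follow a drifting distribution.
        for key in self.weights:
            self.weights[key] *= self.decay
        self.total *= self.decay
        # Cell k covers the interval [k * cell_width, (k + 1) * cell_width).
        self.weights[math.floor(x / self.cell_width)] += 1.0
        self.total += 1.0

    def coarsen(self):
        # Merge adjacent cell pairs: fewer cells to maintain, lower resolution.
        merged = defaultdict(float)
        for key, weight in self.weights.items():
            merged[key // 2] += weight
        self.weights = merged
        self.cell_width *= 2.0

    def density(self, x):
        # Estimated probability density at x under the current grid.
        if self.total == 0.0:
            return 0.0
        key = math.floor(x / self.cell_width)
        return self.weights.get(key, 0.0) / (self.total * self.cell_width)
```

Under these assumptions, a caller might invoke coarsen() when points arrive faster than they can be processed, and hand the centers and weights of the dense cells to any clustering method when a clustering is requested.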