Clustering over Evolving Data Streams Based on Online Recent-Biased Approximation

Authors:
Wei Fan;Yusuke Koyanagi;Koichi Asakura;Toyohide Watanabe
Affiliations:
Department of Systems and Social Informatics, Graduate School of Information Science, Nagoya University, Nagoya, Japan 464-8603;Department of Systems and Social Informatics, Graduate School of Information Science, Nagoya University, Nagoya, Japan 464-8603;School of Informatics, Daido Institute of Technology, Nagoya, Japan 457-8530;Department of Systems and Social Informatics, Graduate School of Information Science, Nagoya University, Nagoya, Japan 464-8603
Venue:
Knowledge Acquisition: Approaches, Algorithms and Applications
Year:
2009

Abstract

A growing number of real-world applications deal with multiple evolving data streams. In this paper, we propose a framework for clustering evolving data streams that takes advantage of recent-biased approximation. In recent-biased approximation, more detail is preserved for recent data while fewer coefficients are kept for the data stream as a whole, which greatly improves both clustering efficiency and space utilization. The framework consists of two phases: an online phase, which approximates the data streams and incrementally maintains summary statistics, and an offline clustering phase, which can perform dynamic clustering over the data streams for any possible time horizon. As shown by complexity analyses and validated by empirical studies, the framework performs efficiently in the data stream environment while producing clustering results of very high quality.
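The abstract does not specify how the recent-biased approximation is built, so the following is only an illustrative sketch of the general idea, not the authors' method. As an assumed stand-in, it uses a logarithmic merging scheme, similar in spirit to exponential histograms and tilted time windows. Recent values are kept as fine-grained segments. Older values are merged into progressively wider segments, each storing additive summary statistics (count, sum, sum of squares) that can be maintained incrementally online. The class and method names (`Segment`, `RecentBiasedSummary`, `per_level`, `horizon_mean`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Additive summary statistics for a run of consecutive stream values."""
    count: int
    total: float
    total_sq: float

    @property
    def mean(self) -> float:
        return self.total / self.count

    @property
    def variance(self) -> float:
        return self.total_sq / self.count - self.mean ** 2

    def merge(self, other: "Segment") -> "Segment":
        # Additive statistics combine without revisiting the raw values.
        return Segment(self.count + other.count,
                       self.total + other.total,
                       self.total_sq + other.total_sq)


class RecentBiasedSummary:
    """Hypothetical recent-biased summary of a single numeric stream.

    Level i holds segments covering 2**i values each. When a level holds more
    than `per_level` segments, its two oldest are merged and pushed to level
    i + 1, so older data is stored at coarser resolution and the number of
    segments grows only logarithmically with the stream length.
    """

    def __init__(self, per_level: int = 4):
        if per_level < 1:
            raise ValueError("per_level must be at least 1")
        self.per_level = per_level
        self.levels: List[List[Segment]] = []  # each level ordered oldest -> newest

    def append(self, value: float) -> None:
        """Online phase: absorb one new value in amortized O(1) merges."""
        carry = Segment(1, value, value * value)
        level = 0
        while carry is not None:
            if level == len(self.levels):
                self.levels.append([])
            segs = self.levels[level]
            segs.append(carry)
            carry = None
            if len(segs) > self.per_level:
                # Merge the two oldest segments and promote them one level up.
                oldest = segs.pop(0)
                second = segs.pop(0)
                carry = oldest.merge(second)
                level += 1

    def segments(self) -> List[Segment]:
        """All segments from oldest (coarsest) to newest (finest)."""
        out: List[Segment] = []
        for segs in reversed(self.levels):
            out.extend(segs)
        return out

    def horizon_mean(self, h: int) -> float:
        """Approximate mean of the last h values.

        Whole segments are summed, newest first, until at least h values are
        covered, so the result may include some values older than h.
        """
        if not self.levels:
            raise ValueError("summary is empty")
        covered, total = 0, 0.0
        for seg in reversed(self.segments()):
            if covered >= h:
                break
            covered += seg.count
            total += seg.total
        return total / covered


summary = RecentBiasedSummary(per_level=2)
for v in range(1, 21):
    summary.append(float(v))
print([seg.count for seg in summary.segments()])
print(summary.horizon_mean(5))
```

Per-stream summaries like this could then feed an offline phase that compares streams over a chosen horizon, for example by running k-means on horizon means or segment vectors. That clustering step is likewise an assumption and is not shown here.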