Clustering over Evolving Data Streams Based on Online Recent-Biased Approximation

  • Authors:
  • Wei Fan; Yusuke Koyanagi; Koichi Asakura; Toyohide Watanabe

  • Affiliations:
  • Department of Systems and Social Informatics, Graduate School of Information Science, Nagoya University, Nagoya, Japan 464-8603; Department of Systems and Social Informatics, Graduate School of Information Science, Nagoya University, Nagoya, Japan 464-8603; School of Informatics, Daido Institute of Technology, Nagoya, Japan 457-8530; Department of Systems and Social Informatics, Graduate School of Information Science, Nagoya University, Nagoya, Japan 464-8603

  • Venue:
  • Knowledge Acquisition: Approaches, Algorithms and Applications
  • Year:
  • 2009

Abstract

A growing number of real-world applications deal with multiple evolving data streams. This paper proposes a framework for clustering over evolving data streams that takes advantage of recent-biased approximation. In recent-biased approximation, more detail is preserved for recent data and fewer coefficients are kept for the data stream as a whole, which greatly improves both clustering efficiency and space usage. The framework consists of two phases: an online phase, which approximates the data streams and incrementally maintains summary statistics, and an offline clustering phase, which can perform dynamic clustering over the data streams on all possible time horizons. As shown by complexity analyses and validated by empirical studies, the framework performs efficiently in the data stream environment while producing clustering results of very high quality.
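
The idea of recent-biased approximation can be illustrated with a minimal sketch. The abstract does not specify the approximation scheme, so everything below is an assumption made for illustration: the class name `RecentBiasedSummary`, the parameter `max_per_size`, and the choice of one mean value per bucket with bucket sizes that double with age. It is not the authors' method; it only shows how an online summary might keep fine detail for recent points and coarse detail for old ones.

```python
class RecentBiasedSummary:
    """Hypothetical online summary: recent points sit in small buckets,
    older points are merged into progressively larger ones."""

    def __init__(self, max_per_size=2):
        # Assumed parameter: how many buckets of one size are allowed.
        self.max_per_size = max_per_size
        # Each bucket is [count, total]; the list runs oldest -> newest.
        self.buckets = []

    def append(self, value):
        """Add one stream point and restore the bucket-size invariant."""
        self.buckets.append([1, float(value)])
        size = 1
        while True:
            same = [i for i, b in enumerate(self.buckets) if b[0] == size]
            if len(same) <= self.max_per_size:
                break
            # Merge the two oldest buckets of this size; they are adjacent
            # because bucket sizes never increase from oldest to newest.
            first, second = same[0], same[1]
            self.buckets[first][0] += self.buckets[second][0]
            self.buckets[first][1] += self.buckets[second][1]
            del self.buckets[second]
            size *= 2

    def approximation(self):
        """Return (count, mean) per bucket, oldest first."""
        return [(count, total / count) for count, total in self.buckets]


summary = RecentBiasedSummary(max_per_size=2)
for x in range(1, 9):  # stream of values 1..8
    summary.append(x)
print(summary.approximation())
```

In this sketch the number of stored coefficients grows logarithmically with the stream length, while the newest points are still represented individually. An offline phase could then cluster streams by comparing such summaries over whichever time horizon is of interest.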