Density-based clustering for real-time stream data

  • Authors:
  • Yixin Chen;Li Tu

  • Affiliations:
  • Washington University in St. Louis;Nanjing University of Aeronautics and Astronautics

  • Venue:
  • Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Existing data-stream clustering algorithms such as CluStream arebased on k-means. These clustering algorithms are incompetent tofind clusters of arbitrary shapes and cannot handle outliers. Further, they require the knowledge of k and user-specified time window. To address these issues, this paper proposes D-Stream, a framework for clustering stream data using adensity-based approach. The algorithm uses an online component which maps each input data record into a grid and an offline component which computes the grid density and clusters the grids based on the density. The algorithm adopts a density decaying technique to capture the dynamic changes of a data stream. Exploiting the intricate relationships between the decay factor, data density and cluster structure, our algorithm can efficiently and effectively generate and adjust the clusters in real time. Further, a theoretically sound technique is developed to detect and remove sporadic grids mapped to by outliers in order to dramatically improve the space and time efficiency of the system. The technique makes high-speed data stream clustering feasible without degrading the clustering quality. The experimental results show that our algorithm has superior quality and efficiency, can find clusters of arbitrary shapes, and can accurately recognize the evolving behaviors of real-time data streams.