Grid-based subspace clustering over data streams

  • Authors:
  • Nam Hun Park;Won Suk Lee

  • Affiliations:
  • Yonsei University, Seoul, South Korea;Yonsei University, Seoul, South Korea

  • Venue:
  • Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

A real-life data stream usually contains many dimensions and some dimensional values of its data elements may be missing. In order to effectively extract the on-going change of a data stream with respect to all the subsets of the dimensions of the data stream, a grid-based subspace clustering algorithm is proposed in this paper. Given an n-dimensional data stream, the on-going distribution statistics of data elements in each one-dimension data space is firstly monitored by a list of grid-cells called a sibling list. Once a dense grid-cell of a first-level sibling list becomes a dense unit grid-cell, new second-level sibling lists are created as its child nodes in order to trace any cluster in all possible two-dimensional rectangular subspaces. In such a way, a sibling tree grows up to the nth level at most and a k-dimensional subcluster can be found in the kth level of the sibling tree. The proposed method is comparatively analyzed by a series of experiments to identify its various characteristics.