A grid-based subspace clustering algorithm for high-dimensional data streams

  • Authors: Yufen Sun; Yansheng Lu

  • Affiliations: College of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan, China (both authors)

  • Venue: WISE'06 Proceedings of the 7th international conference on Web Information Systems
  • Year: 2006

Abstract

Many applications require clustering of high-dimensional data streams. We propose a subspace clustering algorithm that finds clusters in different subspaces in a single pass over a data stream. The algorithm combines bottom-up and top-down grid-based methods. A uniformly partitioned grid data structure summarizes the data stream online. A top-down grid partition method is then used to find the subspaces in which clusters are located. Errors introduced by the top-down partition procedure are eliminated by a merging step. Our performance study on real datasets and a synthetic dataset demonstrates the efficiency and effectiveness of the proposed algorithm.
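
The abstract does not give the grid's exact statistics or the partition and merging rules, so the following is only a minimal sketch of the general idea of a uniformly partitioned grid that summarizes a stream in one pass. It is not the authors' implementation. The class name `UniformGrid`, its parameters (`lows`, `highs`, `bins`), and the `project` helper are all hypothetical names introduced for illustration; the sketch assumes each dimension has a known value range and is split into equal-width intervals.

```python
from collections import Counter


class UniformGrid:
    """Hypothetical online summary: counts points per equal-width grid cell."""

    def __init__(self, lows, highs, bins):
        self.lows = lows          # assumed lower bound of each dimension
        self.highs = highs        # assumed upper bound of each dimension
        self.bins = bins          # number of equal-width intervals per dimension
        self.counts = Counter()   # only non-empty cells are stored

    def cell_of(self, point):
        """Map a point to the tuple of interval indices of its cell."""
        cell = []
        for x, lo, hi in zip(point, self.lows, self.highs):
            idx = int((x - lo) / (hi - lo) * self.bins)
            # Clamp so that values on or outside the bounds fall in an edge cell.
            cell.append(min(max(idx, 0), self.bins - 1))
        return tuple(cell)

    def insert(self, point):
        """Process one stream element; the point itself is not retained."""
        self.counts[self.cell_of(point)] += 1

    def project(self, dims):
        """Aggregate cell counts onto the subspace spanned by `dims`."""
        projected = Counter()
        for cell, n in self.counts.items():
            projected[tuple(cell[d] for d in dims)] += n
        return projected


grid = UniformGrid(lows=[0.0, 0.0, 0.0], highs=[1.0, 1.0, 1.0], bins=4)
stream = [
    (0.10, 0.10, 0.90),
    (0.15, 0.12, 0.20),
    (0.12, 0.20, 0.55),
    (0.80, 0.90, 0.50),
]
for p in stream:
    grid.insert(p)

# Three points spread over the third dimension share one cell in dimensions (0, 1).
dense_2d = grid.project(dims=(0, 1))
```

In this toy stream, three points occupy different cells in the full three-dimensional grid but fall into the same cell once projected onto dimensions 0 and 1, which illustrates why a cluster can be visible in a subspace while remaining sparse in the full space. How the paper selects subspaces and merges partitions is not covered by this sketch.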