HDG-tree: a structure for clustering high-dimensional data streams

  • Authors:
  • Jiadong Ren;Lining Li;Yan Xia;Jiadong Ren

  • Affiliations:
  • College of Information Science and Engineering, Yanshan University, Qinhuangdao, China;College of Information Science and Engineering, Yanshan University, Qinhuangdao, China;College of Information Science and Engineering, Yanshan University, Qinhuangdao, China;School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China

  • Venue:
  • IITA'09 Proceedings of the 3rd international conference on Intelligent information technology application
  • Year:
  • 2009

Abstract

Clustering data streams is challenging because memory is limited and the data can be scanned only once. In this paper, a new grid-based algorithm for clustering high-dimensional data streams, called GHStream, is proposed, which adopts a two-phase clustering framework. In the online component, a High-dimensional Dense Grid Tree (abbreviated HDG-Tree) is presented to summarize streaming data. As the data stream evolves, the HDG-Tree is dynamically updated. In the offline component, when a user issues a clustering request, the grid cells stored in the HDG-Tree are marked with cluster IDs to generate the final clustering results. The experimental results on real and synthetic datasets demonstrate that GHStream has higher clustering quality and better scalability.
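
The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' GHStream or HDG-Tree, whose details the abstract does not give. It assumes a flat dictionary of cell counts in place of the tree, a fixed cell width, a simple count threshold for "dense" cells, and axis-adjacency as the rule for joining cells. The names `GridSummary`, `label_clusters`, `cell_width`, and `min_count` are hypothetical.

```python
from collections import deque


class GridSummary:
    """Online phase (assumed form): count points per grid cell.

    A flat dict stands in for the HDG-Tree, which is not specified here.
    """

    def __init__(self, cell_width):
        self.cell_width = cell_width
        self.counts = {}  # cell coordinates (tuple of ints) -> points seen

    def cell_of(self, point):
        # Map each coordinate to the index of the cell that contains it.
        return tuple(int(x // self.cell_width) for x in point)

    def insert(self, point):
        # Single pass: each point only updates one counter and is then dropped.
        cell = self.cell_of(point)
        self.counts[cell] = self.counts.get(cell, 0) + 1


def label_clusters(summary, min_count):
    """Offline phase (assumed form): give adjacent dense cells one cluster ID."""
    dense = {c for c, n in summary.counts.items() if n >= min_count}
    labels = {}
    next_id = 0
    for start in sorted(dense):
        if start in labels:
            continue
        # Breadth-first search over dense cells that differ by 1 in one dimension.
        labels[start] = next_id
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for dim in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                    if nb in dense and nb not in labels:
                        labels[nb] = next_id
                        queue.append(nb)
        next_id += 1
    return labels


summary = GridSummary(cell_width=1.0)
stream = [(0.1, 0.2), (0.3, 0.4), (1.2, 0.5), (1.4, 0.1),
          (5.5, 5.5), (5.6, 5.7), (9.0, 9.0)]
for point in stream:
    summary.insert(point)
labels = label_clusters(summary, min_count=2)
```

In this toy run, cells (0, 0) and (1, 0) are dense and adjacent, so they share one cluster ID; cell (5, 5) forms a second cluster; and cell (9, 9), holding a single point, stays unlabeled. Memory grows with the number of occupied cells rather than with the number of points, which is the property a grid summary is meant to provide in the streaming setting.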