Clustering data streams is challenging because memory is limited and the data can be read in only a single pass. This paper proposes GHStream, a new grid-based algorithm for clustering high-dimensional data streams, which adopts a two-phase clustering framework. In the online component, a High-dimensional Dense Grid Tree (HDG-Tree) summarizes the streaming data and is dynamically updated as the stream evolves. In the offline component, when a user issues a clustering request, the grid cells stored in the HDG-Tree are labeled with cluster IDs to generate the final clustering results. Experimental results on real and synthetic datasets demonstrate that GHStream has higher clustering quality and better scalability.
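The two-phase idea can be illustrated with a minimal sketch. This is not the GHStream algorithm or the HDG-Tree itself, whose details the abstract does not give. It assumes a flat dictionary of grid-cell counts in place of the tree, a fixed cell width, and a simple density threshold. The names `GridSummary`, `cell_width`, and `density_threshold` are hypothetical and chosen only for illustration.

```python
from collections import deque


class GridSummary:
    """Flat stand-in for a grid summary structure (assumption: a dict, not a tree)."""

    def __init__(self, cell_width, density_threshold):
        self.cell_width = cell_width                # hypothetical parameter
        self.density_threshold = density_threshold  # hypothetical parameter
        self.counts = {}                            # grid cell -> number of points seen

    def cell_of(self, point):
        """Map a point to the integer coordinates of the grid cell containing it."""
        return tuple(int(x // self.cell_width) for x in point)

    def insert(self, point):
        """Online phase: update the summary with one arriving point (single pass)."""
        cell = self.cell_of(point)
        self.counts[cell] = self.counts.get(cell, 0) + 1

    def cluster(self):
        """Offline phase: label each connected group of dense cells with a cluster ID."""
        dense = {c for c, n in self.counts.items() if n >= self.density_threshold}
        labels = {}
        next_id = 0
        for start in sorted(dense):
            if start in labels:
                continue
            # Breadth-first search over dense cells reachable from `start`.
            labels[start] = next_id
            queue = deque([start])
            while queue:
                cell = queue.popleft()
                # Neighbors differ by one step along exactly one dimension.
                for dim in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:dim] + (cell[dim] + step,) + cell[dim + 1:]
                        if nb in dense and nb not in labels:
                            labels[nb] = next_id
                            queue.append(nb)
            next_id += 1
        return labels
```

In this sketch only the per-cell counts are kept, so memory depends on the number of occupied cells rather than on the length of the stream, and the labeling step runs only when a clustering request arrives.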