A grid-based subspace clustering algorithm for high-dimensional data streams

Authors:
Yufen Sun; Yansheng Lu
Affiliations:
College of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan, China; College of Computer Science & Technology, Huazhong University of Science & Technology, Wuhan, China
Venue:
WISE'06 Proceedings of the 7th international conference on Web Information Systems
Year:
2006

Abstract

Many applications require clustering of high-dimensional data streams. We propose a subspace clustering algorithm that finds clusters in different subspaces in a single pass over a data stream. The algorithm combines bottom-up and top-down grid-based methods. A uniformly partitioned grid data structure summarizes the data stream online. The top-down grid partition method is used to find the subspaces in which clusters are located. Errors introduced by the top-down partition procedure are eliminated by a merging step. Our performance study with real and synthetic datasets demonstrates the efficiency and effectiveness of the proposed algorithm.
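
The abstract gives no implementation details, so the following is only a rough sketch of what an online summary over a uniformly partitioned grid might look like. It is not the authors' algorithm: the class name `UniformGridSummary`, its methods, the assumption that every attribute is scaled to [0, 1), and the `bins_per_dim` parameter are all hypothetical. The top-down partition and the merging step are not shown.

```python
from collections import Counter
from typing import Sequence, Tuple

Cell = Tuple[int, ...]


class UniformGridSummary:
    """Hypothetical one-pass summary: point counts per cell of a uniform grid.

    Assumes each attribute has been scaled to [0, 1); values outside that
    range are clamped to the nearest boundary cell.
    """

    def __init__(self, dims: int, bins_per_dim: int) -> None:
        self.dims = dims
        self.bins = bins_per_dim
        self.counts: Counter = Counter()  # only non-empty cells are stored
        self.total = 0

    def cell_of(self, point: Sequence[float]) -> Cell:
        """Map a point to the index tuple of the grid cell containing it."""
        if len(point) != self.dims:
            raise ValueError("point has the wrong number of dimensions")
        return tuple(
            max(0, min(int(x * self.bins), self.bins - 1)) for x in point
        )

    def add(self, point: Sequence[float]) -> None:
        """Process one stream element; the point itself is not retained."""
        self.counts[self.cell_of(point)] += 1
        self.total += 1

    def project(self, subspace: Sequence[int]) -> Counter:
        """Aggregate cell counts onto the given subset of dimensions."""
        projected: Counter = Counter()
        for cell, n in self.counts.items():
            projected[tuple(cell[d] for d in subspace)] += n
        return projected
```

Each arriving point touches exactly one counter, so the stream is read once and memory grows with the number of non-empty cells rather than with the stream length. A subspace search could then examine `project(...)` for candidate dimension subsets, although how the paper actually selects subspaces is not stated in the abstract.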