Generalized projected clustering in high-dimensional data streams

Authors: Ting Wang
Affiliations: Computer Science Dept., University of British Columbia
Venue: APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
Year: 2006

Abstract

Consider the problem of identifying dense subgroups of data points that exhibit strong correlations in a data stream. Such correlation connected clusters are meaningful in many applications. However, the inherent sparsity of high-dimensional space means that correlations are local to specific subspaces; moreover, a correlation can lie along an arbitrarily oriented direction, which defeats most traditional methods. We present ACID, a framework that can effectively detect correlation connected clusters in high-dimensional streams. It scales well with both the size of the stream and the dimensionality of the data, and it is robust against noise. Experiments on synthetic and real datasets demonstrate its effectiveness and efficiency.