Grid-based subspace clustering over data streams

Authors:
Nam Hun Park; Won Suk Lee
Affiliations:
Yonsei University, Seoul, South Korea; Yonsei University, Seoul, South Korea
Venue:
Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Year:
2007

Abstract

A real-life data stream usually has many dimensions, and some dimensional values of its data elements may be missing. To track the ongoing change of a data stream with respect to every subset of its dimensions, this paper proposes a grid-based subspace clustering algorithm. Given an n-dimensional data stream, the ongoing distribution statistics of data elements in each one-dimensional data space are first monitored by a list of grid-cells called a sibling list. Once a dense grid-cell of a first-level sibling list becomes a dense unit grid-cell, new second-level sibling lists are created as its child nodes to trace any cluster in all possible two-dimensional rectangular subspaces. In this way, a sibling tree grows up to the nth level at most, and a k-dimensional subcluster can be found at the kth level of the sibling tree. The proposed method is analyzed comparatively through a series of experiments to identify its various characteristics.
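
The sibling-tree idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how grid-cells are partitioned, how density is measured, or whether old data is decayed or pruned. The sketch therefore assumes fixed-width cells (`cell_width`), an absolute count threshold (`min_count`), and child lists created only for higher-numbered dimensions so that each subspace is tracked once. The names `Cell`, `SiblingTree`, `insert`, and `dense_cells` are hypothetical. Missing values are represented as `None` and simply skipped.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple


@dataclass
class Cell:
    """One grid-cell of a sibling list (illustrative structure, not from the paper)."""
    count: int = 0
    # children[dim] is a sibling list for a later dimension: interval index -> Cell
    children: Dict[int, Dict[int, "Cell"]] = field(default_factory=dict)


class SiblingTree:
    """Simplified sibling tree: level k holds cells of k-dimensional subspaces."""

    def __init__(self, n_dims: int, cell_width: float = 1.0, min_count: int = 3):
        self.n_dims = n_dims
        self.cell_width = cell_width  # assumed fixed-width partitioning
        self.min_count = min_count    # assumed absolute density threshold
        # First level: one sibling list per dimension.
        self.root: Dict[int, Dict[int, Cell]] = {d: {} for d in range(n_dims)}

    def insert(self, point: Sequence[Optional[float]]) -> None:
        """Update the tree with one stream element; None marks a missing value."""
        self._update(self.root, point)

    def _update(self, lists: Dict[int, Dict[int, Cell]], point) -> None:
        for dim, sibling_list in lists.items():
            value = point[dim]
            if value is None:
                continue  # a missing value contributes nothing in this dimension
            idx = math.floor(value / self.cell_width)
            cell = sibling_list.setdefault(idx, Cell())
            cell.count += 1
            # Once the cell is dense, open sibling lists one level down.
            if cell.count >= self.min_count and not cell.children:
                cell.children = {d: {} for d in range(dim + 1, self.n_dims)}
            self._update(cell.children, point)

    def dense_cells(self) -> List[Tuple[Tuple[Tuple[int, int], ...], int]]:
        """Return (subspace cell, count) pairs; a cell is a tuple of (dim, index)."""
        found = []

        def walk(lists, prefix):
            for dim, sibling_list in lists.items():
                for idx, cell in sibling_list.items():
                    if cell.count >= self.min_count:
                        key = prefix + ((dim, idx),)
                        found.append((key, cell.count))
                        walk(cell.children, key)

        walk(self.root, ())
        return found


# Usage: a 3-dimensional stream in which some values are missing.
tree = SiblingTree(n_dims=3, cell_width=1.0, min_count=2)
for element in [(0.5, 0.5, None), (0.2, 0.7, 5.0), (0.9, 0.1, 5.5), (3.0, None, 5.1)]:
    tree.insert(element)
for cell_key, count in tree.dense_cells():
    print(len(cell_key), cell_key, count)  # subspace dimensionality, cell, count
```

In this sketch a child list only sees elements that arrive from the moment its parent cell becomes dense, so counts at deeper levels lag behind those at the first level. This is a consequence of growing the tree incrementally over a stream rather than a claim about the statistics the paper maintains.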