Grid-based subspace clustering over data streams
Proceedings of the sixteenth ACM Conference on Information and Knowledge Management
Subspace clustering over an online multi-dimensional data stream requires examining all subsets of its dimensions, which may demand a huge amount of memory. To trace the ongoing changes of cluster patterns over an online data stream within a confined memory space, this paper proposes a grid-based subspace clustering algorithm that uses the confined memory space effectively. Given an n-dimensional data stream, the ongoing distribution statistics of data elements in each one-dimensional data space are first monitored by a list of grid-cells called a sibling list. Once a grid-cell of a first-level sibling list becomes a dense unit grid-cell, new second-level sibling lists are created as its child nodes in order to trace any cluster in all possible two-dimensional rectangular subspaces. In this way, a sibling tree grows up to the nth level at most, and a k-dimensional subcluster can be found at the kth level of the sibling tree. To use the confined space of main memory effectively, only the upper part of a sibling tree is kept expanded at all times, and the subtrees in the lower part are expanded in turn by scheduling policies such as round-robin and priority-based. Furthermore, in order to confine the usage of memory space, the size of a unit grid-cell is adaptively minimized such that the result of clustering becomes as accurate as possible at all times. The performance of the proposed method is analyzed comparatively through a series of experiments to identify its characteristics.
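The growth of a sibling tree described above can be illustrated with a minimal Python sketch. It is not the paper's implementation, and it makes several simplifying assumptions: coordinates lie in [0, 1), every dimension is split into a fixed number of equal-width cells (no adaptive cell sizing), a cell counts as dense once it holds an absolute number of points `dense_count` (no decay or support ratio), child lists are opened only for higher-numbered dimensions to avoid enumerating the same subspace twice, and there is no memory-bounded scheduling of subtrees. The names `SiblingList`, `SiblingTree`, `n_cells` and `dense_count` are hypothetical.

```python
from collections import defaultdict


class SiblingList:
    """Grid-cell counts along one dimension (one node of the sketch tree)."""

    def __init__(self, dim, n_dims, n_cells, dense_count):
        self.dim = dim                  # dimension monitored by this list
        self.n_dims = n_dims
        self.n_cells = n_cells          # assumption: fixed equal-width cells
        self.dense_count = dense_count  # assumption: absolute density threshold
        self.counts = defaultdict(int)  # cell index -> number of points seen
        self.children = {}              # dense cell index -> child sibling lists

    def insert(self, point):
        # Map the coordinate in [0, 1) to a cell index.
        cell = min(int(point[self.dim] * self.n_cells), self.n_cells - 1)
        self.counts[cell] += 1
        if cell in self.children:
            # The cell is already dense: pass the point to the next level.
            for child in self.children[cell]:
                child.insert(point)
        elif self.counts[cell] >= self.dense_count:
            # The cell just became dense: open child lists for later dimensions.
            # They start empty, so they only see points arriving from now on.
            self.children[cell] = [
                SiblingList(d, self.n_dims, self.n_cells, self.dense_count)
                for d in range(self.dim + 1, self.n_dims)
            ]

    def dense_units(self, path=()):
        # Yield every dense unit as a tuple of (dimension, cell index) pairs.
        for cell, count in self.counts.items():
            if count >= self.dense_count:
                unit = path + ((self.dim, cell),)
                yield unit
                for child in self.children.get(cell, []):
                    yield from child.dense_units(unit)


class SiblingTree:
    """One first-level sibling list per dimension of the stream."""

    def __init__(self, n_dims, n_cells, dense_count):
        self.roots = [
            SiblingList(d, n_dims, n_cells, dense_count) for d in range(n_dims)
        ]

    def insert(self, point):
        for root in self.roots:
            root.insert(point)

    def dense_units(self):
        units = []
        for root in self.roots:
            units.extend(root.dense_units())
        return units


# Usage: stream six identical 2-D points and list the dense units found.
tree = SiblingTree(n_dims=2, n_cells=4, dense_count=3)
for _ in range(6):
    tree.insert((0.1, 0.9))
for unit in tree.dense_units():
    print(len(unit), "dimensional unit:", unit)
```

In this sketch a unit of length k sits at depth k of the tree, mirroring the statement that a k-dimensional subcluster is found at the kth level. Because a child list is created only after its parent cell becomes dense, its counts lag behind the true counts, which is one reason a real streaming method needs more careful statistics than this illustration keeps.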