Efficiently tracing clusters over high-dimensional on-line data streams

Authors:
Jae Woo Lee; Nam Hun Park; Won Suk Lee
Affiliations:
Department of Computer Science, Yonsei University, 134 Shinchondong Seodaemungu, Seoul 120-749, Republic of Korea (all three authors)
Venue:
Data & Knowledge Engineering
Year:
2009

Abstract

A good clustering method should scale flexibly with both the number of dimensions and the size of the data set. This paper proposes a method for efficiently tracing the clusters of a high-dimensional on-line data stream. While the one-dimensional clusters of each dimension are traced independently, a technique similar to frequent itemset mining is employed to find the set of multi-dimensional clusters. By finding a frequently co-occurring set of one-dimensional clusters, it is possible to trace a multi-dimensional rectangular space whose range is defined collectively by those one-dimensional clusters. To trace such candidates over a multi-dimensional on-line data stream, a cluster-statistics tree (CS-Tree) is proposed. A k-depth node(k=