BIRCH: an efficient data clustering method for very large databases
SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Models and issues in data stream systems
Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Issues in data stream management
ACM SIGMOD Record
Better streaming algorithms for clustering problems
Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Clustering binary data streams with K-means
DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Clustering For Data Mining: A Data Recovery Approach (Chapman & Hall/Crc Computer Science)
Clustering For Data Mining: A Data Recovery Approach (Chapman & Hall/Crc Computer Science)
A framework for clustering evolving data streams
VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
A framework for projected clustering of high dimensional data streams
VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Rek-means: a k-means based clustering algorithm
ICVS'08 Proceedings of the 6th international conference on Computer vision systems
Exclusive and complete clustering of streams
DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
Hi-index | 0.00 |
Clustering on streaming data aims at partitioning a list of data points into k groups of "similar" objects by scanning the data once. Most current one-scan clustering algorithms do not keep original data in the resulting clusters. The output of the algorithms is therefore not the clustered data points but the approximations of data properties according to the predefined similarity function, such that k centers and radiuses reflect the up-to-date data grouping. In this paper, we raise a critical question: can the partition-based clustering, or exclusive clustering, be achieved on streaming data by those currently available algorithms? After identifying the differences between traditional clustering and clustering on data streams, we discuss the basic requirements for the clusters that can be discovered from streaming data. We evaluate the recent work that is based on a subcluster maintenance approach. By using a few straightforward examples we illustrate that the subcluster maintenance approach may fail to resolve the exclusive clustering on data streams. Based on our observations, we also present the challenges on any heuristic method that claims solving the clustering problem on data streams in general.