Can exclusive clustering on streaming data be achieved?

Authors:
Maria E. Orlowska;Xingzhi Sun;Xue Li
Affiliations:
The University of Queensland, Brisbane, Australia;The University of Queensland, Brisbane, Australia;The University of Queensland, Brisbane, Australia
Venue:
ACM SIGKDD Explorations Newsletter
Year:
2006

Citing 9
Cited 2

BIRCH: an efficient data clustering method for very large databases

SIGMOD '96 Proceedings of the 1996 ACM SIGMOD international conference on Management of data
Models and issues in data stream systems

Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
Issues in data stream management

ACM SIGMOD Record
Better streaming algorithms for clustering problems

Proceedings of the thirty-fifth annual ACM symposium on Theory of computing
Clustering data streams

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Clustering binary data streams with K-means

DMKD '03 Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
Clustering For Data Mining: A Data Recovery Approach (Chapman & Hall/Crc Computer Science)

Clustering For Data Mining: A Data Recovery Approach (Chapman & Hall/Crc Computer Science)
A framework for clustering evolving data streams

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
A framework for projected clustering of high dimensional data streams

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30

Rek-means: a k-means based clustering algorithm

ICVS'08 Proceedings of the 6th international conference on Computer vision systems
Exclusive and complete clustering of streams

DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications

Quantified Score

Hi-index	0.00

Visualization

Abstract

Clustering on streaming data aims at partitioning a list of data points into k groups of "similar" objects by scanning the data once. Most current one-scan clustering algorithms do not keep original data in the resulting clusters. The output of the algorithms is therefore not the clustered data points but the approximations of data properties according to the predefined similarity function, such that k centers and radiuses reflect the up-to-date data grouping. In this paper, we raise a critical question: can the partition-based clustering, or exclusive clustering, be achieved on streaming data by those currently available algorithms? After identifying the differences between traditional clustering and clustering on data streams, we discuss the basic requirements for the clusters that can be discovered from streaming data. We evaluate the recent work that is based on a subcluster maintenance approach. By using a few straightforward examples we illustrate that the subcluster maintenance approach may fail to resolve the exclusive clustering on data streams. Based on our observations, we also present the challenges on any heuristic method that claims solving the clustering problem on data streams in general.