Can exclusive clustering on streaming data be achieved?

  • Authors:
  • Maria E. Orlowska;Xingzhi Sun;Xue Li

  • Affiliations:
  • The University of Queensland, Brisbane, Australia;The University of Queensland, Brisbane, Australia;The University of Queensland, Brisbane, Australia

  • Venue:
  • ACM SIGKDD Explorations Newsletter
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

Clustering on streaming data aims at partitioning a list of data points into k groups of "similar" objects by scanning the data once. Most current one-scan clustering algorithms do not keep original data in the resulting clusters. The output of the algorithms is therefore not the clustered data points but the approximations of data properties according to the predefined similarity function, such that k centers and radiuses reflect the up-to-date data grouping. In this paper, we raise a critical question: can the partition-based clustering, or exclusive clustering, be achieved on streaming data by those currently available algorithms? After identifying the differences between traditional clustering and clustering on data streams, we discuss the basic requirements for the clusters that can be discovered from streaming data. We evaluate the recent work that is based on a subcluster maintenance approach. By using a few straightforward examples we illustrate that the subcluster maintenance approach may fail to resolve the exclusive clustering on data streams. Based on our observations, we also present the challenges on any heuristic method that claims solving the clustering problem on data streams in general.