Memory-less unsupervised clustering for data streaming by versatile ellipsoidal function

  • Authors: Niwan Wattanakitrungroj; Chidchanok Lursinsap

  • Affiliations: Advanced Virtual and Intelligent Computing (AVIC) Center, Bangkok, Thailand (both authors)

  • Venue: Proceedings of the 20th ACM International Conference on Information and Knowledge Management
  • Year: 2011

Abstract

The challenge in clustering a data stream is coping with continuously arriving data that are unbounded in volume and cannot all be stored. To avoid exhausting storage, each datum must be processed in a single pass, only once after its arrival, and then discarded. Because previously clustered data no longer exist, they must be captured mathematically as group features. The proposed data stream clustering algorithm is divided into two main phases, on-line and off-line. In the on-line phase, new micro-cluster features are proposed; these represent the arriving data better than the traditional micro-cluster features. In the off-line phase, the prepared micro-clusters are categorized by their densities. The proposed method can generate final clusters with different shapes and densities. Measured by entropy, purity, Jaccard coefficient, and Rand statistic, our algorithm outperforms previous data stream clustering algorithms on both synthetic and real data.
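
The abstract does not spell out the proposed ellipsoidal features, so the following is only a minimal sketch of the conventional baseline it refers to: a traditional micro-cluster stored as an additive triple (point count, per-dimension linear sum, per-dimension squared sum), as commonly used in stream clustering. It illustrates how a point can be absorbed in a single pass and then discarded while the centroid and radius remain recoverable. The names `MicroCluster`, `absorb`, and `summarize_stream` are hypothetical and not taken from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import Iterable, List, Sequence


@dataclass
class MicroCluster:
    """Additive summary (n, linear_sum, square_sum) of absorbed points.

    This is the conventional feature triple, NOT the paper's proposed features.
    """
    dim: int
    n: int = 0
    linear_sum: List[float] = field(default_factory=list)
    square_sum: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start from all-zero sums when none are supplied.
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim
        if not self.square_sum:
            self.square_sum = [0.0] * self.dim

    def absorb(self, point: Sequence[float]) -> None:
        """Update the summary with one point; the point itself is not kept."""
        if len(point) != self.dim:
            raise ValueError("point has the wrong dimensionality")
        self.n += 1
        for i, x in enumerate(point):
            self.linear_sum[i] += x
            self.square_sum[i] += x * x

    def centroid(self) -> List[float]:
        """Mean of the absorbed points, recovered from the sums alone."""
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        """Root-mean-square distance of the absorbed points from the centroid."""
        variance = sum(
            ss / self.n - (ls / self.n) ** 2
            for ls, ss in zip(self.linear_sum, self.square_sum)
        )
        # Guard against tiny negative values caused by floating-point rounding.
        return math.sqrt(max(variance, 0.0))


def summarize_stream(points: Iterable[Sequence[float]], dim: int) -> MicroCluster:
    """Consume a stream once, keeping only the running summary."""
    mc = MicroCluster(dim=dim)
    for p in points:
        mc.absorb(p)
    return mc
```

For example, `summarize_stream(iter([(0, 0), (2, 0), (0, 2), (2, 2)]), dim=2)` consumes the four points once and keeps only three small arrays. Because the sums are axis-aligned and additive, this baseline can only describe a spherical spread around the centroid, which is presumably the limitation the proposed features are meant to address.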