Memory-less unsupervised clustering for data streaming by versatile ellipsoidal function

Authors:
Niwan Wattanakitrungroj;Chidchanok Lursinsap
Affiliations:
Advanced Virtual and Intelligent Computing (AVIC) Center, Bangkok, Thailand;Advanced Virtual and Intelligent Computing (AVIC) Center, Bangkok, Thailand
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

The challenge in clustering a data stream is coping with continuously arriving data that are unbounded and cannot all be stored. To manage the storage limit, the data must be processed in a single pass, that is, only once after arrival, and then discarded. Because previously clustered data no longer exist, they must be captured mathematically in terms of group features. The proposed data stream clustering algorithm is divided into two main phases, on-line and off-line. In the on-line phase, new micro-cluster features are proposed; these represent the arriving data better than the traditional micro-cluster features. In the off-line phase, the prepared micro-clusters are categorized by their densities. The proposed method can generate final clusters with different shapes and densities. Applied to synthetic and real data, our algorithm outperforms previous data stream clustering algorithms as measured by entropy, purity, Jaccard coefficient, and Rand statistic.
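
To make the single-pass idea concrete, the sketch below shows one common form of the traditional micro-cluster summary that the abstract contrasts against: a point count, a per-dimension linear sum, and a per-dimension squared sum, updated once per arriving point so the point itself can be discarded. It is not the ellipsoidal feature set proposed in the paper, which the abstract does not specify. The class name `MicroCluster` and its methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Additive summary (count, linear sum, squared sum) of absorbed points.

    Illustrative only: this is the classic cluster-feature summary,
    not the feature set proposed in the paper.
    """
    dim: int
    n: int = 0
    linear_sum: List[float] = field(default_factory=list)
    squared_sum: List[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Start both running sums at zero in every dimension.
        if not self.linear_sum:
            self.linear_sum = [0.0] * self.dim
        if not self.squared_sum:
            self.squared_sum = [0.0] * self.dim

    def absorb(self, point: Sequence[float]) -> None:
        """Fold one arriving point into the summary; the point is not stored."""
        if len(point) != self.dim:
            raise ValueError("point has wrong dimensionality")
        self.n += 1
        for d, x in enumerate(point):
            self.linear_sum[d] += x
            self.squared_sum[d] += x * x

    def centroid(self) -> List[float]:
        """Mean of all absorbed points, recovered from the sums alone."""
        if self.n == 0:
            raise ValueError("empty micro-cluster")
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        """Root-mean-square distance of the absorbed points from the centroid."""
        c = self.centroid()
        variance = sum(
            self.squared_sum[d] / self.n - c[d] ** 2 for d in range(self.dim)
        )
        # Clamp tiny negative values caused by floating-point rounding.
        return sqrt(max(variance, 0.0))


# Usage: stream four points through the summary, one at a time.
mc = MicroCluster(dim=2)
for p in [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]:
    mc.absorb(p)
print(mc.n, mc.centroid(), mc.radius())
```

Because every field is a running sum, memory use is constant in the number of points, and two such summaries can be merged by adding their fields. A summary of this kind describes only a spherical spread around the centroid.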