A semi-supervised incremental clustering algorithm for streaming data

Authors:
Maria Halkidi;Myra Spiliopoulou;Aikaterini Pavlou
Affiliations:
Dept of Digital Systems, University of Piraeus, Greece;Faculty of Computer Science, Magdeburg University, Germany;Athens University of Economics and Business, Greece
Venue:
PAKDD'12 Proceedings of the 16th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining - Volume Part I
Year:
2012

Citing 11
Cited 0

Constrained K-means Clustering with Background Knowledge

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Semi-supervised Clustering by Seeding

ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
A probabilistic framework for semi-supervised clustering

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Integrating constraints and metric learning in semi-supervised clustering

ICML '04 Proceedings of the twenty-first international conference on Machine learning
A Framework for Semi-Supervised Learning Based on Subjective and Objective Clustering Criteria

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
A framework for clustering evolving data streams

VLDB '03 Proceedings of the 29th international conference on Very large data bases - Volume 29
A framework for projected clustering of high dimensional data streams

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Data Streaming with Affinity Propagation

ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II
C-DenStream: Using Domain Knowledge on a Data Stream

DS '09 Proceedings of the 12th International Conference on Discovery Science
Measuring constraint-set utility for partitional clustering algorithms

PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases
Agglomerative hierarchical clustering with constraints: theoretical and empirical results

PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases

Quantified Score

Hi-index	0.00

Visualization

Abstract

Nowadays many applications need to deal with evolving data streams . In this work, we propose an incremental clustering approach for the exploitation of user constraints on data streams. Conventional constraints do not make sense on streaming data, so we extend the classic notion of constraint set into a constraint stream . We propose methods for using the constraint stream as data items are forgotten or new items arrive. Also we present an on-line clustering approach for the cost-based enforcement of the constraints during cluster adaptation on evolving data streams. Our method introduces the concept of multi-clusters (m-clusters) to capture arbitrarily shaped clusters. An m-cluster consists of multiple dense overlapping regions, named s-clusters, each of which can be efficiently represented by a single point. Also it proposes the definition of outliers clusters in order to handle outliers while it provides methods to observe changes in structure of clusters as data evolves.