C-DenStream: Using Domain Knowledge on a Data Stream

Authors:
Carlos Ruiz; Ernestina Menasalvas; Myra Spiliopoulou
Affiliations:
Facultad de Informática, Universidad Politécnica de Madrid, Spain; Facultad de Informática, Universidad Politécnica de Madrid, Spain; Faculty of Computer Science, Magdeburg University, Germany
Venue:
DS '09 Proceedings of the 12th International Conference on Discovery Science
Year:
2009


Abstract

Stream clustering algorithms are traditionally designed to process streams efficiently and to adapt to the evolution of the underlying population, without assuming any prior knowledge about the data. In many cases, however, some domain or background knowledge is available; instead of using it only for external validation of the clustering results, it can be used to guide the clustering process itself. For non-stream data, domain knowledge is exploited in the context of semi-supervised clustering. In this paper, we extend the static semi-supervised learning paradigm to streams. We present C-DenStream, a density-based clustering algorithm for data streams that incorporates domain information in the form of constraints. We also propose a novel method for the use of background knowledge in data streams. A performance study over a number of real and synthetic data sets demonstrates the effectiveness and efficiency of our method. To our knowledge, this is the first approach to include domain knowledge in clustering for data streams.