C-DenStream: Using Domain Knowledge on a Data Stream

  • Authors: Carlos Ruiz; Ernestina Menasalvas; Myra Spiliopoulou

  • Affiliations: Facultad de Informática, Universidad Politécnica de Madrid, Spain; Facultad de Informática, Universidad Politécnica de Madrid, Spain; Faculty of Computer Science, Magdeburg University, Germany

  • Venue: DS '09 Proceedings of the 12th International Conference on Discovery Science
  • Year: 2009

Abstract

Stream clustering algorithms are traditionally designed to process streams efficiently and to adapt to the evolution of the underlying population, without assuming any prior knowledge about the data. In many cases, however, some domain or background knowledge is available, and rather than using it only for external validation of the clustering results, it can be used to guide the clustering process. For non-stream data, domain knowledge is exploited in the context of semi-supervised clustering. In this paper, we extend the static semi-supervised learning paradigm to streams. We present C-DenStream, a density-based clustering algorithm for data streams that incorporates domain information in the form of constraints. We also propose a novel method for the use of background knowledge in data streams. A performance study over a number of real and synthetic data sets demonstrates the effectiveness and efficiency of our method. To our knowledge, this is the first approach to include domain knowledge in clustering for data streams.
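
To make the idea of guiding clustering with constraints more concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the constraints take the form of pairwise cannot-link relations between points, which the abstract does not specify, and it omits density estimation and stream handling entirely. The names `violates_cannot_link` and `assign`, and the use of fixed cluster centers, are hypothetical and chosen only for illustration.

```python
from math import dist


def violates_cannot_link(point_id, cluster_id, assignments, cannot_link):
    """Return True if placing point_id in cluster_id breaks a cannot-link pair.

    Assumption: constraints are pairs (a, b) meaning a and b must not share
    a cluster. This form is illustrative, not taken from the abstract.
    """
    for a, b in cannot_link:
        if a == point_id:
            other = b
        elif b == point_id:
            other = a
        else:
            continue
        if assignments.get(other) == cluster_id:
            return True
    return False


def assign(point_id, point, centers, assignments, cannot_link):
    """Assign a point to the nearest center that violates no constraint.

    Records the choice in assignments and returns the chosen cluster id,
    or None if every cluster is ruled out by a cannot-link pair.
    """
    # Try clusters from nearest to farthest (Euclidean distance).
    ranked = sorted(centers, key=lambda cid: dist(point, centers[cid]))
    for cid in ranked:
        if not violates_cannot_link(point_id, cid, assignments, cannot_link):
            assignments[point_id] = cid
            return cid
    return None
```

In this sketch, a point whose nearest cluster already holds one of its cannot-link partners is pushed to the next-nearest admissible cluster, which is one simple way background knowledge can steer the result instead of only validating it afterwards.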