Comparing clustering algorithms and their influence on the evolution of labeled clusters

  • Authors:
  • Rene Schult

  • Affiliations:
  • Otto-von-Guericke-University Magdeburg, Institute of Technical and Business Information Systems

  • Venue:
  • DEXA'07 Proceedings of the 18th international conference on Database and Expert Systems Applications
  • Year:
  • 2007

Abstract

We study the influence of different clustering algorithms on cluster evolution monitoring in data streams. Capturing and interpreting cluster change delivers indicators of the evolution of the underlying population. For text stream monitoring, the clusters can be summarized into topics, so that cluster monitoring provides insights into the data and into the decline of thematic subjects over time. However, such insights should always be taken with a grain of salt: the quality of the clusters has a decisive impact on the observed changes. In the simplest case, cluster change across the stream may be due to the low quality of the original cluster rather than to a drift in the population belonging to this cluster. We present our framework ThemeFinder for topic evolution monitoring in streams and compare how two very different clustering algorithms, and the quality of their clusters, influence the observed evolution. After an evaluation of different clustering algorithms with external and internal quality measures, we use the center-based bisecting k-means algorithm and the density-based DBScan algorithm. Our results show that this influence is relatively high and that the results of one clustering algorithm allow conclusions to be drawn about the evaluation of the other. Our experiments were conducted on a subarchive of the ACM library.
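To make the idea of an external quality measure concrete, the following is a minimal sketch of cluster purity, computed for two clusterings of the same labeled documents. It is an illustration only: the abstract does not state which measures the paper uses, and the function name `purity`, the toy topic labels, and both cluster assignments are hypothetical. Treating the noise label `-1` as a cluster of its own is also an assumption of this sketch.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """Fraction of items that carry the majority class label of their cluster."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("inputs must have the same length")
    if not cluster_ids:
        raise ValueError("inputs must not be empty")

    # Count how often each class label occurs inside each cluster.
    members = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster][label] += 1

    # Sum the size of the majority class of every cluster.
    majority_total = sum(max(counts.values()) for counts in members.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy data: six documents with known topic labels.
topics = ["db", "db", "db", "ir", "ir", "ir"]

# Hypothetical assignments, standing in for a center-based and a
# density-based algorithm. Here -1 marks a noise point, which this
# sketch simply counts as a separate cluster.
clustering_a = [0, 0, 0, 1, 1, 1]
clustering_b = [0, 0, 1, 1, 1, -1]

print("purity A:", purity(clustering_a, topics))
print("purity B:", purity(clustering_b, topics))
```

A lower purity for one clustering suggests that changes observed in its clusters over time are more likely to reflect poor cluster quality than a real drift in the underlying population.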