A neighborhood-based approach for clustering of linked document collections

  • Authors:
  • Ralitsa Angelova;Stefan Siersdorfer

  • Affiliations:
  • Max Planck Institute for Informatics, Saarbrücken, Germany;Max Planck Institute for Informatics, Saarbrücken, Germany

  • Venue:
  • CIKM '06 Proceedings of the 15th ACM international conference on Information and knowledge management
  • Year:
  • 2006

Abstract

This paper addresses the problem of automatically structuring linked document collections by clustering. In contrast to traditional clustering, we study the clustering problem in light of the link structure available for the data set (e.g., hyperlinks among web documents or co-authorship among bibliographic data entries). Our approach is based on iterative relaxation of cluster assignments and can be built on top of any clustering algorithm. This technique results in higher cluster purity and better overall accuracy, and makes self-organization more robust.
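
The abstract does not give the update rule, so the following is only a minimal sketch of what iterative relaxation of cluster assignments could look like, not the authors' formulation. It assumes that a base clustering algorithm has already produced soft assignments (one probability per cluster for each document) and that links are given as an adjacency mapping. The function names, the mixing weight `alpha`, and the fixed iteration count are all hypothetical choices made for illustration.

```python
def relax_assignments(initial, neighbors, alpha=0.5, iterations=10):
    """Smooth soft cluster assignments over a link graph (illustrative sketch).

    initial   -- list of per-document probability lists from any base clusterer
    neighbors -- dict mapping a document index to the indices it is linked to
    alpha     -- assumed weight of the document's own content-based assignment
    """
    k = len(initial[0])
    current = [row[:] for row in initial]
    for _ in range(iterations):
        updated = []
        for doc, own in enumerate(initial):
            linked = neighbors.get(doc, [])
            if not linked:
                # Isolated documents keep their content-based assignment.
                updated.append(own[:])
                continue
            # Average of the neighbors' current cluster distributions.
            avg = [sum(current[n][c] for n in linked) / len(linked)
                   for c in range(k)]
            # Mix own evidence with neighborhood evidence, then renormalize.
            mixed = [alpha * own[c] + (1 - alpha) * avg[c] for c in range(k)]
            total = sum(mixed)
            updated.append([m / total for m in mixed])
        current = updated
    return current


def hard_labels(soft):
    """Pick the most probable cluster for each document."""
    return [max(range(len(row)), key=row.__getitem__) for row in soft]


# Toy data (invented for this sketch): document 2 is ambiguous on content
# alone but is linked to documents 0 and 1, which clearly sit in cluster 0.
initial = [[0.9, 0.1], [0.8, 0.2], [0.45, 0.55], [0.1, 0.9]]
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}

relaxed = relax_assignments(initial, neighbors)
print(hard_labels(initial))
print(hard_labels(relaxed))
```

In this toy setup the relaxation step moves the ambiguous document into the cluster of its linked neighbors, while the unlinked document keeps its original assignment. Any clustering algorithm that yields per-cluster scores could supply `initial`, which matches the claim that the approach can be built on top of an arbitrary base clusterer.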