Comparison of distance measures for graph-based clustering of documents

  • Authors:
  • Adam Schenker;Mark Last;Horst Bunke;Abraham Kandel

  • Affiliations:
  • University of South Florida, Department of Computer Science and Engineering, Tampa, FL;Ben-Gurion University of the Negev, Department of Information Systems Engineering, Beer-Sheva, Israel;University of Bern, Department of Computer Science, Bern, Switzerland;University of South Florida, Department of Computer Science and Engineering, Tampa, FL

  • Venue:
  • GbRPR'03 Proceedings of the 4th IAPR international conference on Graph based representations in pattern recognition
  • Year:
  • 2003

Abstract

This paper addresses the clustering of document collections. We compare the conventional vector-model approach, using cosine similarity and Euclidean distance, with a novel method we have developed for clustering graph-based data with the standard k-means algorithm. The proposed method is evaluated using five different graph distance measures under three clustering performance indices. The experiments are performed on two separate document collections. The results show that the graph-based approach performs as well as the vector-based methods, or even better when normalized graph distance measures are used.
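
To make the comparison concrete, the following is a minimal Python sketch of the two vector-model measures named in the abstract, together with one possible normalized graph distance. The abstract does not list the five graph distance measures, so `mcs_distance` is only an illustrative assumption: it treats each document graph as a set of uniquely labeled term nodes and normalizes the size of the shared part by the size of the larger graph. All function names and the toy data are hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention chosen here for empty documents
    return dot / (norm_x * norm_y)


def euclidean_distance(x, y):
    """Straight-line distance between two term-frequency vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mcs_distance(nodes_a, nodes_b):
    """Assumed normalized graph distance: 1 - |common| / max(|A|, |B|).

    With uniquely labeled nodes, the common part of two graphs can be
    found by set intersection; this is a simplification that ignores edges.
    """
    if not nodes_a and not nodes_b:
        return 0.0
    common = nodes_a & nodes_b
    return 1.0 - len(common) / max(len(nodes_a), len(nodes_b))


# Toy term-frequency vectors: doc_b is doc_a with every count doubled.
doc_a = [2, 0, 1]
doc_b = [4, 0, 2]

# Toy document graphs reduced to their sets of term nodes.
graph_a = {"graph", "cluster", "distance"}
graph_b = {"graph", "cluster", "vector", "model"}

print(cosine_similarity(doc_a, doc_b))   # direction only, ignores length
print(euclidean_distance(doc_a, doc_b))  # sensitive to document length
print(mcs_distance(graph_a, graph_b))    # bounded between 0 and 1
```

In this toy case the two vectors point in the same direction, so cosine similarity treats the documents as identical while Euclidean distance separates them by length. The normalized graph distance stays within [0, 1] regardless of graph size, which is one plausible reason normalization could matter when documents vary in length.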