Comparison of distance measures for graph-based clustering of documents

Authors:
Adam Schenker;Mark Last;Horst Bunke;Abraham Kandel
Affiliations:
University of South Florida, Department of Computer Science and Engineering, Tampa, FL;Ben-Gurion University of the Negev, Department of Information Systems Engineering, Beer-Sheva, Israel;University of Bern, Department of Computer Science, Bern, Switzerland;University of South Florida, Department of Computer Science and Engineering, Tampa, FL
Venue:
GbRPR'03 Proceedings of the 4th IAPR international conference on Graph based representations in pattern recognition
Year:
2003

Abstract

In this paper we describe work relating to clustering of document collections. We compare the conventional vector-model approach using cosine similarity and Euclidean distance to a novel method we have developed for clustering graph-based data with the standard k-means algorithm. The proposed method is evaluated using five different graph distance measures under three clustering performance indices. The experiments are performed on two separate document collections. The results show the graph-based approach performs as well as vector-based methods or even better when using normalized graph distance measures.
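To make the contrast between the two kinds of measure concrete, the following minimal sketch computes a cosine distance between term-frequency vectors and one well-known normalized graph distance, 1 - |mcs(G1, G2)| / max(|G1|, |G2|), where mcs is the maximum common subgraph. The abstract does not name the five graph distance measures or the graph representation, so everything here is an assumption for illustration: each document graph is taken to have uniquely labeled nodes (one per term) and directed edges, graph size is counted as nodes plus edges, and the function names `cosine_distance` and `mcs_distance` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity for two term-frequency dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # convention chosen for this sketch: empty vectors are maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def mcs_distance(g1, g2):
    """Normalized distance based on the maximum common subgraph.

    Each graph is a (nodes, edges) pair: a set of unique term labels and a
    set of (source, target) label pairs. With unique node labels (an
    assumption of this sketch), the maximum common subgraph is simply the
    intersection of the node sets and of the edge sets.
    """
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    # Graph size counted as nodes plus edges (assumption of this sketch).
    size1 = len(nodes1) + len(edges1)
    size2 = len(nodes2) + len(edges2)
    largest = max(size1, size2)
    if largest == 0:
        return 0.0  # two empty graphs are treated as identical
    mcs_size = len(nodes1 & nodes2) + len(edges1 & edges2)
    return 1.0 - mcs_size / largest


# Toy document graphs: nodes are terms, edges link adjacent terms.
doc_a = ({"graph", "distance", "cluster"},
         {("graph", "distance"), ("distance", "cluster")})
doc_b = ({"graph", "distance", "vector"},
         {("graph", "distance")})

graph_d = mcs_distance(doc_a, doc_b)
vector_d = cosine_distance({"graph": 1, "distance": 1, "cluster": 1},
                           {"graph": 1, "distance": 1, "vector": 1})
```

Because the distance is divided by the size of the larger graph, it stays in the range [0, 1] regardless of document length, which is the sense in which such a measure is normalized.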