K-means based approaches to clustering nodes in annotated graphs

Authors:
Tijn Witsenburg; Hendrik Blockeel
Affiliations:
Leiden Institute of Advanced Computer Science, Universiteit Leiden, Leiden, The Netherlands; Leiden Institute of Advanced Computer Science, Universiteit Leiden, Leiden, The Netherlands and Department of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
Venue:
ISMIS'11: Proceedings of the 19th International Conference on Foundations of Intelligent Systems
Year:
2011

Abstract

The goal of clustering is to form groups of similar elements. Quality criteria for clusterings, as well as the notion of similarity, depend strongly on the application domain, which explains the existence of many different clustering algorithms and similarity measures. In this paper we focus on the problem of clustering annotated nodes in a graph, where the similarity between nodes depends on both their annotations and their context in the graph ("hybrid" similarity), using k-means-like clustering algorithms. We show that, for the similarity measure we focus on, k-means itself cannot trivially be applied. We propose three alternatives, and evaluate them empirically on the Cora dataset. We find that using these alternative clustering algorithms with the hybrid similarity can be advantageous compared to using standard k-means with a purely annotation-based similarity.
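
The abstract does not define the hybrid similarity or name the three proposed alternatives, so the following is only an illustrative sketch under stated assumptions. It assumes a hybrid similarity that averages the direct annotation similarity of two nodes with the mean annotation similarity over all pairs of their neighbours, and it uses k-medoids as one generic way to cluster when no cluster mean can be computed from a pairwise similarity alone. The function names (`cosine`, `hybrid_similarity`, `k_medoids`) and the dictionary-based graph layout are hypothetical, and k-medoids is not claimed to be one of the paper's three algorithms.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length annotation vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_similarity(v, w, annot, neighbors):
    """ASSUMED hybrid measure (not taken from the abstract): the average of
    the direct annotation similarity and the mean annotation similarity
    over all pairs of neighbours of v and w."""
    content = cosine(annot[v], annot[w])
    pairs = [(a, b) for a in neighbors[v] for b in neighbors[w]]
    if not pairs:
        return content  # no graph context available for this pair
    context = sum(cosine(annot[a], annot[b]) for a, b in pairs) / len(pairs)
    return 0.5 * (content + context)


def k_medoids(nodes, k, sim, iterations=20, seed=0):
    """Plain k-medoids driven by a pairwise similarity function.

    Cluster centres are actual nodes (medoids), so no mean of annotation
    vectors is ever required. Returns a dict mapping medoid -> members.
    """
    rng = random.Random(seed)
    medoids = rng.sample(nodes, k)
    clusters = {}
    for _ in range(iterations):
        # Assignment step: each medoid keeps itself; every other node
        # joins the medoid it is most similar to.
        clusters = {m: [m] for m in medoids}
        for v in nodes:
            if v in clusters:
                continue
            best = max(medoids, key=lambda m: sim(v, m))
            clusters[best].append(v)
        # Update step: the new medoid of a cluster is the member with the
        # highest total similarity to all members of that cluster.
        new_medoids = [
            max(members, key=lambda c: sum(sim(c, u) for u in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == set(medoids):
            break  # medoids are stable
        medoids = new_medoids
    return clusters
```

A call might look like `k_medoids(nodes, 2, lambda v, w: hybrid_similarity(v, w, annot, neighbors))`, where `annot` maps each node to its annotation vector and `neighbors` maps each node to a list of adjacent nodes. Using medoids sidesteps the need for a mean, at the cost of evaluating the similarity for many node pairs per iteration.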