Clustering based on random graph model embedding vertex features

Authors:
Hugo Zanghi; Stevenn Volant; Christophe Ambroise
Affiliations:
Exalead, 10 place de la Madeleine, 75008 Paris, France; Agroparistech (UMR 518), 16 rue Claude Bernard, 75231 Paris, France; Statistique et Génome (UMR CNRS 8071, INRA 1152), La genopole Tour Evry 2, 523 place des Terrasses, 91000 Evry, France
Venue:
Pattern Recognition Letters
Year:
2010



Abstract

Large datasets describing interactions between objects are common in many scientific fields, including the social sciences and biology, and also arise in specific settings such as the internet. The interactions naturally define a graph, and a common way of exploring and summarizing such datasets is graph clustering. Most techniques for clustering graph vertices use only the topology of connections and ignore information about the vertices' features. In this paper we provide a clustering algorithm that harnesses both types of data, based on a statistical model with a latent structure that characterizes each vertex both by a vector of features and by its connectivity. We perform simulations to compare our algorithm with existing approaches, and also evaluate our method on real datasets based on hypertext documents. We find that our algorithm successfully exploits whatever information is found both in the connectivity pattern and in the features.
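
To make the idea of a latent structure that drives both connectivity and features concrete, the following minimal sketch simulates such data. It is an illustration, not the paper's model or algorithm: the abstract does not state the distributions, so the Bernoulli edges, the isotropic Gaussian features, and every name here (`simulate_attributed_graph`, `class_probs`, `connect_probs`, `feature_means`, `feature_sd`) are assumptions made for this example.

```python
import random


def simulate_attributed_graph(n, class_probs, connect_probs, feature_means,
                              feature_sd=1.0, seed=0):
    """Sample latent classes, an undirected graph, and vertex features.

    Assumptions (not taken from the source): edges are Bernoulli given the
    two endpoint classes, and features are isotropic Gaussian given the
    vertex's own class.
    """
    rng = random.Random(seed)
    k = len(class_probs)

    # Latent class of each vertex.
    z = rng.choices(range(k), weights=class_probs, k=n)

    # An edge between i and j depends only on the two latent classes.
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < connect_probs[z[i]][z[j]]:
                adj[i][j] = adj[j][i] = 1

    # A feature vector depends only on the vertex's own latent class.
    feats = [[rng.gauss(mean, feature_sd) for mean in feature_means[z[i]]]
             for i in range(n)]
    return z, adj, feats


# Two classes: dense within a class, sparse between classes, and
# class-specific feature means.
z, adj, feats = simulate_attributed_graph(
    n=30,
    class_probs=[0.5, 0.5],
    connect_probs=[[0.6, 0.05], [0.05, 0.6]],
    feature_means=[[0.0, 0.0], [3.0, 3.0]],
)
```

In data generated this way, the adjacency matrix and the feature vectors each carry information about the same hidden classes, which is the situation a clustering method using both sources is meant to exploit.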