Reexamining the cluster hypothesis: scatter/gather on retrieval results
SIGIR '96 Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval
Web document clustering: a feasibility demonstration
Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval
An Introduction to Variational Methods for Graphical Models
Machine Learning
SIGIR '00 Proceedings of the 23rd annual international ACM SIGIR conference on Research and development in information retrieval
Assessing a Mixture Model for Clustering with the Integrated Completed Likelihood
IEEE Transactions on Pattern Analysis and Machine Intelligence
Learning to cluster web search results
Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval
Graph-Theoretic Techniques for Web Content Mining
Graph-Theoretic Techniques for Web Content Mining
Linear prediction models with graph regularization for web-page categorization
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
A model-based approach to attributed graph clustering
SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
Hi-index | 0.10 |
Large datasets with interactions between objects are common to numerous scientific fields including the social sciences and biology, as well as being a feature of specific phenomena such as the internet. The interactions naturally define a graph, and a common way of exploring and summarizing such datasets is graph clustering. Most techniques for clustering graph vertices use only the topology of connections, while ignoring information about the vertices' features. In this paper we provide a clustering algorithm that harnesses both types of data, based on a statistical model with a latent structure characterizing each vertex both by a vector of features and by its connectivity. We perform simulations to compare our algorithm with existing approaches, and also evaluate our method using real datasets based on hypertext documents. We find that our algorithm successfully exploits whatever information is found both in the connectivity pattern and in the features.