Models for large, sparse graphs arise in many applications and are an active topic in machine learning research. We develop a new generative model that combines rich block structure with simple, efficient estimation by collapsed Gibbs sampling. A novel feature of our method is that it learns the strength of both assortative and disassortative mixing among communities. Most earlier approaches, whether based on low-dimensional projections or on Latent Dirichlet Allocation, implicitly rely on one of two assumptions: some algorithms define similarity solely by direct connectedness, while others define it solely by similarity of neighborhoods, leading to undesired results, for example on near-bipartite subgraphs. In our experiments we cluster both small and large graphs, including real and generated graphs that are known to be hard to partition. Our method outperforms earlier Latent Dirichlet Allocation based models as well as spectral heuristics.
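To illustrate the estimation procedure the abstract refers to, here is a minimal sketch of collapsed Gibbs sampling in the plain LDA setting, treating each node's adjacency list as a "document" whose tokens are neighbor identifiers. This is a generic illustration under assumed hyperparameters, not the paper's actual block model; the function name and all parameter defaults are hypothetical.

```python
import random

def collapsed_gibbs(docs, K, alpha=0.5, beta=0.5, iters=50, seed=0):
    """Collapsed Gibbs sampler for an LDA-style model.

    docs: list of token lists (here, each node's list of neighbor ids).
    K: number of latent communities/topics.
    Returns per-document community counts (one row of K counts per doc).
    Hypothetical sketch: not the paper's model, which also learns
    assortative/disassortative mixing strengths.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    V = len(vocab)
    widx = {w: i for i, w in enumerate(vocab)}

    # z[d][n]: community assignment of token n in document d.
    z = [[rng.randrange(K) for _ in d] for d in docs]
    ndk = [[0] * K for _ in docs]       # document-community counts
    nkw = [[0] * V for _ in range(K)]   # community-word counts
    nk = [0] * K                        # tokens per community
    for d, doc in enumerate(docs):
        for n, w in enumerate(doc):
            k = z[d][n]
            ndk[d][k] += 1; nkw[k][widx[w]] += 1; nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove the current assignment from all counts ("collapse").
                k = z[d][n]; wi = widx[w]
                ndk[d][k] -= 1; nkw[k][wi] -= 1; nk[k] -= 1
                # Full conditional:
                # p(k) ∝ (ndk + alpha) * (nkw + beta) / (nk + V*beta)
                p = [(ndk[d][j] + alpha) * (nkw[j][wi] + beta) / (nk[j] + V * beta)
                     for j in range(K)]
                r = rng.random() * sum(p)
                for j in range(K):
                    r -= p[j]
                    if r <= 0:
                        k = j
                        break
                else:
                    k = K - 1  # guard against floating-point round-off
                # Add the new assignment back into the counts.
                z[d][n] = k
                ndk[d][k] += 1; nkw[k][wi] += 1; nk[k] += 1
    return ndk
```

Collapsing (integrating out the multinomial parameters) is what makes each sampling step a cheap counts-only update, which is why the abstract emphasizes that estimation stays simple and efficient even for large, sparse graphs.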