Scalable text and link analysis with mixed-topic link models

Authors:
Yaojia Zhu;Xiaoran Yan;Lise Getoor;Cristopher Moore
Affiliations:
University of New Mexico, Albuquerque, NM, USA;University of New Mexico, Albuquerque, NM, USA;University of Maryland, College Park, MD, USA;Santa Fe Institute, Santa Fe, NM, USA
Venue:
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2013

Abstract

Many data sets contain rich information about objects as well as pairwise relations between them. For instance, in networks of websites, scientific papers, and other documents, each node has content consisting of a collection of words, along with hyperlinks or citations to other nodes. To perform inference on such data sets and to make predictions and recommendations, it is useful to have models that capture the processes generating both the text at each node and the links between nodes. In this paper, we combine classic ideas in topic modeling with a variant of the mixed-membership block model recently developed in the statistical physics community. The resulting model has the advantage that its parameters, including the mixture of topics of each document and the resulting overlapping communities, can be inferred with a simple and scalable expectation-maximization algorithm. We test our model on three data sets, performing unsupervised topic classification and link prediction. For both tasks, our model outperforms several existing state-of-the-art methods, achieving higher accuracy with significantly less computation: it analyzes a data set with 1.3 million words and 44 thousand links in a few minutes.
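
To make the expectation-maximization idea concrete, here is a minimal sketch of EM for a plain PLSA-style topic model, the kind of classic topic-modeling component the abstract alludes to. It is not the paper's model: it covers only the text side and omits the link (block model) term entirely. The function name `plsa_em`, the parameter names `theta` and `beta`, and the toy count matrix are assumptions introduced purely for illustration.

```python
import numpy as np

def plsa_em(counts, n_topics, n_iters=50, seed=0):
    """EM for a PLSA-style model (illustrative sketch, text only, no links).

    counts[d, w] is the number of times word w occurs in document d.
    Returns theta (documents x topics), beta (topics x words), and the
    log-likelihood recorded at the start of each iteration.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    eps = 1e-12  # guards against division by zero and log(0)

    # Random positive initialization, normalized so each row sums to 1.
    theta = rng.random((n_docs, n_topics))
    theta /= theta.sum(axis=1, keepdims=True)
    beta = rng.random((n_topics, n_words))
    beta /= beta.sum(axis=1, keepdims=True)

    log_liks = []
    for _ in range(n_iters):
        # E-step: responsibility of topic z for word w in document d,
        # proportional to theta[d, z] * beta[z, w].
        joint = theta[:, None, :] * beta.T[None, :, :]  # shape (D, W, K)
        mix = joint.sum(axis=2)                         # shape (D, W)
        log_liks.append(float((counts * np.log(mix + eps)).sum()))
        resp = joint / (mix[:, :, None] + eps)

        # M-step: re-estimate parameters from expected topic counts.
        weighted = counts[:, :, None] * resp            # shape (D, W, K)
        theta = weighted.sum(axis=1)
        theta /= theta.sum(axis=1, keepdims=True)
        beta = weighted.sum(axis=0).T
        beta /= beta.sum(axis=1, keepdims=True)
    return theta, beta, log_liks

# Hypothetical toy corpus: 4 documents over a 4-word vocabulary.
counts = np.array([
    [5, 4, 0, 0],
    [4, 6, 1, 0],
    [0, 1, 5, 4],
    [0, 0, 6, 5],
], dtype=float)

theta, beta, log_liks = plsa_em(counts, n_topics=2)
```

Each row of `theta` is a document's mixture over topics, which is the quantity the abstract describes as "the mixture of topics of each document". In a joint text-and-link model, the same per-document mixtures would also enter the likelihood of the links, so the M-step would combine evidence from words and from neighbors; that coupling is deliberately left out of this sketch.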