Improving performance of topic models by variable grouping

Authors:
Evgeniy Bart
Affiliations:
Palo Alto Research Center, Palo Alto, CA
Venue:
IJCAI'11 Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Two
Year:
2011

Abstract

Topic models have a wide range of applications, including modeling of text documents, images, user preferences, product rankings, and many others. However, learning optimal models may be difficult, especially for large problems. The reason is that inference techniques such as Gibbs sampling often converge to suboptimal models due to the abundance of local minima in large datasets. In this paper, we propose a general method of improving the performance of topic models. The method, called 'grouping transform', works by introducing auxiliary variables which represent assignments of the original model tokens to groups. Using these auxiliary variables, it becomes possible to resample an entire group of tokens at a time. This allows the sampler to make larger state space moves. As a result, better models are learned and performance is improved. The proposed ideas are illustrated on several topic models and several text and image datasets. We show that the grouping transform significantly improves performance over standard models.
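
The group-level move described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes a collapsed Gibbs sampler for LDA, assumes that every token in a group belongs to the same document and shares one topic, and uses a hypothetical function name (`resample_group_topic`) and hypothetical count arrays (`n_dk`, `n_kw`, `n_k`). The sketch removes a whole group of tokens from the counts, scores each topic for the group jointly, and reassigns all of the group's tokens in one step.

```python
import numpy as np

def resample_group_topic(group, words, z, n_dk, n_kw, n_k, alpha, beta, rng):
    """Jointly resample one shared topic for all token indices in `group`.

    Assumptions (illustrative, not taken from the paper):
      words[i] : word id of token i; z[i] : its current topic
      n_dk     : topic counts for the single document holding the group, shape (K,)
      n_kw     : topic-word counts, shape (K, V); n_k : per-topic totals, shape (K,)
    """
    K, V = n_kw.shape

    # Remove every token of the group from the count statistics.
    for i in group:
        k_old = z[i]
        n_dk[k_old] -= 1
        n_kw[k_old, words[i]] -= 1
        n_k[k_old] -= 1

    # Log-probability of assigning the whole group to topic k, built token by
    # token with the chain rule so that earlier tokens in the group are counted.
    log_p = np.zeros(K)
    for k in range(K):
        added = {}  # how many times each word was already added to topic k
        for j, i in enumerate(group):
            w = words[i]
            c = added.get(w, 0)
            log_p[k] += np.log(n_dk[k] + alpha + j)      # document-topic term
            log_p[k] += np.log(n_kw[k, w] + beta + c)    # topic-word term
            log_p[k] -= np.log(n_k[k] + V * beta + j)    # topic normaliser
            added[w] = c + 1

    # Normalise in a numerically stable way and draw the new shared topic.
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    k_new = int(rng.choice(K, p=p))

    # Put the group back under its new topic.
    for i in group:
        z[i] = k_new
        n_dk[k_new] += 1
        n_kw[k_new, words[i]] += 1
        n_k[k_new] += 1
    return k_new
```

Moving several tokens in one step is what lets a sampler cross regions of low probability that single-token updates would have to pass through one token at a time. How tokens are assigned to groups is left out of this sketch.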