Topic models have a wide range of applications, including the modeling of text documents, images, user preferences, and product rankings. However, learning optimal models can be difficult, especially for large problems, because inference techniques such as Gibbs sampling often converge to suboptimal models due to the abundance of local optima in large datasets. In this paper, we propose a general method for improving the performance of topic models. The method, called the 'grouping transform', works by introducing auxiliary variables that represent assignments of the original model's tokens to groups. These auxiliary variables make it possible to resample an entire group of tokens at a time, allowing the sampler to make larger moves through the state space. As a result, better models are learned and performance improves. We illustrate the proposed ideas on several topic models and several text and image datasets, and show that the grouping transform significantly improves performance over the standard models.
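To make the mechanism concrete, the following is a minimal sketch (our illustration, not the paper's implementation) of a block-Gibbs update in a collapsed LDA-style sampler, where a group consists of the m occurrences of one word type w in one document d, and a single topic is resampled for the whole group at once. All names here (sample_group_topic, n_dk, n_kw, n_k) are hypothetical; for a group of identical tokens the exact collapsed conditional is a ratio of rising factorials, which the sketch computes in log space.

```python
import numpy as np

def sample_group_topic(d, w, m, n_dk, n_kw, n_k, alpha, beta, rng):
    """Resample one topic for a group of m identical tokens (word type w
    in document d). Assumes the group's counts have already been
    subtracted from n_dk (docs x topics), n_kw (topics x vocab), and
    n_k (topics,). Hypothetical sketch, not the authors' code."""
    K, V = n_kw.shape
    j = np.arange(m)  # offsets 0 .. m-1 in the rising factorials
    # log p(z_group = k) up to an additive constant:
    #   sum_j log(n_dk + alpha + j) + sum_j log(n_kw + beta + j)
    #   - sum_j log(n_k + V*beta + j)
    log_p = (np.log(n_dk[d][:, None] + alpha + j).sum(axis=1)
             + np.log(n_kw[:, w][:, None] + beta + j).sum(axis=1)
             - np.log(n_k[:, None] + V * beta + j).sum(axis=1))
    p = np.exp(log_p - log_p.max())      # stabilize before normalizing
    return rng.choice(K, p=p / p.sum())  # one draw moves all m tokens
```

After drawing the new topic k, the group's m counts are added back to n_dk[d, k], n_kw[k, w], and n_k[k]. Setting m = 1 recovers the standard per-token collapsed Gibbs update, so the grouped move strictly generalizes it while letting the sampler escape local optima that trap single-token updates.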