Sparse online topic models

  • Authors: Aonan Zhang, Jun Zhu, Bo Zhang

  • Affiliations: Tsinghua University, Beijing, China (all authors)

  • Venue: Proceedings of the 22nd International Conference on World Wide Web (WWW 2013)
  • Year: 2013

Abstract

Topic models have shown great promise in discovering latent semantic structures in complex data corpora, ranging from text documents and web news articles to images, videos, and even biological data. To cope with massive data collections and dynamic text streams, probabilistic online topic models such as online latent Dirichlet allocation (OLDA) have recently been developed. However, due to normalization constraints, OLDA can be ineffective at controlling the sparsity of the discovered representations, a desirable property for learning interpretable semantic patterns, especially when the total number of topics is large. In contrast, sparse topical coding (STC) is a non-probabilistic topic model that effectively discovers sparse latent patterns through sparsity-inducing regularization. Unfortunately, STC cannot scale to very large datasets or handle online text streams, largely because of its batch learning procedure. In this paper, we present a sparse online topic model that directly controls the sparsity of latent semantic patterns by imposing sparsity-inducing regularization and learns the topical dictionary with an online algorithm that is both efficient and guaranteed to converge. Extensive empirical results for the sparse online topic model, as well as its collapsed and supervised extensions, on a large-scale Wikipedia dataset and the medium-sized 20Newsgroups dataset demonstrate appealing performance.
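For a concrete picture of the two alternating steps the abstract describes, below is a minimal sketch in the spirit of sparse coding with online dictionary learning: an ℓ1-regularized coding step per document, followed by a stochastic-gradient update of the topical dictionary as documents stream in. The squared reconstruction loss, the ISTA-style updates, the step sizes, and all function names here are illustrative assumptions for this sketch, not the authors' exact algorithm.

    import numpy as np

    # Illustrative sketch only: a squared reconstruction loss stands in for
    # the model's actual word-level loss; details are assumptions.

    def sparse_code(x, D, lam, n_iters=100, lr=0.005):
        """Infer a non-negative sparse code s for a word-count vector x by
        minimizing ||x - D s||^2 + lam * ||s||_1 with ISTA-style updates."""
        s = np.zeros(D.shape[1])
        for _ in range(n_iters):
            grad = 2.0 * D.T @ (D @ s - x)                 # gradient of squared loss
            s = np.maximum(s - lr * grad - lr * lam, 0.0)  # soft-threshold at zero
        return s

    def update_dictionary(D, x, s, step):
        """One stochastic gradient step on the topical dictionary for a single
        (document, code) pair, followed by column renormalization."""
        D = D - step * np.outer(D @ s - x, s)   # descend the reconstruction error
        D = np.maximum(D, 0.0)                  # keep topic-word weights non-negative
        return D / (np.linalg.norm(D, axis=0) + 1e-12)

    # Simulated stream of documents as bag-of-words count vectors.
    V, K = 1000, 50                             # vocabulary size, number of topics
    rng = np.random.default_rng(0)
    D = rng.random((V, K))
    D /= np.linalg.norm(D, axis=0)
    for t in range(200):
        x = rng.poisson(0.05, size=V).astype(float)
        s = sparse_code(x, D, lam=0.5)
        D = update_dictionary(D, x, s, step=1.0 / (t + 1))
    print("topics activated by the last document:", np.count_nonzero(s))

The ℓ1 weight lam directly controls how many topics each document activates; this is the knob the abstract contrasts with OLDA, whose normalization constraints make such direct sparsity control difficult.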