Learning low-dimensional representations of text corpora is critical in many content analysis and data mining applications. In practice, learning a sparse representation is even more desirable, and more challenging, for large-scale text modeling. However, traditional probabilistic topic models (PTMs) lack a mechanism to directly control the posterior sparsity of the inferred representations, while emerging non-probabilistic models (NPMs) can explicitly control sparsity through constraints such as the l_1 norm but suffer from their own limitations in the latent representations they produce. To address these problems, we propose a novel non-probabilistic topic model for discovering sparse latent representations of large text corpora, referred to as group sparse topical coding (GSTC). Our model enjoys the merits of both PTMs and NPMs. On one hand, like PTMs, GSTC naturally derives document-level admixture proportions on the topic simplex, which is useful for semantic analysis, classification, and retrieval. On the other hand, GSTC directly controls the sparsity of the inferred representations with a group-lasso penalty by relaxing the normalization constraint. Moreover, the relaxed non-probabilistic GSTC can be learned efficiently with a coordinate descent method. Experimental results on benchmark datasets show that GSTC discovers meaningful, compact latent representations of documents and improves both document classification accuracy and time efficiency.
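To illustrate the mechanism the abstract describes — group-lasso regularization driving entire topic groups of a document's code to zero, optimized by (block) coordinate descent — the following is a minimal sketch, not the authors' actual GSTC algorithm. It solves a simplified least-squares coding problem with a group-lasso penalty on each topic column of the code matrix; the loss, the grouping of codes, and all variable names (`X`, `B`, `lam`) are assumptions for illustration only.

```python
import numpy as np

def group_soft_threshold(v, lam):
    # Proximal operator of lam * ||v||_2: shrink the whole group
    # toward zero, or zero it out entirely if its norm is <= lam.
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

def group_lasso_coding(X, B, lam, n_iter=50):
    """Block coordinate descent for
        min_S  0.5 * ||X - S @ B||_F^2 + lam * sum_k ||S[:, k]||_2,
    where X (docs x words) holds the data, B (topics x words) the
    fixed topic dictionary, and each topic column of the code matrix
    S is one group -- so the penalty can switch a topic off entirely.
    """
    n, K = X.shape[0], B.shape[0]
    S = np.zeros((n, K))
    for _ in range(n_iter):
        for k in range(K):
            b_k = B[k]                    # topic k's word vector
            c = b_k @ b_k                 # squared norm of the block
            # Residual with topic k's current contribution removed:
            R = X - S @ B + np.outer(S[:, k], b_k)
            # Exact block minimizer is a group soft-threshold:
            S[:, k] = group_soft_threshold(R @ b_k / c, lam / c)
    return S
```

With `lam = 0` each block update is plain least squares; as `lam` grows, whole columns of `S` vanish, which is the "directly control sparsity" behavior that distinguishes this family of models from PTMs.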