Learning low-dimensional representations of text corpora is critical in many content analysis and data mining applications. In practice, learning a sparse representation is even more desirable, and more challenging, for large-scale text modeling. However, traditional probabilistic topic models (PTMs) lack a mechanism to directly control the posterior sparsity of the inferred representations, while emerging non-probabilistic models (NPMs) can explicitly control sparsity through constraints such as the l_1 norm but suffer from their own limitations in the latent representations they produce. To address these problems, we propose a novel non-probabilistic topic model for discovering sparse latent representations of large text corpora, referred to as group sparse topical coding (GSTC). Our model enjoys the merits of both PTMs and NPMs. On one hand, like PTMs, GSTC naturally derives document-level admixture proportions on the topic simplex, which is useful for semantic analysis, classification, and retrieval. On the other hand, GSTC directly controls the sparsity of the inferred representations with a group-lasso penalty by relaxing the normalization constraint. Moreover, the relaxed non-probabilistic GSTC can be learned efficiently with a coordinate descent method. Experimental results on benchmark datasets show that GSTC discovers meaningful, compact latent representations of documents and improves both document classification accuracy and time efficiency.
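To illustrate the mechanism the abstract describes — group-lasso regularization driving entire topic groups of a document's code to zero, optimized by (block) coordinate descent — the following is a minimal sketch, not the authors' actual GSTC algorithm. It solves a simplified least-squares coding problem with a group-lasso penalty on each topic column of the code matrix; the loss, the grouping of codes, and all variable names (`X`, `B`, `lam`) are assumptions for illustration only.

```python
import numpy as np

def group_soft_threshold(v, lam):
    # Proximal operator of lam * ||v||_2: shrink the whole group
    # toward zero, or zero it out entirely if its norm is <= lam.
    norm = np.linalg.norm(v)
    if norm <= lam:
        return np.zeros_like(v)
    return (1.0 - lam / norm) * v

def group_lasso_coding(X, B, lam, n_iter=50):
    """Block coordinate descent for
        min_S  0.5 * ||X - S @ B||_F^2 + lam * sum_k ||S[:, k]||_2,
    where X (docs x words) holds the data, B (topics x words) the
    fixed topic dictionary, and each topic column of the code matrix
    S is one group -- so the penalty can switch a topic off entirely.
    """
    n, K = X.shape[0], B.shape[0]
    S = np.zeros((n, K))
    for _ in range(n_iter):
        for k in range(K):
            b_k = B[k]                    # topic k's word vector
            c = b_k @ b_k                 # squared norm of the block
            # Residual with topic k's current contribution removed:
            R = X - S @ B + np.outer(S[:, k], b_k)
            # Exact block minimizer is a group soft-threshold:
            S[:, k] = group_soft_threshold(R @ b_k / c, lam / c)
    return S
```

With `lam = 0` each block update is plain least squares; as `lam` grows, whole columns of `S` vanish, which is the "directly control sparsity" behavior that distinguishes this family of models from PTMs.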