Term-weighting approaches in automatic text retrieval. Information Processing and Management: An International Journal.
Probabilistic latent semantic indexing. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Principal Direction Divisive Partitioning. Data Mining and Knowledge Discovery.
Document clustering based on non-negative matrix factorization. In Proceedings of the 26th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Latent Dirichlet allocation. The Journal of Machine Learning Research.
Graph Embedding and Extensions: A General Framework for Dimensionality Reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples. The Journal of Machine Learning Research.
Semisupervised learning from dissimilarity data. Computational Statistics & Data Analysis.
Modeling hidden topics on document manifold. In Proceedings of the 17th ACM Conference on Information and Knowledge Management.
Probabilistic dyadic data analysis with local and global consistency. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).
MedLDA: maximum margin supervised topic models for regression and classification. In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).
Discriminative topic modeling based on manifold learning. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
An empirical study on developer interactions in StackOverflow. In Proceedings of the 28th Annual ACM Symposium on Applied Computing.
Topic modeling has become a popular method for data analysis in various domains, including text documents. Earlier topic models, such as probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA), have shown impressive success in discovering low-rank hidden structures for modeling text documents. These approaches, however, do not take into account the manifold structure of the data, which is generally informative for nonlinear dimensionality reduction. More recent approaches, Laplacian PLSI (LapPLSI) and the Locally-consistent Topic Model (LTM), incorporate the local manifold structure into topic models and have demonstrated the resulting benefits. They nevertheless fall short of the full discriminating power of manifold learning, because they only enhance the proximity between the low-rank representations of neighboring pairs, with no consideration of non-neighboring pairs. In this article, we propose a new approach, the Discriminative Topic Model (DTM), which separates non-neighboring pairs from each other in addition to bringing neighboring pairs closer together, thereby preserving the global manifold structure as well as improving local consistency. We also present a novel model-fitting algorithm based on the generalized EM algorithm and the concept of Pareto improvement. We empirically demonstrate that DTM outperforms state-of-the-art techniques in unsupervised clustering and semisupervised classification accuracy on text corpora, and that it is more robust to its parameter settings.
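To make the abstract's core idea concrete, the sketch below (Python/NumPy) shows one plausible form of the two ingredients it describes: a regularizer that rewards proximity between the low-rank representations of neighboring pairs while pushing non-neighboring pairs apart, and a Pareto-improvement acceptance test for candidate updates inside a generalized-EM loop. The function names, the squared-distance penalty, and the acceptance rule are illustrative assumptions, not the authors' exact objective.

```python
import numpy as np

def discriminative_regularizer(Z, neighbors, non_neighbors, lam=1.0):
    """Penalty over low-rank representations Z (n_docs x n_topics).

    Smaller is better: neighboring pairs should be close (small 'pull')
    and non-neighboring pairs far apart (large 'push').  The exact
    functional form used by DTM may differ; this is a sketch.
    """
    pull = sum(np.sum((Z[i] - Z[j]) ** 2) for i, j in neighbors)
    push = sum(np.sum((Z[i] - Z[j]) ** 2) for i, j in non_neighbors)
    return pull - lam * push

def pareto_accept(loglik_old, loglik_new, reg_old, reg_new):
    """Hypothetical generalized-EM acceptance test in the spirit of
    Pareto improvement: keep a candidate M-step update only if the data
    log-likelihood does not decrease and the regularizer does not
    increase, so neither criterion is sacrificed for the other."""
    return loglik_new >= loglik_old and reg_new <= reg_old

# Toy usage with random "topic proportions" for 4 documents;
# the neighbor/non-neighbor pairs are assumed, not learned here.
rng = np.random.default_rng(0)
Z = rng.dirichlet(np.ones(3), size=4)   # 4 docs, 3 topics
neighbors = [(0, 1), (2, 3)]
non_neighbors = [(0, 2), (1, 3)]
print(discriminative_regularizer(Z, neighbors, non_neighbors, lam=0.5))
```

In practice the neighbor graph would presumably be built from document similarity (e.g., cosine similarity over tf-idf vectors, with k-nearest neighbors as "neighboring pairs"), and lam trades local consistency against global separation.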