Discriminative topic modeling based on manifold learning

Authors:
Seungil Huh; Stephen E. Fienberg
Affiliations:
Carnegie Mellon University, Pittsburgh, PA, USA; Carnegie Mellon University, Pittsburgh, PA, USA
Venue:
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2010

References (7):
Term-weighting approaches in automatic text retrieval. Information Processing and Management: an International Journal.
Probabilistic latent semantic indexing. Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.
Principal Direction Divisive Partitioning. Data Mining and Knowledge Discovery.
Latent Dirichlet allocation. The Journal of Machine Learning Research.
Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples. The Journal of Machine Learning Research.
Modeling hidden topics on document manifold. Proceedings of the 17th ACM Conference on Information and Knowledge Management.
Probabilistic dyadic data analysis with local and global consistency. Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09).

Cited by (5):
Probabilistic topic models with biased propagation on heterogeneous information networks. Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.
Discriminative Topic Modeling Based on Manifold Learning. ACM Transactions on Knowledge Discovery from Data (TKDD).
A hybrid semi-supervised topic model. Proceedings of the Second Sino-foreign-interchange Conference on Intelligent Science and Intelligent Data Engineering (IScIDE '11).
Transforming graph data for statistical relational learning. Journal of Artificial Intelligence Research.
A jointly distributed semi-supervised topic model. Neurocomputing.

Abstract

Topic modeling has been widely used for data analysis in various domains, including text documents. Previous topic models, such as probabilistic Latent Semantic Analysis (pLSA) and Latent Dirichlet Allocation (LDA), have shown impressive success in discovering low-rank hidden structures for modeling text documents. These models, however, do not take into account the manifold structure of data, which is generally informative for non-linear dimensionality reduction. More recent models, namely Laplacian PLSI (LapPLSI) and the Locally-consistent Topic Model (LTM), have incorporated the local manifold structure into topic models and have shown the resulting benefits. Yet these approaches fall short of the full discriminating power of manifold learning, because they only enhance the proximity between the low-rank representations of neighboring pairs without any consideration for non-neighboring pairs. In this paper, we propose the Discriminative Topic Model (DTM), which separates non-neighboring pairs from each other in addition to bringing neighboring pairs closer together, thereby preserving the global manifold structure as well as improving local consistency. We also present a novel model fitting algorithm based on generalized EM and the concept of Pareto improvement. By effectively exposing the manifold structure of data, DTM achieves higher classification performance in a semi-supervised setting. We provide empirical evidence on text corpora demonstrating the success of DTM, compared to state-of-the-art techniques, in terms of classification accuracy and robustness to parameters.
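The contrast drawn above between local-only regularization (LapPLSI, LTM) and DTM's additional separation of non-neighboring pairs can be illustrated with a small sketch. It assumes a symmetric k-nearest-neighbor graph built from cosine similarity of term-count vectors, and it scores topic proportions with a ratio of neighbor to non-neighbor squared distances. That ratio, the helper names (`knn_pairs`, `manifold_terms`, `discriminative_ratio`, `is_pareto_improvement`) and the toy data are hypothetical illustrations, not necessarily the exact objective or fitting procedure used in DTM. The Pareto check shows one way the idea of accepting an update only when no criterion gets worse could be expressed.

```python
import math
from typing import List, Sequence, Set, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def knn_pairs(docs: List[Vector], k: int) -> Set[Tuple[int, int]]:
    """Symmetric k-NN graph (assumed construction): (i, j) with i < j is an
    edge if either document is among the other's k most similar documents."""
    pairs: Set[Tuple[int, int]] = set()
    for i, d in enumerate(docs):
        sims = sorted(
            ((cosine(d, other), j) for j, other in enumerate(docs) if j != i),
            reverse=True,
        )
        for _, j in sims[:k]:
            pairs.add((min(i, j), max(i, j)))
    return pairs


def sq_dist(u: Vector, v: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def manifold_terms(theta: List[Vector], edges: Set[Tuple[int, int]]) -> Tuple[float, float]:
    """Summed squared distances between topic proportions for neighboring
    pairs (near) and for non-neighboring pairs (far)."""
    near = far = 0.0
    for i in range(len(theta)):
        for j in range(i + 1, len(theta)):
            d = sq_dist(theta[i], theta[j])
            if (i, j) in edges:
                near += d
            else:
                far += d
    return near, far


def discriminative_ratio(near: float, far: float) -> float:
    """Hypothetical DTM-style score (lower is better): pull neighbors together
    AND push non-neighbors apart. A local-only penalty would use `near` alone."""
    return near / far if far > 0 else math.inf


def is_pareto_improvement(old: Tuple[float, ...], new: Tuple[float, ...]) -> bool:
    """True if `new` is no worse than `old` on every criterion (all maximized)
    and strictly better on at least one."""
    no_worse = all(n >= o for n, o in zip(new, old))
    better = any(n > o for n, o in zip(new, old))
    return no_worse and better


if __name__ == "__main__":
    # Toy term counts: documents 0-1 share words, as do documents 2-3.
    docs = [[3, 1, 0, 0], [2, 1, 0, 0], [0, 0, 1, 3], [0, 0, 1, 2]]
    edges = knn_pairs(docs, k=1)
    # Two candidate 2-topic representations of the same four documents.
    separated = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
    collapsed = [[0.5, 0.5]] * 4
    for name, theta in [("separated", separated), ("collapsed", collapsed)]:
        near, far = manifold_terms(theta, edges)
        print(name, "near:", near, "ratio:", discriminative_ratio(near, far))
```

With the neighbor term alone, the collapsed representation scores best (zero distance between neighbors), a degenerate solution that a local-only penalty cannot exclude. The ratio, which also rewards separating non-neighbors, prefers the separated representation. The tuple passed to `is_pareto_improvement` could hold, for instance, (log-likelihood, negative ratio); that pairing is an assumption made for illustration.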