Partially labeled topic models for interpretable text mining

Authors:
Daniel Ramage;Christopher D. Manning;Susan Dumais
Affiliations:
Stanford University, Stanford, CA, USA;Stanford University, Stanford, CA, USA;Microsoft Research, Redmond, WA, USA
Venue:
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2011

Abstract

Much of the world's electronic text is annotated with human-interpretable labels, such as tags on web pages and subject codes on academic publications. Effective text mining in this setting requires models that can flexibly account for the textual patterns that underlie the observed labels while still discovering unlabeled topics. Neither supervised classification, with its focus on label prediction, nor purely unsupervised learning, which does not model the labels explicitly, is appropriate. In this paper, we present two new partially supervised generative models of labeled text, Partially Labeled Dirichlet Allocation (PLDA) and the Partially Labeled Dirichlet Process (PLDP). These models make use of the unsupervised learning machinery of topic models to discover the hidden topics within each label, as well as unlabeled, corpus-wide latent topics. We explore applications with qualitative case studies of tagged web pages from del.icio.us and PhD dissertation abstracts, demonstrating improved model interpretability over traditional topic models. We use the many tags present in our del.icio.us dataset to quantitatively demonstrate the new models' higher correlation with human relatedness scores over several strong baselines.
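The idea of topics nested within labels, alongside corpus-wide latent topics, can be illustrated with a small generative sketch. This is not the authors' implementation: the function names, the "latent" pseudo-label, the fixed number of topics per label, and the symmetric Dirichlet hyperparameters are all assumptions made for illustration, and the abstract does not specify the priors or the inference procedure.

```python
import random

def dirichlet(alpha, size, rng):
    """Draw from a symmetric Dirichlet by normalizing Gamma variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(doc_labels, topics_per_label, vocab, doc_len=20,
                    alpha=0.5, eta=0.5, seed=0):
    """Sample a toy corpus in which each word is attributed to one of its
    document's labels (or to a shared latent label) and to a topic owned
    by that label. All parameters here are hypothetical."""
    rng = random.Random(seed)
    # Assumption: every observed label, plus one shared "latent"
    # pseudo-label, owns its own fixed-size set of topics.
    all_labels = {"latent"} | {l for labels in doc_labels for l in labels}
    phi = {(label, k): dirichlet(eta, len(vocab), rng)
           for label in sorted(all_labels)
           for k in range(topics_per_label)}
    corpus = []
    for labels in doc_labels:
        # A document may only use its own labels and the latent label.
        active = ["latent"] + sorted(labels)
        psi = dirichlet(alpha, len(active), rng)  # mixture over labels
        theta = {l: dirichlet(alpha, topics_per_label, rng) for l in active}
        doc = []
        for _ in range(doc_len):
            label = rng.choices(active, weights=psi)[0]
            topic = rng.choices(range(topics_per_label),
                                weights=theta[label])[0]
            word = rng.choices(vocab, weights=phi[(label, topic)])[0]
            doc.append((word, label, topic))
        corpus.append(doc)
    return corpus

# Hypothetical tagged documents; the third has no tags at all.
docs = [{"python", "tutorial"}, {"design"}, set()]
vocab = ["code", "loop", "color", "font", "guide", "web"]
corpus = generate_corpus(docs, topics_per_label=2, vocab=vocab, doc_len=15)
```

In this sketch, an untagged document draws every word from the latent topics, while a tagged document splits its words between its own labels' topics and the latent ones. That restriction is what lets each learned topic be read as a sub-theme of a specific label.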