Abstract

Much of the world's electronic text is annotated with human-interpretable labels, such as tags on web pages and subject codes on academic publications. Effective text mining in this setting requires models that can flexibly account for the textual patterns that underlie the observed labels while still discovering unlabeled topics. Neither supervised classification, with its focus on label prediction, nor purely unsupervised learning, which does not model the labels explicitly, is appropriate. In this paper, we present two new partially supervised generative models of labeled text, Partially Labeled Dirichlet Allocation (PLDA) and the Partially Labeled Dirichlet Process (PLDP). These models use the unsupervised learning machinery of topic models to discover the hidden topics within each label, as well as unlabeled, corpus-wide latent topics. We explore applications with qualitative case studies of tagged web pages from del.icio.us and PhD dissertation abstracts, demonstrating improved model interpretability over traditional topic models. We use the many tags present in our del.icio.us dataset to demonstrate quantitatively that the new models correlate more highly with human relatedness scores than several strong baselines.
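The idea of topics living "within each label" alongside corpus-wide latent topics can be illustrated with a small sketch. This is not the paper's inference procedure; it only shows one plausible way to restrict which topics a document's words may draw from, given the document's observed labels. The names `LATENT`, `build_topic_index`, and `allowed_topics`, and the fixed number of topics per label, are assumptions made for illustration.

```python
from typing import Dict, List

# Hypothetical name for the group of unlabeled, corpus-wide topics.
LATENT = "__latent__"


def build_topic_index(labels: List[str], topics_per_label: int,
                      latent_topics: int) -> Dict[str, List[int]]:
    """Give the latent group and each label its own contiguous block of topic ids.

    Assumption: every label gets the same fixed number of topics.
    """
    index: Dict[str, List[int]] = {}
    next_id = 0
    for label in [LATENT] + sorted(set(labels)):
        size = latent_topics if label == LATENT else topics_per_label
        index[label] = list(range(next_id, next_id + size))
        next_id += size
    return index


def allowed_topics(doc_labels: List[str],
                   index: Dict[str, List[int]]) -> List[int]:
    """Topics a document may use: the latent ones plus those of its own labels."""
    allowed = list(index[LATENT])
    for label in sorted(set(doc_labels)):
        allowed.extend(index[label])
    return allowed


if __name__ == "__main__":
    index = build_topic_index(["python", "cooking"], topics_per_label=2,
                              latent_topics=3)
    # A page tagged only "python" can use latent topics and the "python" topics.
    print(allowed_topics(["python"], index))
```

A topic model trained under such a restriction would let each label's topics specialize on the text that actually carries that label, while the latent block absorbs patterns shared across the corpus.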