Semi-Supervised Latent Dirichlet Allocation and Its Application for Document Classification

Authors:
Di Wang; Marcus Thint; Ahmad Al-Rubaie
Venue:
WI-IAT '12 Proceedings of the 2012 IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technology - Volume 03
Year:
2012

Abstract

Latent Dirichlet Allocation (LDA) is an unsupervised topic modeling method widely applied in natural language processing. However, standard LDA cannot use supervised labels to incorporate expert knowledge into the learning procedure. This paper describes a semi-supervised LDA (ssLDA) method that supports multiple topic labels per document, so that available expert knowledge can be incorporated during model construction. This improvement aligns the resulting model with human expectations for topic modeling and extraction. We apply ssLDA to the document classification problem on benchmark datasets. We investigate and compare how the size of the training set and the proportion of supervised data affect the final model structure and improve prediction accuracy.
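
The abstract does not spell out the inference procedure, so the following is only an illustrative sketch of one common way to mix labeled and unlabeled documents in LDA, not the ssLDA algorithm itself. It assumes a collapsed Gibbs sampler in which a labeled document may only draw topics from its own label set (possibly several labels), while an unlabeled document may draw from all topics. The function name `ss_lda_gibbs`, its parameters, and the toy data are hypothetical.

```python
import random


def ss_lda_gibbs(docs, labels, n_topics, vocab_size,
                 alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling with per-document topic restrictions.

    docs   : list of documents, each a list of integer word ids
    labels : per document, a set of allowed topic ids, or None (or an
             empty set) for an unlabeled document (assumption: labels
             map one-to-one onto topic ids)
    Returns (doc_topic, topic_word) count matrices as nested lists.
    """
    rng = random.Random(seed)
    all_topics = list(range(n_topics))
    # Unlabeled documents may use every topic; labeled ones only their labels.
    allowed = [sorted(l) if l else all_topics for l in labels]

    doc_topic = [[0] * n_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(n_topics)]
    topic_total = [0] * n_topics
    assign = []

    # Random initialisation that already respects the label constraints.
    for d, words in enumerate(docs):
        z_d = []
        for w in words:
            k = rng.choice(allowed[d])
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assign.append(z_d)

    for _ in range(n_iter):
        for d, words in enumerate(docs):
            for i, w in enumerate(words):
                # Remove the current assignment from the counts.
                k = assign[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Standard collapsed-Gibbs weights, restricted to allowed topics.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in allowed[d]
                ]
                k = rng.choices(allowed[d], weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Toy corpus: two labeled documents and one unlabeled document.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
labels = [{0}, {1}, None]
doc_topic, topic_word = ss_lda_gibbs(docs, labels, n_topics=2,
                                     vocab_size=4, n_iter=20)
```

In this sketch the proportion of supervised data is controlled simply by how many entries of `labels` are not `None`, which is one way to set up the kind of comparison the abstract mentions. For classification, a new document would be treated as unlabeled and assigned the topic with the largest count in its row of `doc_topic`; that decision rule is also an assumption rather than something stated in the abstract.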