SSHLDA: a semi-supervised hierarchical topic model

  • Authors:
  • Xian-Ling Mao;Zhao-Yan Ming;Tat-Seng Chua;Si Li;Hongfei Yan;Xiaoming Li

  • Affiliations:
  • Peking University, China;National University of Singapore, Singapore;National University of Singapore, Singapore;Beijing University of Posts and Telecommunications, China;Peking University, China;Peking University, China

  • Venue:
  • EMNLP-CoNLL '12 Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning
  • Year:
  • 2012

Abstract

Hierarchical topics are usually obtained with either supervised hierarchical topic models, such as hLLDA, or unsupervised ones, such as hLDA. Supervised hierarchical topic models make heavy use of the information in observed hierarchical labels but cannot explore new topics, while unsupervised hierarchical topic models can automatically detect new topics in the data space but make no use of the label information at all. In this paper, we propose Semi-Supervised Hierarchical Latent Dirichlet Allocation (SSHLDA), a semi-supervised hierarchical topic model that automatically explores new topics in the data space while incorporating the information from observed hierarchical labels into the modeling process. We also prove that hLDA and hLLDA are special cases of SSHLDA. We conduct experiments on Yahoo! Answers and ODP datasets and assess performance in terms of perplexity and clustering. The experimental results show that SSHLDA has better predictive ability than the baselines and also achieves significant improvements over them in clustering, as measured by FScore.
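To make the semi-supervised idea concrete, below is a minimal, illustrative sketch (not the authors' implementation or notation) of how a document's path in a topic tree can combine observed hierarchical labels with latent topics: the labels fix a prefix of the path, and the remaining, deeper levels are drawn by a nested-CRP-style process as in hLDA. All names here (`Node`, `sample_path`, `gamma`) are hypothetical.

```python
import random

class Node:
    def __init__(self, label=None):
        self.label = label      # observed label, or None for a latent topic
        self.children = []      # child nodes in the topic tree
        self.customers = 0      # documents that have passed through this node

def ncrp_extend(node, depth, gamma=1.0):
    """Extend the path below `node` for `depth` latent levels: pick an
    existing child with probability proportional to its customer count,
    or open a new child with probability proportional to gamma."""
    path = []
    for _ in range(depth):
        weights = [c.customers for c in node.children] + [gamma]
        idx = random.choices(range(len(weights)), weights=weights)[0]
        if idx == len(node.children):           # open a new latent topic
            child = Node()
            node.children.append(child)
        else:
            child = node.children[idx]
        child.customers += 1
        path.append(child)
        node = child
    return path

def sample_path(root, observed_labels, total_depth, gamma=1.0):
    """Observed labels pin down the top of the path (the supervised part);
    the rest of the path is latent (the unsupervised, hLDA-like part)."""
    node, path = root, []
    for lab in observed_labels:                 # follow or create labeled nodes
        child = next((c for c in node.children if c.label == lab), None)
        if child is None:
            child = Node(label=lab)
            node.children.append(child)
        child.customers += 1
        path.append(child)
        node = child
    path += ncrp_extend(node, total_depth - len(observed_labels), gamma)
    return path

root = Node(label="ROOT")
path = sample_path(root, ["Science", "Physics"], total_depth=4)
print([n.label for n in path])   # e.g. ['Science', 'Physics', None, None]
```

In a full model, word-topic assignments and the posterior over these paths would be inferred jointly (e.g., by Gibbs sampling); the sketch only shows how observed labels and latent topic nodes can share one tree.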