Modeling topic hierarchies with the recursive Chinese restaurant process

Authors:
Joon Hee Kim;Dongwoo Kim;Suin Kim;Alice Oh
Affiliations:
Korea Advanced Institute of Science and Technology, Daejeon, South Korea;Korea Advanced Institute of Science and Technology, Daejeon, South Korea;Korea Advanced Institute of Science and Technology, Daejeon, South Korea;Korea Advanced Institute of Science and Technology, Daejeon, South Korea
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

Topic models such as latent Dirichlet allocation (LDA) and hierarchical Dirichlet processes (HDP) are simple and popular tools for discovering topics in a set of unannotated documents. A major shortcoming of LDA and HDP, however, is that they do not organize the topics into a hierarchical structure, even though such structure is naturally found in many datasets. We introduce the recursive Chinese restaurant process (rCRP) and a nonparametric topic model that uses rCRP as a prior for discovering a hierarchical topic structure with unbounded depth and width. Unlike previous models for discovering topic hierarchies, rCRP allows each document to be generated from a mixture over the entire set of topics in the hierarchy. We apply rCRP to a corpus of New York Times articles, a dataset of MovieLens ratings, and a set of Wikipedia articles, and show the discovered topic hierarchies. We compare the predictive power of rCRP with that of LDA, HDP, and the nested Chinese restaurant process (nCRP) using held-out likelihood, and show that rCRP outperforms the others. We also propose two metrics that quantify the characteristics of a topic hierarchy and use them to compare the hierarchies discovered by rCRP and nCRP. The results show that rCRP discovers a hierarchy in which topics become more specialized toward the leaves, and in which topics in the same immediate family exhibit more affinity than topics beyond the immediate family.
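
For readers unfamiliar with the metaphor behind rCRP and nCRP, the following is a minimal sketch of the standard (non-recursive) Chinese restaurant process, not the rCRP itself, whose construction the abstract does not spell out. Under the standard process, the (n+1)-th customer joins an existing table with probability proportional to the number of customers already seated there, and opens a new table with probability proportional to a concentration parameter. The function name `crp_seating` and the parameter names `alpha` and `seed` are assumptions made for this illustration.

```python
import random
from typing import List, Optional


def crp_seating(num_customers: int, alpha: float,
                seed: Optional[int] = None) -> List[int]:
    """Seat customers with the standard CRP and return the table sizes.

    Illustrative sketch only: this is the basic CRP, not the recursive
    variant (rCRP) introduced in the paper.
    """
    rng = random.Random(seed)
    tables: List[int] = []  # tables[i] = number of customers at table i
    for n in range(num_customers):
        # n customers are already seated. Draw a point on [0, n + alpha]:
        # the first n units of mass belong to existing tables (one unit per
        # seated customer), the last alpha units to a new table.
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        for i, size in enumerate(tables):
            cumulative += size
            if r < cumulative:
                tables[i] += 1  # join an existing table
                break
        else:
            tables.append(1)  # open a new table
    return tables


if __name__ == "__main__":
    sizes = crp_seating(num_customers=100, alpha=1.0, seed=0)
    print(len(sizes), "tables with sizes", sizes)
```

Because the number of tables is not fixed in advance, a prior of this kind lets the number of clusters (here, topics) grow with the data, which is the sense in which such models are called nonparametric.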