Modeling Documents by Combining Semantic Concepts with Unsupervised Statistical Learning
ISWC '08 Proceedings of the 7th International Conference on The Semantic Web
Statistical topic models provide a general, data-driven framework for the automated discovery of high-level knowledge from large collections of text documents. While topic models can potentially discover a broad range of themes in a data set, the interpretability of the learned topics is not always ideal. Human-defined concepts, on the other hand, tend to be semantically richer because their defining words are carefully chosen, but they tend not to cover the themes in a data set exhaustively. In this paper, we propose a probabilistic framework that combines a hierarchy of human-defined semantic concepts with statistical topic models to seek the best of both worlds. Experimental results using two different sources of concept hierarchies and two collections of text documents indicate that this combination leads to systematic improvements in the quality of the associated language models and enables new techniques for inferring and visualizing the semantics of a document.
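To make the combination concrete, the generative intuition can be sketched as follows: each word in a document is drawn either from a human-defined concept (a fixed, curated word set) or from a data-driven topic (an unconstrained word distribution that would be learned from the corpus). The minimal sketch below illustrates only this mixing idea; the concept names, vocabularies, and mixture weights are invented for illustration and are not taken from the paper, which additionally learns the mixture and topic parameters from data.

```python
import random

# Hypothetical concepts: human-defined word sets (semantically rich but
# not exhaustive). Words and labels are made up for this illustration.
concepts = {
    "disease": ["cancer", "tumor", "infection", "virus"],
    "therapy": ["drug", "dose", "treatment", "trial"],
}

# Hypothetical data-driven topic: an unconstrained word distribution that
# a topic model would learn; the probabilities here are placeholders.
topics = {
    "topic_0": {"study": 0.4, "results": 0.3, "data": 0.3},
}

def sample_word(component, rng):
    """Draw one word from a concept (uniform over its curated word list)
    or from a learned topic (its word distribution)."""
    if component in concepts:
        # Concepts constrain the support: only curated words can be emitted.
        return rng.choice(concepts[component])
    words, probs = zip(*topics[component].items())
    return rng.choices(words, weights=probs, k=1)[0]

def generate_document(mixture, length, seed=0):
    """Generate a toy document: for each word, first pick a concept or
    topic according to the document's mixture weights, then pick a word."""
    rng = random.Random(seed)
    components, weights = zip(*mixture.items())
    return [
        sample_word(rng.choices(components, weights=weights, k=1)[0], rng)
        for _ in range(length)
    ]

doc = generate_document({"disease": 0.5, "therapy": 0.3, "topic_0": 0.2}, 8)
print(doc)
```

Inference in the actual model runs this process in reverse, assigning each observed word to a concept or topic, which is what enables the concept-based document visualizations described above.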