Topic models for taxonomies

Authors:
Anton Bakalov;Andrew McCallum;Hanna Wallach;David Mimno
Affiliations:
University of Massachusetts Amherst, Amherst, MA, USA;University of Massachusetts Amherst, Amherst, MA, USA;University of Massachusetts Amherst, Amherst, MA, USA;Princeton University, Princeton, NJ, USA
Venue:
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Year:
2012

Citing 8
Cited 1

Latent Dirichlet allocation
The Journal of Machine Learning Research

The author-topic model for authors and documents
UAI '04 Proceedings of the 20th conference on Uncertainty in artificial intelligence

Mixtures of hierarchical topics with Pachinko allocation
Proceedings of the 24th international conference on Machine learning

Evaluation methods for topic models
ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning

Using Topic Models to Interpret MEDLINE's Medical Subject Headings
AI '09 Proceedings of the 22nd Australasian Joint Conference on Advances in Artificial Intelligence

Labeled LDA: a supervised topic model for credit attribution in multi-labeled corpora
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1

Holistic sentiment analysis across languages: multilingual supervised latent Dirichlet allocation
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Improving verbose queries using subset distribution
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

A PAM-based ontology concept and hierarchy learning method
Journal of Information Science

Abstract

Concept taxonomies such as MeSH, the ACM Computing Classification System, and the NY Times Subject Headings are frequently used to help organize data. They typically consist of a set of concept names organized in a hierarchy. However, these names and their structure are often insufficient to fully capture the intended meaning of a taxonomy node, and non-experts in particular may have difficulty navigating the taxonomy and placing data into it. This paper introduces two semi-supervised topic models that automatically augment a given taxonomy with many additional keywords by leveraging a corpus of multi-labeled documents. Our experiments show that users find the topics beneficial for taxonomy interpretation, substantially increasing their cataloging accuracy. Furthermore, the models provide a better information rate than Labeled LDA.