Incorporating domain knowledge into topic modeling via Dirichlet Forest priors

  • Authors:
  • David Andrzejewski; Xiaojin Zhu; Mark Craven

  • Affiliations:
  • University of Wisconsin-Madison, Madison, WI (all authors)

  • Venue:
  • ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning
  • Year:
  • 2009

Abstract

Users of topic modeling methods often have knowledge about the composition of words that should have high or low probability in various topics. We incorporate such domain knowledge using a novel Dirichlet Forest prior in a Latent Dirichlet Allocation framework. The prior is a mixture of Dirichlet tree distributions with special structures. We present its construction, and inference via collapsed Gibbs sampling. Experiments on synthetic and real datasets demonstrate our model's ability to follow and generalize beyond user-specified domain knowledge.
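To make the Dirichlet tree building block concrete, below is a minimal illustrative sketch, not the authors' implementation: a word distribution is drawn by sampling an ordinary Dirichlet at every internal node of a tree and multiplying the branch probabilities along each root-to-leaf path. In the toy tree, the hypothetical words "dna" and "gene" share a subtree whose edge weight eta is large, so their probabilities tend to be similar across draws, the kind of must-link preference the paper's prior encodes. All names and parameter values here are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-level Dirichlet tree over a toy vocabulary. "dna" and "gene"
# hang off a shared internal node whose edge weights (eta) are large, which
# encourages the two words to receive similar probability mass within a topic
# (a must-link style preference). All names and values are illustrative.
eta, beta = 50.0, 1.0
tree = {
    "root":    ([2.0 * beta, beta, beta], ["ml_node", "protein", "cell"]),
    "ml_node": ([eta, eta],               ["dna", "gene"]),
}

def sample_dirichlet_tree(tree, rng):
    """Draw one word distribution from a Dirichlet tree: sample an ordinary
    Dirichlet at each internal node and multiply branch probabilities along
    every root-to-leaf path."""
    word_probs = {}

    def descend(node, mass):
        weights, children = tree[node]
        branch = rng.dirichlet(weights)
        for p, child in zip(branch, children):
            if child in tree:          # internal node: keep descending
                descend(child, mass * p)
            else:                      # leaf: this word's total probability
                word_probs[child] = mass * p

    descend("root", 1.0)
    return word_probs

phi = sample_dirichlet_tree(tree, rng)
print(phi)  # probabilities over {dna, gene, protein, cell} summing to 1;
            # "dna" and "gene" get nearly equal shares because eta is large
```

Note that this sketch shows only a single Dirichlet tree; as the abstract states, the full Dirichlet Forest prior is a mixture of such trees with special structures, which also lets it express preferences beyond the simple must-link case illustrated here.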