Discovering coherent topics using general knowledge

Authors:
Zhiyuan Chen;Arjun Mukherjee;Bing Liu;Meichun Hsu;Malu Castellanos;Riddhiman Ghosh
Affiliations:
University of Illinois at Chicago, Chicago, IL, USA;University of Illinois at Chicago, Chicago, IL, USA;University of Illinois at Chicago, Chicago, IL, USA;HP Labs, Palo Alto, CA, USA;HP Labs, Palo Alto, CA, USA;HP Labs, Palo Alto, CA, USA
Venue:
Proceedings of the 22nd ACM International Conference on Information & Knowledge Management
Year:
2013

Abstract

Topic models are widely used to discover latent topics in text documents. However, they may produce topics that are not interpretable for a given application. Researchers have proposed incorporating prior domain knowledge into topic models to help produce coherent topics. The knowledge used in existing models is typically domain dependent and assumed to be correct. A key weakness of this knowledge-based approach is that it requires the user to know the domain very well and to be able to provide knowledge suitable for it, which is not always the case: in most real-life applications, the user wants to find what they do not already know. In this paper, we propose a framework that leverages general knowledge in topic models. Such knowledge is domain independent. Specifically, we use one form of general knowledge, namely lexical semantic relations between words such as synonyms, antonyms and adjective attributes, to help produce more coherent topics. There is, however, a major obstacle: a word can have multiple meanings (senses), and each meaning often has a different set of synonyms and antonyms. Not every meaning is suitable or correct for a domain, and wrong knowledge can result in poor-quality topics. To deal with wrong knowledge, we propose a new model, called GK-LDA, which is able to effectively exploit the knowledge of lexical relations in dictionaries. To the best of our knowledge, GK-LDA is the first model that can incorporate such domain-independent knowledge. Our experiments using online product reviews show that GK-LDA performs significantly better than existing state-of-the-art models.
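
The multiple-senses obstacle described in the abstract can be illustrated with a small sketch. This is not GK-LDA and not the authors' method: it is a naive co-occurrence heuristic, shown only to make concrete why some dictionary senses are wrong for a domain. The lexicon, the review tokens, and the function names (`cooccurrence_counts`, `sense_support`) are all hypothetical and hand-written for illustration; nothing here is taken from WordNet or from the paper's data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy lexicon: one synonym set per sense of a word.
# Hand-written for illustration, not drawn from any real dictionary.
TOY_LEXICON = {
    "light": [
        {"light", "bright", "luminous"},       # sense: giving off light
        {"light", "lightweight", "portable"},  # sense: low in weight
    ],
}

# Hypothetical tokenized laptop reviews (assumed data).
REVIEWS = [
    ["laptop", "light", "portable", "battery"],
    ["screen", "bright", "sharp"],
    ["lightweight", "portable", "light", "travel"],
]


def cooccurrence_counts(docs):
    """Count, for each unordered word pair, the documents containing both."""
    counts = Counter()
    for doc in docs:
        # Sorting makes every key an alphabetically ordered pair.
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts


def sense_support(word, docs, lexicon):
    """Score each sense of `word` by document co-occurrence with its synonyms."""
    counts = cooccurrence_counts(docs)
    scores = []
    for synset in lexicon.get(word, []):
        others = synset - {word}
        score = sum(counts[tuple(sorted((word, o)))] for o in others)
        scores.append((sorted(synset), score))
    return scores


for synset, score in sense_support("light", REVIEWS, TOY_LEXICON):
    print(synset, score)
```

In this toy corpus the "low in weight" sense receives support from the reviews while the "giving off light" sense receives none, which is the kind of domain mismatch the abstract refers to as wrong knowledge. How GK-LDA itself detects and handles such knowledge is described in the paper, not in this sketch.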