Discovering coherent topics using general knowledge

  • Authors:
  • Zhiyuan Chen;Arjun Mukherjee;Bing Liu;Meichun Hsu;Malu Castellanos;Riddhiman Ghosh

  • Affiliations:
  • University of Illinois at Chicago, Chicago, IL, USA;University of Illinois at Chicago, Chicago, IL, USA;University of Illinois at Chicago, Chicago, IL, USA;HP Labs, Palo Alto, CA, USA;HP Labs, Palo Alto, CA, USA;HP Labs, Palo Alto, CA, USA

  • Venue:
  • Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Topic models have been widely used to discover latent topics in text documents. However, they may produce topics that are not interpretable for an application. Researchers have proposed to incorporate prior domain knowledge into topic models to help produce coherent topics. The knowledge used in existing models is typically domain dependent and assumed to be correct. However, one key weakness of this knowledge-based approach is that it requires the user to know the domain very well and to be able to provide knowledge suitable for the domain, which is not always the case because in most real-life applications, the user wants to find what they do not know. In this paper, we propose a framework to leverage the general knowledge in topic models. Such knowledge is domain independent. Specifically, we use one form of general knowledge, i.e., lexical semantic relations of words such as synonyms, antonyms and adjective attributes, to help produce more coherent topics. However, there is a major obstacle, i.e., a word can have multiple meanings/senses and each meaning often has a different set of synonyms and antonyms. Not every meaning is suitable or correct for a domain. Wrong knowledge can result in poor quality topics. To deal with wrong knowledge, we propose a new model, called GK-LDA, which is able to effectively exploit the knowledge of lexical relations in dictionaries. To the best of our knowledge, GK-LDA is the first such model that can incorporate the domain independent knowledge. Our experiments using online product reviews show that GK-LDA performs significantly better than existing state-of-the-art models.