A novel topic model for automatic term extraction

  • Authors:
  • Sujian Li;Jiwei Li;Tao Song;Wenjie Li;Baobao Chang

  • Affiliations:
  • Institute of Computational Linguistics, Peking University, Beijing, China;Institute of Computational Linguistics, Peking University, Beijing, China;Institute of Computational Linguistics, Peking University, Beijing, China;The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, China;Institute of Computational Linguistics, Peking University, Beijing, China

  • Venue:
  • Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
  • Year:
  • 2013

Abstract

Automatic term extraction (ATE) aims to extract domain-specific terms from a corpus of a given domain. Termhood is an essential measure for judging whether a phrase is a term. Previous research on termhood has relied mainly on word frequency information. In this paper, we propose to compute termhood based on the semantic representation of words. A novel topic model, namely i-SWB, is developed to map the domain corpus into a latent semantic space, which is composed of a number of general topics, a background topic, and a document-specific topic. Experiments on four domains demonstrate that our approach outperforms state-of-the-art ATE approaches.