A novel topic model for automatic term extraction

Authors:
Sujian Li;Jiwei Li;Tao Song;Wenjie Li;Baobao Chang
Affiliations:
Institute of Computational Linguistics, Peking University, Beijing, China;Institute of Computational Linguistics, Peking University, Beijing, China;Institute of Computational Linguistics, Peking University, Beijing, China;The Hong Kong Polytechnic University Shenzhen Research Institute, Shenzhen, China;Institute of Computational Linguistics, Peking University, Beijing, China
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013

Abstract

Automatic term extraction (ATE) aims to extract domain-specific terms from a corpus of a given domain. Termhood is an essential measure for judging whether a phrase is a term. Previous research on termhood has mainly relied on word frequency information. In this paper, we propose to compute termhood based on the semantic representation of words. A novel topic model, namely i-SWB, is developed to map the domain corpus into a latent semantic space composed of a set of general topics, a background topic, and a document-specific topic. Experiments on four domains demonstrate that our approach outperforms state-of-the-art ATE approaches.
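The abstract describes the latent space only at a high level. As a rough illustration, the Python sketch below simulates the generative story of a three-route topic model of the kind described. Each word is drawn from one of several general topics, from a corpus-wide background topic, or from a distribution specific to its own document, and a per-document switch distribution chooses the route. This is a minimal sketch under stated assumptions, not the i-SWB model or its inference procedure. The Dirichlet priors, the switch prior `gamma`, the toy vocabulary size, and every name in the code (`ToyCorpusModel`, `generate_corpus`, `sample_dirichlet`, `ROUTES`) are hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical labels for the three places a word can come from.
ROUTES = ("general", "background", "document")


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


@dataclass
class ToyCorpusModel:
    # Hypothetical toy parameters; none of these values come from the paper.
    vocab_size: int = 50
    num_topics: int = 5                # general topics shared across the corpus
    alpha: float = 0.5                 # prior on each document's general-topic mixture
    beta: float = 0.1                  # prior on every word distribution
    gamma: tuple = (1.0, 1.0, 1.0)     # prior on the route switch (general, background, document)


def generate_corpus(model, num_docs, doc_len, seed=0):
    """Sample a toy corpus; each token is returned as (word_id, route)."""
    rng = random.Random(seed)
    V = model.vocab_size
    # Corpus-level word distributions: one per general topic, plus one background topic.
    topics = [sample_dirichlet([model.beta] * V, rng) for _ in range(model.num_topics)]
    background = sample_dirichlet([model.beta] * V, rng)
    corpus = []
    for _ in range(num_docs):
        theta = sample_dirichlet([model.alpha] * model.num_topics, rng)  # general-topic mixture
        lam = sample_dirichlet(list(model.gamma), rng)                   # route probabilities
        doc_specific = sample_dirichlet([model.beta] * V, rng)           # this document's own distribution
        doc = []
        for _ in range(doc_len):
            route = rng.choices(ROUTES, weights=lam)[0]
            if route == "general":
                z = rng.choices(range(model.num_topics), weights=theta)[0]
                phi = topics[z]
            elif route == "background":
                phi = background
            else:
                phi = doc_specific
            word = rng.choices(range(V), weights=phi)[0]
            doc.append((word, route))
        corpus.append(doc)
    return corpus
```

For example, `generate_corpus(ToyCorpusModel(), num_docs=3, doc_len=20, seed=1)` returns three documents of 20 `(word_id, route)` pairs each. The route label is returned only so the toy output can be inspected; in a fitted model it is a latent variable that would have to be inferred, for instance by Gibbs sampling.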