Research on Domain Term Extraction Based on Conditional Random Fields

Authors:
Dequan Zheng;Tiejun Zhao;Jing Yang
Affiliations:
MOE-MS Key Laboratory of NLP and Speech, Harbin Institute of Technology, Harbin, China 150001;MOE-MS Key Laboratory of NLP and Speech, Harbin Institute of Technology, Harbin, China 150001;MOE-MS Key Laboratory of NLP and Speech, Harbin Institute of Technology, Harbin, China 150001
Venue:
ICCPOL '09 Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy
Year:
2009

Abstract

Domain term extraction is an important task in natural language processing and is widely applied in information retrieval, information extraction, data mining, machine translation, and other information processing fields. In this paper, we propose an automatic domain term extraction method based on conditional random fields (CRF). We treat domain term extraction as a sequence labeling problem and use the distribution characteristics of terms as features of the CRF model. We then use a CRF tool to train a template for term extraction. Experimental results show that the method is simple, can be applied to common domains, and achieves good results. In the open test, precision was 79.63%, recall was 73.54%, and the F-measure was 76.46%.
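
To make the sequence labeling framing concrete, the following is a minimal sketch of how term spans might be encoded as per-token labels for a CRF and decoded back into terms. The B/I/O label set, the function names, and the example tokens are assumptions for illustration; the abstract does not specify the label scheme, the features, or the CRF tool used. The last function checks that the reported F-measure agrees with the reported precision and recall under the usual harmonic-mean definition.

```python
from typing import List


def bio_encode(tokens: List[str], terms: List[List[str]]) -> List[str]:
    """Label each token B (term start), I (inside a term), or O (outside).

    The B/I/O scheme is an assumed labeling; the paper may use another one.
    """
    labels = ["O"] * len(tokens)
    # Match longer terms first so a long term wins over a nested shorter one.
    for term in sorted(terms, key=len, reverse=True):
        n = len(term)
        if n == 0:
            continue
        for i in range(len(tokens) - n + 1):
            window_free = all(lab == "O" for lab in labels[i:i + n])
            if tokens[i:i + n] == term and window_free:
                labels[i] = "B"
                labels[i + 1:i + n] = ["I"] * (n - 1)
    return labels


def bio_decode(tokens: List[str], labels: List[str]) -> List[List[str]]:
    """Recover term spans from a label sequence (e.g. a CRF's prediction)."""
    spans: List[List[str]] = []
    current: List[str] = []
    for token, label in zip(tokens, labels):
        if label == "B":
            if current:
                spans.append(current)
            current = [token]
        elif label == "I" and current:
            current.append(token)
        else:
            # "O", or a stray "I" with no open term: close any open span.
            if current:
                spans.append(current)
            current = []
    if current:
        spans.append(current)
    return spans


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example sentence and gold term, not taken from the paper.
tokens = ["conditional", "random", "fields", "label", "token", "sequences"]
terms = [["conditional", "random", "fields"]]
labels = bio_encode(tokens, terms)
print(labels)                           # ['B', 'I', 'I', 'O', 'O', 'O']
print(bio_decode(tokens, labels))       # [['conditional', 'random', 'fields']]
print(round(f_measure(79.63, 73.54), 2))  # 76.46
```

In a full pipeline, the encoded label sequences would serve as training targets for a CRF, and `bio_decode` would be applied to the model's predicted labels to obtain candidate terms.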