Research on Automatic Chinese Multi-word Term Extraction Based on Term Component

Authors:
Wei Kang;Zhifang Sui
Affiliations:
Institute of Computational Linguistics, Peking University, Peking, China 100871; Institute of Computational Linguistics, Peking University, Peking, China 100871
Venue:
ICCPOL '09 Proceedings of the 22nd International Conference on Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy
Year:
2009

Abstract

This paper presents an automatic Chinese multi-word term extraction method based on unithood and termhood measures. The unithood of a candidate term is measured by the strength of its inner unity and its marginal variety. Term components are taken into account to estimate termhood. Inspired by the economical law of term generation, we propose two measures of how likely a candidate is to be a true term: the first is based on the domain speciality of the term, and the second is based on the similarity between the candidate and a template that contains structured information about terms. Experiments on the IT and medicine domains show that our method is effective and portable across domains.
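
The abstract does not give formulas for inner unity or marginal variety, so the following is only a minimal sketch of how such unithood scores are commonly computed. It assumes pointwise mutual information between two adjacent components as the inner-unity score and the Shannon entropy of neighbouring tokens as the marginal-variety score. Both choices, and the function names `inner_unity` and `marginal_variety`, are illustrative assumptions and not the authors' definitions.

```python
import math
from collections import Counter


def inner_unity(left, right, unigram, bigram, total):
    """Assumed inner-unity score: pointwise mutual information (PMI)
    between two adjacent components of a candidate term.

    unigram and bigram map tokens / token pairs to corpus counts;
    total is the number of tokens used to normalise the counts.
    """
    joint = bigram.get((left, right), 0)
    if joint == 0:
        return float("-inf")  # the pair was never observed together
    p_xy = joint / total
    p_x = unigram[left] / total
    p_y = unigram[right] / total
    return math.log2(p_xy / (p_x * p_y))


def marginal_variety(neighbors):
    """Assumed marginal-variety score: Shannon entropy (in bits) of the
    tokens observed immediately to one side of the candidate.

    A higher value means the candidate appears in more varied contexts,
    which suggests its boundary is a real unit boundary.
    """
    counts = Counter(neighbors)
    n = sum(counts.values())
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Toy counts, invented purely for illustration.
unigram = {"x": 2, "y": 2}
bigram = {("x", "y"): 2}
pmi = inner_unity("x", "y", unigram, bigram, total=8)
entropy = marginal_variety(["a", "b", "c", "d"])
```

Under these assumptions, a candidate would score high on unithood when its components co-occur far more often than chance (high PMI) and when the tokens to its left and right vary widely (high entropy on both sides). How the two scores are combined or thresholded is not stated in the abstract and is left open here.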