Automatic domain terminology extraction using graph mutual reinforcement

Authors:
Jingjing Kang;Xiaoyong Du;Tao Liu;He Hu
Affiliations:
Key Labs of Data Engineering and Knowledge Engineering, Beijing, China and School of Information, Renmin University of China, Beijing, China;Key Labs of Data Engineering and Knowledge Engineering, Beijing, China and School of Information, Renmin University of China, Beijing, China;Key Labs of Data Engineering and Knowledge Engineering, Beijing, China and School of Information, Renmin University of China, Beijing, China;Key Labs of Data Engineering and Knowledge Engineering, Beijing, China and School of Information, Renmin University of China, Beijing, China
Venue:
WAIM'10 Proceedings of the 11th international conference on Web-age information management
Year:
2010

Citing 7
Cited 0

Academic careers for experimental computer scientists and engineers
Communications of the ACM

Authoritative sources in a hyperlinked environment
Journal of the ACM (JACM)

Object-level ranking: bringing order to Web objects
WWW '05 Proceedings of the 14th international conference on World Wide Web

Reviewing and Evaluating Automatic Term Recognition Techniques
GoTAL '08 Proceedings of the 6th international conference on Advances in Natural Language Processing

On the Use of Domain Terms in Source Code
ICPC '08 Proceedings of the 2008 The 16th IEEE International Conference on Program Comprehension

IRank: A Term-Based Innovation Ranking System for Conferences and Scholars
APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management

Improving the extraction of bilingual terminology from Wikipedia
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)


Abstract

Information Extraction (IE) aims at mining knowledge from unstructured data, and terminology extraction is one of its crucial subtasks. In this paper, we propose a novel ranking-based approach to domain terminology extraction that exploits the linkage among authors, papers, and conferences in domain proceedings. Candidate terms are extracted by statistical methods and then ranked by importance values derived from mutual reinforcement in the author-paper-conference graph. Furthermore, we integrate our approach with several classical termhood-based methods, including C-value and inverse document frequency. The presented approach does not require any training data and can be extended to other domains. Experimental results show that our approach outperforms several competitive methods.
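
The abstract does not state the update rules used for mutual reinforcement, so the following is only a minimal sketch of the general idea under stated assumptions. It uses HITS-style updates: papers pass score to their authors and conference, and receive score back from them. The function names (`mutual_reinforcement`, `rank_terms`), the L2 normalization, the fixed iteration count, and the toy data are all hypothetical, and the step that sums paper scores per candidate term is an assumed stand-in for the ranking step, not the authors' formula.

```python
import math
from collections import defaultdict


def normalize(scores):
    """Scale a score dict to unit L2 norm (left unchanged if all zero)."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def mutual_reinforcement(paper_authors, paper_conf, iterations=50):
    """Assumed HITS-style scoring on an author-paper-conference graph.

    paper_authors: dict mapping paper id -> list of author names
    paper_conf:    dict mapping paper id -> conference name
    """
    papers = list(paper_authors)
    authors = {a for p in papers for a in paper_authors[p]}
    confs = {paper_conf[p] for p in papers}
    p_score = {p: 1.0 for p in papers}
    a_score = {a: 1.0 for a in authors}
    c_score = {c: 1.0 for c in confs}

    for _ in range(iterations):
        # Authors and conferences are reinforced by their papers.
        new_a = {a: 0.0 for a in authors}
        new_c = {c: 0.0 for c in confs}
        for p in papers:
            for a in paper_authors[p]:
                new_a[a] += p_score[p]
            new_c[paper_conf[p]] += p_score[p]
        a_score = normalize(new_a)
        c_score = normalize(new_c)

        # Papers are reinforced by their authors and their conference.
        new_p = {
            p: sum(a_score[a] for a in paper_authors[p]) + c_score[paper_conf[p]]
            for p in papers
        }
        p_score = normalize(new_p)

    return p_score, a_score, c_score


def rank_terms(paper_terms, p_score):
    """Assumption: a term's importance is the sum of scores of papers containing it."""
    term_score = defaultdict(float)
    for p, terms in paper_terms.items():
        for t in set(terms):
            term_score[t] += p_score[p]
    return sorted(term_score, key=lambda t: (-term_score[t], t))


# Toy data, invented purely for illustration.
paper_authors = {"p1": ["alice", "bob"], "p2": ["alice"], "p3": ["carol"]}
paper_conf = {"p1": "WAIM", "p2": "WAIM", "p3": "OTHER"}
paper_terms = {
    "p1": ["term extraction", "graph ranking"],
    "p2": ["term extraction"],
    "p3": ["image coding"],
}

p_score, a_score, c_score = mutual_reinforcement(paper_authors, paper_conf)
ranked = rank_terms(paper_terms, p_score)
print(ranked)
```

In this toy graph, terms that appear in well-connected papers rise to the top, while a term from an isolated paper sinks. Combining such a graph score with a termhood measure such as C-value or inverse document frequency, as the abstract describes, would need a weighting scheme that the abstract does not specify.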