Detecting, categorizing and clustering entity mentions in Chinese text

Authors:
Wenjie Li;Donglei Qian;Qin Lu;Chunfa Yuan
Affiliations:
The Hong Kong Polytechnic University, Hong Kong, Hong Kong;Tsinghua University, Beijing, China;The Hong Kong Polytechnic University, Hong Kong, Hong Kong;Tsinghua University, Beijing, China
Venue:
SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Year:
2007

References (10)

Efficient support vector classifiers for named entity recognition
COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1

Improving machine learning approaches to coreference resolution
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

Named entity recognition using an HMM-based chunk tagger
ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics

Identifying and tracking entity mentions in a maximum entropy framework
NAACL-Short '03 Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology: companion volume of the Proceedings of HLT-NAACL 2003--short papers - Volume 2

HowtogetaChineseName(Entity): segmentation and combination issues
EMNLP '03 Proceedings of the 2003 conference on Empirical methods in natural language processing

Uniformly Stability of Impulsive BAM Neural Networks with Delays
ISDA '06 Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications - Volume 01

A mention-synchronous coreference resolution algorithm based on the Bell tree
ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics

Machine learning for coreference resolution: from local classification to global ranking
ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics

Applying Machine Learning to Chinese Entity Detection and Tracking
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing

Chinese named entity recognition based on multilevel linguistic features
IJCNLP'04 Proceedings of the First international joint conference on Natural Language Processing

Cited by (3)

Extracting Key Entities and Significant Events from Online Daily News
IDEAL '08 Proceedings of the 9th International Conference on Intelligent Data Engineering and Automated Learning

Using deep belief nets for Chinese named entity categorization
NEWS '10 Proceedings of the 2010 Named Entities Workshop

Developing Position Structure-Based Framework for Chinese Entity Relation Extraction
ACM Transactions on Asian Language Information Processing (TALIP)


Abstract

The work presented in this paper is motivated by the practical need for content extraction and by the data source and evaluation benchmark made available through the ACE program. The Chinese Entity Detection and Recognition (EDR) task is of particular interest to us. This task presents several language-independent and language-dependent challenges, arising, for example, from the complexity of the extraction targets and from the problem of word segmentation. In this paper, we propose a novel solution to alleviate the problems specific to the task. Mention detection takes advantage of machine learning approaches and character-based models. It handles the different types of entities being mentioned and the different constituent units (i.e. extents and heads) separately. Mentions referring to the same entity are linked together by integrating most-specific-first and closest-first rule-based pairwise clustering algorithms. The types of mentions and entities are determined by head-driven classification approaches. The implemented system achieves an ACE value of 66.1 when evaluated on the EDR 2005 Chinese corpus, which is among the top-tier results. Alternative approaches to mention detection and clustering are also discussed and analyzed.
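
As a rough illustration of the closest-first rule named in the abstract, the sketch below links each mention to the nearest preceding mention that a pairwise test accepts, and starts a new entity when none is accepted. It is only a minimal sketch under stated assumptions: the abstract does not specify the pairwise decision or how the most-specific-first rule is integrated, so the function names `closest_first_clusters` and `substring_match` are hypothetical, and the substring test is a toy stand-in rather than the authors' method.

```python
from typing import Callable, List, Sequence


def closest_first_clusters(
    mentions: Sequence[str],
    corefer: Callable[[str, str], bool],
) -> List[List[str]]:
    """Group mentions into entities using a closest-first pairwise rule.

    `corefer` is a hypothetical pairwise decision supplied by the caller;
    the abstract does not say how the actual decision is made.
    """
    clusters: List[List[str]] = []   # one list of mentions per entity
    cluster_of: List[int] = []       # entity index assigned to each mention
    for i, mention in enumerate(mentions):
        assigned = None
        # Scan antecedents from the nearest one backwards; stop at the first match.
        for j in range(i - 1, -1, -1):
            if corefer(mentions[j], mention):
                assigned = cluster_of[j]
                break
        if assigned is None:
            # No antecedent accepted: this mention starts a new entity.
            assigned = len(clusters)
            clusters.append([])
        clusters[assigned].append(mention)
        cluster_of.append(assigned)
    return clusters


def substring_match(antecedent: str, mention: str) -> bool:
    """Toy stand-in (an assumption): corefer if one string contains the other."""
    return antecedent in mention or mention in antecedent


example_mentions = ["香港理工大学", "理工大学", "清华大学", "清华"]
example_clusters = closest_first_clusters(example_mentions, substring_match)
```

With these toy inputs, the two shortened names attach to the full names that precede them, giving two entities. A real system would replace `substring_match` with a learned or rule-based pairwise decision over mention features.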