Applying Machine Learning to Chinese Entity Detection and Tracking

Authors:
Donglei Qian;Wenjie Li;Chunfa Yuan;Qin Lu;Mingli Wu
Affiliations:
Department of Computing, The Hong Kong Polytechnic University, Hong Kong and Department of Computer Science and Technology, Tsinghua University, China;Department of Computing, The Hong Kong Polytechnic University, Hong Kong;Department of Computer Science and Technology, Tsinghua University, China;Department of Computing, The Hong Kong Polytechnic University, Hong Kong;Department of Computing, The Hong Kong Polytechnic University, Hong Kong
Venue:
CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
Year:
2007

Abstract

This paper presents a Chinese entity detection and tracking system that takes advantage of character-based models and machine learning approaches. An entity is defined here as the linked set of all its mentions in the text, together with the associated attributes. Entity mentions of different types normally exhibit quite different linguistic patterns, so six separate Conditional Random Fields (CRF) models, which incorporate character n-gram and word knowledge features, are built to detect the extent and the head of three types of mentions: named, nominal, and pronominal. For each type of mention, attributes are identified by Support Vector Machine (SVM) classifiers that take mention heads and their context as classification features. Mentions are then merged into a unified entity representation by examining their attributes and connections in a rule-based coreference resolution process. The system is evaluated on the ACE 2005 corpus and achieves competitive results.
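
To make the character-based detection step more concrete, the following is a minimal sketch, not the authors' implementation. It shows one plausible way to build character n-gram features for a sequence labeler such as a CRF, and to turn per-character BIO tags back into mention spans. The function names, the feature template, the window size, and the tag names NAM and PRO are all assumptions made for illustration. The abstract does not specify any of them, and the word knowledge features it mentions are omitted here.

```python
from typing import Dict, List, Tuple

PAD = "<PAD>"  # hypothetical padding symbol for positions outside the sentence


def char_ngram_features(chars: List[str], i: int, window: int = 2) -> Dict[str, str]:
    """Character unigram and bigram features around position i (assumed template)."""
    n = len(chars)

    def at(j: int) -> str:
        return chars[j] if 0 <= j < n else PAD

    feats: Dict[str, str] = {}
    # Unigrams at offsets -window .. +window
    for off in range(-window, window + 1):
        feats[f"c[{off}]"] = at(i + off)
    # Bigrams of adjacent characters inside the same window
    for off in range(-window, window):
        feats[f"c[{off}]c[{off + 1}]"] = at(i + off) + at(i + off + 1)
    return feats


def bio_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Convert per-character BIO tags into (start, end_exclusive, type) spans."""
    spans: List[Tuple[int, int, str]] = []
    start, kind = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        continues = tag.startswith("I-") and kind == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, kind))
            start, kind = None, None
        if tag.startswith("B-"):
            start, kind = i, tag[2:]
    return spans


# Toy sentence: "李明说他很忙" (Li Ming says he is busy)
chars = list("李明说他很忙")
# Hand-written tags standing in for the output of trained extent models
tags = ["B-NAM", "I-NAM", "O", "B-PRO", "O", "O"]

features = [char_ngram_features(chars, i, window=1) for i in range(len(chars))]
mentions = [("".join(chars[s:e]), t) for s, e, t in bio_to_spans(tags)]
```

In a full system along the lines the abstract describes, feature dictionaries like these would be passed to trained CRF models for each mention type, and the decoded spans would then go to attribute classification and coreference resolution.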