Applying Machine Learning to Chinese Entity Detection and Tracking

  • Authors:
  • Donglei Qian;Wenjie Li;Chunfa Yuan;Qin Lu;Mingli Wu

  • Affiliations:
  • Department of Computing, The Hong Kong Polytechnic University, Hong Kong and Department of Computer Science and Technology, Tsinghua University, China;Department of Computing, The Hong Kong Polytechnic University, Hong Kong;Department of Computer Science and Technology, Tsinghua University, China;Department of Computing, The Hong Kong Polytechnic University, Hong Kong;Department of Computing, The Hong Kong Polytechnic University, Hong Kong

  • Venue:
  • CICLing '07 Proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2009

Abstract

This paper presents a Chinese entity detection and tracking system that takes advantage of character-based models and machine learning approaches. An entity here is defined as a link of all its mentions in text together with the associated attributes. Entity mentions of different types normally exhibit quite different linguistic patterns. Six separate Conditional Random Fields (CRF) models that incorporate character N-gram and word knowledge features are built to detect the extent and the head of three types of mentions, namely named, nominal and pronominal mentions. For each type of mention, attributes are identified by Support Vector Machine (SVM) classifiers which take mention heads and their context as classification features. Mentions can then be merged into a unified entity representation by examining their attributes and connections in a rule-based coreference resolution process. The system is evaluated on the ACE 2005 corpus and achieves competitive results.
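
To make the character-based detection step more concrete, the sketch below shows one way per-character N-gram features and mention-extent labels could be prepared for a CRF tagger. The abstract does not give the feature template or the tagging scheme, so the window size, the feature names, the `<PAD>` symbol, the B/I/O encoding, and the helper names `char_features` and `bio_tags` are all assumptions made for illustration, not the authors' design. Word knowledge features and the CRF itself are omitted.

```python
from typing import Dict, List, Tuple


def char_features(chars: str, i: int, window: int = 1) -> Dict[str, str]:
    """Character unigram and bigram features around position i.

    Hypothetical template: the paper's actual features are not given
    in the abstract.
    """
    feats: Dict[str, str] = {}
    # Unigrams at each offset in the window, padded at sentence edges.
    for off in range(-window, window + 1):
        j = i + off
        feats[f"c[{off}]"] = chars[j] if 0 <= j < len(chars) else "<PAD>"
    # Bigrams of adjacent characters that lie fully inside the sentence.
    for off in range(-window, window):
        j = i + off
        if 0 <= j and j + 1 < len(chars):
            feats[f"c[{off}]c[{off + 1}]"] = chars[j:j + 2]
    return feats


def bio_tags(length: int, spans: List[Tuple[int, int]]) -> List[str]:
    """Encode mention spans (start, end-exclusive) as per-character tags.

    B/I/O is an assumed labelling scheme, chosen only for illustration.
    """
    tags = ["O"] * length
    for start, end in spans:
        tags[start] = "B"
        for k in range(start + 1, end):
            tags[k] = "I"
    return tags


# Illustrative sentence with two named mentions (assumed spans).
sentence = "香港理工大学位于香港"
features = [char_features(sentence, i) for i in range(len(sentence))]
tags = bio_tags(len(sentence), [(0, 6), (8, 10)])
```

Under this setup, one feature dictionary and one tag per character would form a training sequence for a sequence labeller. A separate model per mention type and per target (extent or head), as the abstract describes, would reuse the same features with different tag sequences.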