A joint model to identify and align bilingual named entities

  • Authors:
  • Yufeng Chen;Chengqing Zong;Keh-Yih Su

  • Affiliations:
  • National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences;National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences;Behavior Design Corporation

  • Venue:
  • Computational Linguistics
  • Year:
  • 2013

Abstract

In this article, an integrated model is derived that jointly identifies and aligns bilingual named entities (NEs) between Chinese and English. The model is motivated by the following observations: (1) whether an NE is translated semantically or phonetically depends greatly on its entity type, (2) entities within an aligned pair should share the same type, and (3) the initially detected NEs can act as anchors and provide further information while selecting NE candidates. Based on these observations, this article proposes a translation mode ratio feature (defined as the proportion of NE internal tokens that are semantically translated), enforces an entity type consistency constraint, and utilizes additional new NE likelihoods based on the initially detected NE anchors. Experiments show that this novel method significantly outperforms the baseline. The type-insensitive F-score of identified NE pairs increases from 78.4% to 88.0% (12.2% relative improvement) in our Chinese-English NE alignment task, and the type-sensitive F-score increases from 68.4% to 83.0% (21.3% relative improvement). Furthermore, the proposed model demonstrates its robustness when it is tested across different domains. Finally, when semi-supervised learning is conducted to train the adopted English NE recognition model, the proposed model also significantly boosts the English NE recognition type-sensitive F-score.
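
To make the translation mode ratio and the reported relative improvements concrete, the following is a minimal sketch rather than the authors' implementation. It assumes each NE-internal token has already been labeled with a translation mode; the label strings "semantic" and "phonetic", the function names, and the four-token example are all hypothetical. The abstract does not say how the modes are assigned or how the ratio enters the model, so the sketch covers only the ratio itself and the arithmetic behind the quoted percentages.

```python
from typing import Sequence


def translation_mode_ratio(modes: Sequence[str]) -> float:
    """Proportion of NE-internal tokens labeled as semantically translated.

    `modes` holds one label per token; the labels "semantic" and
    "phonetic" are an assumption made for this illustration.
    """
    if not modes:
        raise ValueError("an NE must contain at least one token")
    semantic_count = sum(1 for mode in modes if mode == "semantic")
    return semantic_count / len(modes)


def relative_improvement(baseline: float, improved: float) -> float:
    """Relative gain of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# Hypothetical four-token NE: one transliterated token, three translated by meaning.
ratio = translation_mode_ratio(["phonetic", "semantic", "semantic", "semantic"])

# F-scores quoted in the abstract (baseline, proposed model).
type_insensitive_gain = relative_improvement(78.4, 88.0)
type_sensitive_gain = relative_improvement(68.4, 83.0)

print(f"translation mode ratio: {ratio:.2f}")
print(f"type-insensitive relative improvement: {type_insensitive_gain:.1f}%")
print(f"type-sensitive relative improvement: {type_sensitive_gain:.1f}%")
```

Under this reading, a ratio near 1 describes an NE that is mostly translated by meaning and a ratio near 0 describes one that is mostly transliterated, which is how the feature could reflect the first observation about entity type.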