Joint bilingual name tagging for parallel corpora

  • Authors:
  • Qi Li; Haibo Li; Heng Ji; Wen Wang; Jing Zheng; Fei Huang

  • Affiliations:
  • City University of New York, New York City, NY, USA; City University of New York, New York City, NY, USA; City University of New York, New York City, NY, USA; SRI International, Menlo Park, CA, USA; SRI International, Menlo Park, CA, USA; IBM T.J. Watson Research Center, Yorktown Heights, USA

  • Venue:
  • Proceedings of the 21st ACM International Conference on Information and Knowledge Management
  • Year:
  • 2012

Abstract

Traditional isolated monolingual name taggers tend to yield inconsistent results across two languages. In this paper, we propose two novel approaches to jointly and consistently extract names from parallel corpora. The first approach uses standard linear-chain Conditional Random Fields (CRFs) as the learning framework, incorporating cross-lingual features propagated between the two languages. The second approach uses a joint CRF model to decode sentence pairs jointly, incorporating bilingual factors based on word alignment. Experiments on Chinese-English parallel corpora demonstrated that the proposed methods significantly outperformed monolingual name taggers, were robust to automatic alignment noise, and achieved state-of-the-art performance. With only 20% of the training data, the proposed methods already outperform the baseline trained on the whole training set.
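
To make the idea of cross-lingual features propagated through word alignment more concrete, the following is a minimal Python sketch. It is not the authors' implementation: the function names, the feature names, the label set, and the example sentence pair are all hypothetical, and it assumes that a monolingual tagger has already labeled the other side of the sentence pair and that word alignments are given as (source index, target index) pairs.

```python
from typing import Dict, List, Tuple


def propagate_labels(
    src_len: int,
    tgt_labels: List[str],
    alignment: List[Tuple[int, int]],
) -> List[List[str]]:
    """Collect, for each source token, the labels of its aligned target tokens."""
    projected: List[List[str]] = [[] for _ in range(src_len)]
    for src_idx, tgt_idx in alignment:
        projected[src_idx].append(tgt_labels[tgt_idx])
    return projected


def token_features(
    src_tokens: List[str],
    tgt_labels: List[str],
    alignment: List[Tuple[int, int]],
) -> List[Dict[str, object]]:
    """Build per-token feature dicts: monolingual cues plus projected labels."""
    projected = propagate_labels(len(src_tokens), tgt_labels, alignment)
    features = []
    for token, labels in zip(src_tokens, projected):
        # Hypothetical monolingual features.
        feats: Dict[str, object] = {
            "word": token,
            "is_capitalized": token[:1].isupper(),
        }
        if labels:
            # Hypothetical cross-lingual features: one per distinct aligned label.
            for label in sorted(set(labels)):
                feats["aligned_label=" + label] = True
        else:
            feats["unaligned"] = True
        features.append(feats)
    return features


# Hypothetical example: English tokens aligned to a tagged Chinese sentence.
english = ["Obama", "visited", "Beijing"]
chinese_labels = ["B-PER", "O", "O", "B-GPE"]  # labels for 奥巴马 / 访问 / 了 / 北京
alignment = [(0, 0), (1, 1), (2, 3)]           # (English index, Chinese index)

features = token_features(english, chinese_labels, alignment)
```

Feature dictionaries of this kind could then be passed to a linear-chain CRF trainer for one language, with the same procedure applied in the opposite direction for the other language. The second approach in the abstract, joint decoding with bilingual factors, is a different model and is not covered by this sketch.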