Joint bilingual name tagging for parallel corpora

Authors:
Qi Li;Haibo Li;Heng Ji;Wen Wang;Jing Zheng;Fei Huang
Affiliations:
City University of New York, New York City, NY, USA;City University of New York, New York City, NY, USA;City University of New York, New York City, NY, USA;SRI International, Menlo Park, CA, USA;SRI International, Menlo Park, CA, USA;IBM T.J. Watson Research Center, Yorktown Heights, USA
Venue:
Proceedings of the 21st ACM international conference on Information and knowledge management
Year:
2012

Abstract

Traditional isolated monolingual name taggers tend to yield inconsistent results across two languages. In this paper, we propose two novel approaches to jointly and consistently extract names from parallel corpora. The first approach uses standard linear-chain Conditional Random Fields (CRFs) as the learning framework, incorporating cross-lingual features propagated between two languages. The second approach is based on a joint CRF model that decodes sentence pairs jointly, incorporating bilingual factors based on word alignment. Experiments on Chinese-English parallel corpora demonstrated that the proposed methods significantly outperformed monolingual name taggers, were robust to automatic alignment noise, and achieved state-of-the-art performance. With only 20% of the training data, our proposed methods already outperform the baseline learned from the whole training set.
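
To make the first approach more concrete, the following is a minimal sketch of how name labels from one language might be propagated across word alignments and exposed as features for a linear-chain CRF on the other language. It is an illustration only, not the authors' feature set: the function name `cross_lingual_features`, the feature keys `word` and `aligned_label`, the `UNALIGNED` marker, and the BIO label strings are all assumptions introduced here.

```python
from typing import Dict, List, Tuple


def cross_lingual_features(
    tgt_tokens: List[str],
    src_labels: List[str],
    alignment: List[Tuple[int, int]],
) -> List[Dict[str, str]]:
    """Build one feature dict per target token (hypothetical feature design).

    src_labels: labels a monolingual tagger assigned to the source sentence.
    alignment:  (source_index, target_index) pairs from a word aligner.
    """
    # Collect the labels of every source token aligned to each target token.
    projected: Dict[int, List[str]] = {}
    for src_i, tgt_i in alignment:
        projected.setdefault(tgt_i, []).append(src_labels[src_i])

    features = []
    for i, token in enumerate(tgt_tokens):
        feats = {"word": token.lower()}
        labels = projected.get(i)
        # Unaligned tokens get an explicit marker instead of a missing feature.
        feats["aligned_label"] = (
            "+".join(sorted(set(labels))) if labels else "UNALIGNED"
        )
        features.append(feats)
    return features


# Usage: project labels from a Chinese sentence onto its English translation.
zh_labels = ["B-PER", "O", "B-GPE"]           # e.g. for 奥巴马 / 访问 / 北京
en_tokens = ["Obama", "visited", "Beijing", "today"]
links = [(0, 0), (1, 1), (2, 2)]              # "today" has no aligned word
for token, feats in zip(en_tokens, cross_lingual_features(en_tokens, zh_labels, links)):
    print(token, feats)
```

Feature dicts of this shape could be passed to any CRF toolkit that accepts per-token feature maps. Running the same projection in the opposite direction would give the other language's tagger the corresponding cross-lingual evidence.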