Traditional monolingual name taggers, applied to each language in isolation, tend to produce inconsistent results across the two sides of a parallel corpus. In this paper, we propose two novel approaches to extract names from parallel corpora jointly and consistently. The first approach uses standard linear-chain Conditional Random Fields (CRFs) as the learning framework and incorporates cross-lingual features propagated between the two languages. The second approach uses a joint CRF model that decodes each sentence pair jointly, incorporating bilingual factors based on word alignment. Experiments on Chinese-English parallel corpora demonstrated that the proposed methods significantly outperformed monolingual name taggers, were robust to automatic alignment noise, and achieved state-of-the-art performance. With only 20% of the training data, our proposed methods already outperform the baseline trained on the whole training set.
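To make the two ideas above more concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' implementation. The function names (`propagate_types`, `token_features`, `bilingual_factor`), the BIO label scheme, the `NONE` placeholder for unaligned tokens, the "first link wins" rule, and the fixed ±`weight` agreement score are all hypothetical choices made for illustration. The first part projects entity types from one side of a sentence pair through a word alignment and exposes them as extra per-token features for the other side's linear-chain CRF. The second part scores how well a candidate pair of label sequences agrees across aligned tokens, which is the kind of bilingual factor a joint model could add to the two monolingual CRF scores.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a word alignment is a list of
# (source_index, target_index) pairs, e.g. produced by an automatic aligner.
Alignment = List[Tuple[int, int]]


def entity_type(label: str) -> str:
    """Strip the BIO prefix: 'B-PER' -> 'PER', 'O' -> 'O'."""
    return label.split("-", 1)[1] if "-" in label else label


def propagate_types(src_labels: List[str], tgt_len: int,
                    alignment: Alignment) -> List[str]:
    """Project source-side entity types onto target tokens via the alignment.

    Assumption: the first alignment link seen for a target token wins;
    unaligned target tokens get the placeholder 'NONE'.
    """
    projected = ["NONE"] * tgt_len
    for s, t in alignment:
        if projected[t] == "NONE":
            projected[t] = entity_type(src_labels[s])
    return projected


def token_features(tokens: List[str],
                   projected: List[str]) -> List[Dict[str, float]]:
    """Per-token feature dicts: the word itself plus the propagated type."""
    features = []
    for i, tok in enumerate(tokens):
        features.append({
            "word=" + tok: 1.0,
            "proj_type=" + projected[i]: 1.0,  # cross-lingual feature
        })
    return features


def bilingual_factor(src_labels: List[str], tgt_labels: List[str],
                     alignment: Alignment, weight: float = 1.0) -> float:
    """Hypothetical agreement factor: +weight for each aligned pair whose
    entity types match, -weight for each aligned pair whose types differ."""
    score = 0.0
    for s, t in alignment:
        if entity_type(src_labels[s]) == entity_type(tgt_labels[t]):
            score += weight
        else:
            score -= weight
    return score


if __name__ == "__main__":
    # English: "Barack Obama visited Beijing"; Chinese: 奥巴马 访问 了 北京
    en_labels = ["B-PER", "I-PER", "O", "B-LOC"]
    zh = ["奥巴马", "访问", "了", "北京"]
    align = [(0, 0), (1, 0), (2, 1), (3, 3)]  # 了 is left unaligned

    proj = propagate_types(en_labels, len(zh), align)
    print(proj)  # ['PER', 'O', 'NONE', 'LOC']
    print(token_features(zh, proj)[3])  # {'word=北京': 1.0, 'proj_type=LOC': 1.0}
    print(bilingual_factor(en_labels, ["B-PER", "O", "O", "B-LOC"], align))  # 4.0
    print(bilingual_factor(en_labels, ["B-PER", "O", "O", "B-GPE"], align))  # 2.0
```

In this toy example the Chinese token 北京 receives the feature `proj_type=LOC` from its aligned English token, and a consistent labeling of the pair scores higher than one that tags the same aligned name as `LOC` on one side and `GPE` on the other. In a real system the weight of such a factor would be learned rather than fixed, and because the alignment edges link the two chains, exact joint decoding is no longer a single-chain Viterbi problem.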