Nudging the envelope of direct transfer methods for multilingual named entity recognition

  • Authors: Oscar Täckström
  • Affiliations: Uppsala University, Sweden
  • Venue: WILS '12 Proceedings of the NAACL-HLT Workshop on the Induction of Linguistic Structure
  • Year: 2012

Abstract

In this paper, we study direct transfer methods for multilingual named entity recognition. Specifically, we extend the method recently proposed by Täckström et al. (2012), which is based on cross-lingual word cluster features. First, we show that by using multiple source languages, combined with self-training for target language adaptation, we can achieve significant improvements compared to using only single-source direct transfer. Second, we investigate how the direct transfer system fares against a supervised target language system and conclude that between 8,000 and 16,000 word tokens need to be annotated in each target language to match the best direct transfer system. Finally, we show that we can significantly improve target language performance, even after annotating up to 64,000 tokens in the target language, by simply concatenating source and target language annotations.
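
The abstract mentions two ingredients that are easy to picture in code: training on concatenated source-language (and later target-language) annotations, and self-training for target-language adaptation. The sketch below is a minimal illustration of that general recipe only; the `SequenceTagger` interface, function names, and confidence-threshold heuristic are hypothetical and are not taken from the paper, whose exact self-training procedure may differ.

```python
from typing import List, Protocol, Tuple

# A labelled sentence: a list of (token, BIO tag) pairs.
LabelledSentence = List[Tuple[str, str]]


class SequenceTagger(Protocol):
    """Hypothetical NER tagger interface (not from the paper's code)."""

    def fit(self, data: List[LabelledSentence]) -> None: ...

    def predict(self, tokens: List[str]) -> Tuple[List[str], float]:
        """Return predicted tags and a sentence-level confidence score."""
        ...


def self_train_direct_transfer(
    tagger: SequenceTagger,
    source_data: List[LabelledSentence],  # concatenated annotations from one or more source languages
    target_unlabelled: List[List[str]],   # raw, unannotated target-language sentences
    rounds: int = 1,
    confidence_threshold: float = 0.9,    # assumed selection heuristic, not the paper's criterion
) -> SequenceTagger:
    """Train on source-language data, label target-language sentences,
    and add high-confidence predictions back into the training pool."""
    train_pool = list(source_data)
    for _ in range(rounds):
        tagger.fit(train_pool)
        for tokens in target_unlabelled:
            tags, confidence = tagger.predict(tokens)
            if confidence >= confidence_threshold:
                train_pool.append(list(zip(tokens, tags)))
    # Final fit on the augmented pool; with annotated target data available,
    # it could simply be concatenated into train_pool before this call.
    tagger.fit(train_pool)
    return tagger
```

In this reading, multi-source transfer corresponds to passing the concatenation of several source-language training sets as `source_data`, and the paper's final experiment corresponds to also appending annotated target-language sentences to the same pool.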