Data point selection for cross-language adaptation of dependency parsers

Authors:
Anders Søgaard
Affiliations:
University of Copenhagen, Njalsgade, Copenhagen S
Venue:
HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2
Year:
2011

Citing 7
Cited 4

Corpus-based induction of syntactic structure: models of dependency and constituency

ACL '04 Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics
Online large-margin training of dependency parsers

ACL '05 Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics
Data-driven dependency parsing of new languages using incomplete and noisy training data

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
Dependency grammar induction via bitext projection constraints

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 1 - Volume 1
Parser adaptation and projection with quasi-synchronous grammar features

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Sparsity in dependency grammar induction

ACLShort '10 Proceedings of the ACL 2010 Conference Short Papers
Using universal linguistic knowledge to guide grammar induction

EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Multi-source transfer of delexicalized dependency parsers

EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Data point selection for self-training

SPMRL '11 Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages
Cross-lingual word clusters for direct transfer of linguistic structure

NAACL HLT '12 Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
Selective sharing for multilingual dependency parsing

ACL '12 Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1

Quantified Score

Hi-index	0.00

Visualization

Abstract

We consider a very simple, yet effective, approach to cross language adaptation of dependency parsers. We first remove lexical items from the treebanks and map part-of-speech tags into a common tagset. We then train a language model on tag sequences in otherwise unlabeled target data and rank labeled source data by perplexity per word of tag sequences from less similar to most similar to the target. We then train our target language parser on the most similar data points in the source labeled data. The strategy achieves much better results than a non-adapted baseline and state-of-the-art unsupervised dependency parsing, and results are comparable to more complex projection-based cross language adaptation algorithms.