Notes on the evaluation of dependency parsers obtained through cross-lingual projection

Authors:
Kathrin Spreyer
Affiliations:
University of Potsdam
Venue:
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Posters
Year:
2010

Citing 8
Cited 0

A systematic comparison of various statistical alignment models

Computational Linguistics
Building a large annotated corpus of English: the penn treebank

Computational Linguistics - Special issue on using large corpora: II
Inducing multilingual POS taggers and NP bracketers via robust projection across aligned corpora

NAACL '01 Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies
Bootstrapping parsers via syntactic projection across parallel texts

Natural Language Engineering
Non-projective dependency parsing using spanning tree algorithms

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
CoNLL-X shared task on multilingual dependency parsing

CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Labeled pseudo-projective dependency parsing with support vector machines

CoNLL-X '06 Proceedings of the Tenth Conference on Computational Natural Language Learning
Data-driven dependency parsing of new languages using incomplete and noisy training data

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning

Quantified Score

Hi-index	0.00

Visualization

Abstract

In this paper we address methodological issues in the evaluation of a projection-based framework for dependency parsing in which annotations for a source language are transfered to a target language using word alignments in a parallel corpus. The projected trees then constitute the training data for a data-driven parser in the target language. We discuss two problems that arise in the evaluation of such cross-lingual approaches. First, the annotation scheme underlying the source language annotations - and hence the projected target annotations and predictions of the parser derived from them - is likely to differ from previously existing gold standard test sets devised specifically for the target language. Second, the standard procedure of cross-validation cannot be performed in the absence of parallel gold standard annotations, so an alternative method has to be used to assess the generalization capabilities of the projected parsers.