Data-driven dependency parsing of new languages using incomplete and noisy training data

  • Authors:
  • Kathrin Spreyer;Jonas Kuhn

  • Affiliations:
  • University of Potsdam, Germany;University of Potsdam, Germany

  • Venue:
  • CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

We present a simple but very effective approach to identifying high-quality data in noisy data sets for structured problems like parsing, by greedily exploiting partial structures. We analyze our approach in an annotation projection framework for dependency trees, and show how dependency parsers from two different paradigms (graph-based and transition-based) can be trained on the resulting tree fragments. We train parsers for Dutch to evaluate our method and to investigate to which degree graph-based and transition-based parsers can benefit from incomplete training data. We find that partial correspondence projection gives rise to parsers that out-perform parsers trained on aggressively filtered data sets, and achieve unlabeled attachment scores that are only 5% behind the average UAS for Dutch in the CoNLL-X Shared Task on supervised parsing (Buchholz and Marsi, 2006).