Multiword expressions in statistical dependency parsing

  • Authors:
  • Gülşen Eryiğit;Tugay İlbay;Ozan Arkan Can

  • Affiliations:
  • Istanbul Technical University, Istanbul, Turkey;Istanbul Technical University, Istanbul, Turkey;Istanbul Technical University, Istanbul, Turkey

  • Venue:
  • SPMRL '11 Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages
  • Year:
  • 2011

Abstract

In this paper, we investigated how extracting different types of multiword expressions (MWEs) affects the accuracy of a data-driven dependency parser for a morphologically rich language, Turkish. We showed that unifying MWEs of one type, namely compound verb and noun formations, in the training stage has a negative effect on parsing accuracy because it increases lexical sparsity. Training on a variant of the treebank that excludes this MWE type gave a statistically significant improvement. Our extrinsic evaluation with an ideal MWE recognizer (restricted to named entities, duplications, numbers, dates, and a predefined list of compound prepositions) showed that preprocessing the test data in this way would improve labeled parsing accuracy by 1.5%.
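
To make the notion of MWE "unification" concrete, the following is a minimal sketch of collapsing a token span into a single unit in a dependency-annotated sentence. It is not the authors' implementation: the `Token` tuple layout, the `merge_span` function, the underscore-joined form, and the English toy sentence with its labels are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical token layout: (form, head, deprel).
# Heads are 1-based token positions; 0 marks the root.
Token = Tuple[str, int, str]


def merge_span(sentence: List[Token], start: int, end: int) -> List[Token]:
    """Collapse tokens start..end (1-based, inclusive) into one token.

    Assumes exactly one token in the span has its head outside the span;
    the merged token inherits that token's head and relation label.
    """
    span = range(start, end + 1)
    external = [i for i in span if sentence[i - 1][1] not in span]
    if len(external) != 1:
        raise ValueError("span must have exactly one external head")

    merged_form = "_".join(sentence[i - 1][0] for i in span)
    _, merged_head, merged_rel = sentence[external[0] - 1]
    shift = end - start  # number of positions removed by the merge

    def remap(old_head: int) -> int:
        # Translate an original head index to the renumbered sentence.
        if old_head < start:
            return old_head  # covers the root (0) and earlier tokens
        if old_head <= end:
            return start  # pointed into the span: now the merged token
        return old_head - shift

    merged: List[Token] = []
    for i, (form, head, rel) in enumerate(sentence, start=1):
        if i in span:
            if i == start:
                merged.append((merged_form, remap(merged_head), merged_rel))
            continue  # the remaining span tokens are dropped
        merged.append((form, remap(head), rel))
    return merged


# Toy sentence with hypothetical labels.
toy = [("New", 2, "nn"), ("York", 3, "nsubj"), ("sleeps", 0, "root")]
unified = merge_span(toy, 1, 2)
```

Each merge replaces several frequent word forms with one rarer combined form, which is one way to see how unification can increase lexical sparsity in the training data.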