Multiword expressions in statistical dependency parsing

Authors:
Gülşen Eryiğit;Tugay İlbay;Ozan Arkan Can
Affiliations:
Istanbul Technical University, Istanbul, Turkey;Istanbul Technical University, Istanbul, Turkey;Istanbul Technical University, Istanbul, Turkey
Venue:
SPMRL '11 Proceedings of the Second Workshop on Statistical Parsing of Morphologically Rich Languages
Year:
2011

Abstract

In this paper, we investigated how extracting different types of multiword expressions (MWEs) affects the accuracy of a data-driven dependency parser for a morphologically rich language, Turkish. We showed that unifying MWEs of one particular type, namely compound verb and noun formations, in the training stage has a negative effect on parsing accuracy because it increases lexical sparsity. Training on a variant of the treebank that excludes this MWE type gave a statistically significant improvement. Our extrinsic evaluation of an ideal MWE recognizer (one that extracts only named entities, duplications, numbers, dates, and a predefined list of compound prepositions) showed that preprocessing the test data would improve the labeled parsing accuracy by 1.5%.
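
To make the notion of MWE unification concrete, the following is a minimal sketch of the kind of preprocessing the abstract refers to: the tokens of a recognized MWE are merged into a single unit before parsing. It is an illustration only, not the authors' implementation. The function name `unify_mwes`, the half-open `(start, end)` span format, the underscore joiner, and the English example sentence are all assumptions introduced here.

```python
from typing import List, Tuple


def unify_mwes(
    tokens: List[str],
    spans: List[Tuple[int, int]],
    joiner: str = "_",  # assumed joiner; the abstract does not specify one
) -> List[str]:
    """Merge each half-open [start, end) token span into a single token.

    Spans are assumed to be non-overlapping and are supplied by some
    MWE recognizer (hypothetical here).
    """
    span_end_by_start = {start: end for start, end in spans}
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        end = span_end_by_start.get(i)
        if end is not None and end > i:
            # Collapse the whole MWE into one lexical unit.
            merged.append(joiner.join(tokens[i:end]))
            i = end
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical example: a named entity unified before parsing.
example = ["She", "moved", "to", "New", "York", "in", "2011"]
unified = unify_mwes(example, [(3, 5)])
```

Each merged unit becomes a new word form in the parser's vocabulary. This is consistent with the abstract's observation that unifying compound verb and noun formations at training time can increase lexical sparsity.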