Multiword Expressions: A Pain in the Neck for NLP
CICLing '02 Proceedings of the Third International Conference on Computational Linguistics and Intelligent Text Processing
COLING '98 Proceedings of the 17th international conference on Computational linguistics - Volume 2
MWE '04 Proceedings of the Workshop on Multiword Expressions: Integrating Processing
Automated multiword expression prediction for grammar engineering
MWE '06 Proceedings of the Workshop on Multiword Expressions: Identifying and Exploiting Underlying Properties
A measure of syntactic flexibility for automatically identifying multiword expressions in corpora
MWE '07 Proceedings of the Workshop on a Broader Perspective on Multiword Expressions
Semantics-based multiword expression extraction
MWE '07 Proceedings of the Workshop on a Broader Perspective on Multiword Expressions
MWE '07 Proceedings of the Workshop on a Broader Perspective on Multiword Expressions
Editorial: Introduction to the special issue on multiword expressions: Having a crack at a hard nut
Computer Speech and Language
Using small random samples for the manual evaluation of statistical association measures
Computer Speech and Language
The design, implementation, and use of the Ngram statistics package
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
Identifying multi-word expressions by leveraging morphological and syntactic idiosyncrasy
COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Detecting multi-word expressions improves word sense disambiguation
MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
MWU-aware part-of-speech tagging with a CRF model and lexical resources
MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
Extracting transfer rules for multiword expressions from parallel corpora
MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
Hi-index | 0.00 |
The issue of internal variability of multiword expressions (MWEs) is crucial towards their identification and extraction in running text. We present a corpus-supported and computational study on Italian MWEs, aimed at defining an automatic method for modeling internal variation, exploiting frequency and part-of-speech (POS) information. We do so by deriving an XML-encoded lexicon of MWEs based on a manually compiled dictionary, which is then projected onto a a large corpus. Since a search for fixed forms suffers from low recall, while an unconstrained flexible search for lemmas yields a loss in precision, we suggest a procedure aimed at maximizing precision in the identification of MWEs within a flexible search. Our method builds on the idea that internal variability can be modelled via the novel introduction of variation patterns, which work over POS patterns, and can be used as working tools for controlling precision. We also compare the performance of variation patterns to that of association measures, and explore the possibility of using variation patterns in MWE extraction in addition to identification. Finally, we suggest that corpus-derived, pattern-related information can be included in the original MWE lexicon by means of an enriched coding and the creation of an XML-based repository of patterns.