A broad evaluation of techniques for automatic acquisition of multiword expressions

Authors:
Carlos Ramisch;Vitor De Araujo;Aline Villavicencio
Affiliations:
Federal University of Rio Grande do Sul (Brazil) and GETALP, LIG, University of Grenoble (France);Federal University of Rio Grande do Sul (Brazil);Federal University of Rio Grande do Sul (Brazil)
Venue:
ACL '12 Proceedings of ACL 2012 Student Research Workshop
Year:
2012

Abstract

Several approaches have been proposed for the automatic acquisition of multiword expressions from corpora. However, there is no agreement about which of them offers the best cost-benefit ratio, since they have been evaluated on different datasets and/or languages. To address this issue, we investigate these techniques along three dimensions: expression type (compound nouns, phrasal verbs), language (English, French) and corpus size. Results show that these techniques tend to extract similar candidate lists, with high recall (~80%) for nominals and high precision (~70%) for verbals. Using association measures to filter candidates is useful, but some of them are more costly and not significantly better than raw counts. We conclude with an evaluation of flexibility and a recommendation of which technique to use for each combination of language, expression type and corpus size.
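The abstract contrasts filtering candidates with association measures against filtering by raw counts, and reports results as precision and recall. As a rough illustration only, the Python sketch below scores adjacent-word bigram candidates by raw frequency and by two common association measures, pointwise mutual information (PMI) and Dice, keeps the top k, and computes precision and recall against a reference list. The restriction to bigrams, the choice of PMI and Dice, and the helper names (`score_candidates`, `top_k`, `precision_recall`) are assumptions made for this example; they are not the specific techniques or tools evaluated in the paper.

```python
"""Sketch: rank bigram MWE candidates by raw count vs. association measures.

Assumptions (not from the paper): candidates are adjacent word bigrams,
and the association measures are PMI and Dice. Helper names are hypothetical.
"""
import math
from collections import Counter


def bigram_stats(tokens):
    """Return unigram counts, bigram counts and N (number of bigram positions)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams, max(len(tokens) - 1, 1)


def score_candidates(tokens):
    """Score each bigram candidate with raw frequency, PMI and Dice."""
    unigrams, bigrams, n = bigram_stats(tokens)
    scores = {}
    for (w1, w2), f12 in bigrams.items():
        f1, f2 = unigrams[w1], unigrams[w2]
        expected = f1 * f2 / n  # expected bigram count if w1 and w2 were independent
        scores[(w1, w2)] = {
            "freq": f12,
            "pmi": math.log2(f12 / expected),
            "dice": 2 * f12 / (f1 + f2),
        }
    return scores


def top_k(scores, measure, k):
    """Keep the k best candidates for one measure; ties broken lexicographically."""
    ranked = sorted(scores, key=lambda c: (-scores[c][measure], c))
    return ranked[:k]


def precision_recall(selected, gold):
    """Precision and recall of a selected candidate list against a reference set."""
    hits = len(set(selected) & gold)
    precision = hits / len(selected) if selected else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall
```

On the toy input `"he gave up smoking and she gave up drinking so they gave up".split()`, ranking by `freq` puts `("gave", "up")` first, whereas PMI ranks pairs of words that occur only once (such as `("and", "she")`) highest, a well-known bias of PMI on low counts. Real experiments would typically add a minimum-frequency threshold and linguistic filters (for example part-of-speech patterns) before ranking.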