A hybrid approach for multiword expression identification

Authors:
Carlos Ramisch;Helena de Medeiros Caseli;Aline Villavicencio;André Machado;Maria José Finatto
Affiliations:
GETALP/LIG, University of Grenoble (France);Department of Computer Science, Federal University of São Carlos (Brazil);Institute of Informatics, Federal University of Rio Grande do Sul (Brazil);Institute of Informatics, Federal University of Rio Grande do Sul (Brazil);Institute of Language and Linguistics, Federal University of Rio Grande do Sul (Brazil)
Venue:
PROPOR'10 Proceedings of the 9th international conference on Computational Processing of the Portuguese Language
Year:
2010

Abstract

Considerable attention has been given to the problem of Multiword Expression (MWE) identification and treatment, since handling MWEs well improves the quality of results in NLP tasks such as parsing and generation. Statistical methods have often been employed for MWE identification, as an inexpensive and language-independent way of finding co-occurrence patterns. On the other hand, more linguistically motivated identification methods, which employ information such as POS filters and lexical alignment between languages, can produce more targeted candidate lists. In this paper we propose a hybrid approach that combines the strengths of these different sources of information, using a machine learning algorithm to produce more robust and precise results. Automatic evaluation on gold standards shows that our hybrid method outperforms the individual statistical and alignment-based MWE extraction approaches for both Portuguese and English. This method can be used to aid lexicographic work by providing a more targeted MWE candidate list.
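
To make the idea of combining information sources concrete, the following is a minimal sketch, not the authors' implementation. It assumes bigram candidates only, uses pointwise mutual information (PMI) as a stand-in for whatever association measures the paper employs, and treats the POS filter and the alignment-based candidate list as precomputed inputs. All names (`pmi`, `candidate_features`, `allowed_patterns`, `aligned_candidates`) are hypothetical.

```python
import math
from collections import Counter


def pmi(bigram_count, first_count, second_count, total_bigrams):
    """Pointwise mutual information of a bigram, in bits.

    PMI is an assumed example of a statistical association measure.
    """
    p_xy = bigram_count / total_bigrams
    p_x = first_count / total_bigrams
    p_y = second_count / total_bigrams
    return math.log2(p_xy / (p_x * p_y))


def candidate_features(tokens, pos_tags, allowed_patterns, aligned_candidates):
    """Build one feature dict per bigram candidate.

    tokens, pos_tags   -- parallel lists for a tokenized, POS-tagged corpus
    allowed_patterns   -- set of (tag, tag) pairs acting as a POS filter
    aligned_candidates -- set of bigrams proposed by an alignment-based
                          extractor (assumed to be computed elsewhere)
    """
    bigrams = list(zip(tokens, tokens[1:]))
    tag_pairs = list(zip(pos_tags, pos_tags[1:]))

    bigram_counts = Counter(bigrams)
    first_counts = Counter(w1 for w1, _ in bigrams)
    second_counts = Counter(w2 for _, w2 in bigrams)
    total = len(bigrams)

    # Keep the tag pair seen at the first occurrence of each bigram.
    pattern_of = {}
    for bigram, tags in zip(bigrams, tag_pairs):
        pattern_of.setdefault(bigram, tags)

    features = {}
    for bigram, count in bigram_counts.items():
        w1, w2 = bigram
        features[bigram] = {
            "freq": count,
            "pmi": pmi(count, first_counts[w1], second_counts[w2], total),
            "pos_ok": pattern_of[bigram] in allowed_patterns,
            "aligned": bigram in aligned_candidates,
        }
    return features
```

Each resulting feature dict mixes statistical evidence (`freq`, `pmi`) with linguistic evidence (`pos_ok`, `aligned`). In a hybrid setup of the kind the abstract describes, such vectors, labeled against a gold standard, would be passed to a machine learning classifier. The choice of classifier is left open here.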