Identification of multi-word expressions by combining multiple linguistic information sources
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Considerable attention has been given to the identification and treatment of Multiword Expressions (MWEs) as a way to improve the quality of results in NLP tasks such as parsing and generation. Statistical methods have often been employed for MWE identification, as an inexpensive and language-independent way of finding co-occurrence patterns. More linguistically motivated identification methods, which employ information such as POS filters and lexical alignment between languages, can instead produce more targeted candidate lists. In this paper we propose a hybrid approach that combines the strengths of these different sources of information using a machine learning algorithm to produce more robust and precise results. Automatic evaluation on gold standards shows that our hybrid method outperforms the individual statistical and alignment-based MWE extraction approaches for both Portuguese and English. The method can aid lexicographic work by providing a more targeted MWE candidate list.
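To make the idea of combining information sources concrete, the following is a minimal sketch, not the paper's implementation. It builds one feature record per two-word candidate, mixing a statistical association score (pointwise mutual information, chosen here only as a familiar example) with a POS-pattern flag and an alignment flag. The function names, the toy sentence, the tag set, and the `aligned_pairs` set are all assumptions made for illustration; in a real setting the alignment information would come from a word aligner and the records would be passed to a classifier.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Pointwise mutual information (log base 2) for each adjacent word pair."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


def candidate_features(tokens, tags, allowed_patterns, aligned_pairs):
    """Build one feature record per bigram candidate.

    allowed_patterns: set of (tag, tag) pairs accepted by the POS filter.
    aligned_pairs: hypothetical set of pairs flagged by a word aligner.
    """
    pmi = pmi_scores(tokens)
    # Remember the tag pair seen at the first occurrence of each bigram.
    pair_tags = {}
    for i in range(len(tokens) - 1):
        pair_tags.setdefault((tokens[i], tokens[i + 1]), (tags[i], tags[i + 1]))
    return {
        pair: {
            "pmi": score,
            "pos_match": pair_tags[pair] in allowed_patterns,
            "aligned": pair in aligned_pairs,
        }
        for pair, score in pmi.items()
    }


# Toy data, invented for illustration only.
tokens = "the red tape slows the project and the red tape grows".split()
tags = ["DET", "ADJ", "NOUN", "VERB", "DET", "NOUN",
        "CONJ", "DET", "ADJ", "NOUN", "VERB"]
features = candidate_features(
    tokens,
    tags,
    allowed_patterns={("ADJ", "NOUN")},
    aligned_pairs={("red", "tape")},  # assumed aligner output
)
for pair, record in sorted(features.items()):
    print(pair, record)
```

Each record keeps the three signals separate rather than merging them into a single score, so that a learning algorithm can weigh them against gold-standard labels.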