In this paper, we investigate a full-fledged supervised machine learning framework for identifying English phrasal verbs in a given context. We concentrate on those that we define as the most confusing phrasal verbs: the most commonly used ones, whose occurrence may correspond either to a true phrasal verb or to an alignment of a simple verb with a preposition. We construct a benchmark dataset of 1,348 sentences from the BNC, annotated via an Internet crowdsourcing platform. This dataset is further split into two groups: a more idiomatic group, consisting of phrasal verbs that tend to be used as true phrasal verbs, and a more compositional group, consisting of those that tend to be used either way. We build a discriminative classifier with easily available lexical and syntactic features and test it on these datasets. Overall, the classifier achieves 79.4% accuracy, a 41.1% error reduction relative to the corpus majority baseline of 65%. More interestingly, we find that the classifier learns more from the more compositional examples than from the idiomatic ones.
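The error reduction quoted above can be checked from the two accuracies alone. The sketch below assumes the figure is the usual relative error reduction, that is, the drop in error rate divided by the baseline error rate; the function name `relative_error_reduction` is illustrative and does not come from the paper.

```python
def relative_error_reduction(baseline_acc: float, system_acc: float) -> float:
    """Fraction of the baseline's errors that the system removes.

    Assumes both accuracies are fractions in [0, 1] and that the
    baseline is not already perfect (baseline_acc < 1).
    """
    baseline_err = 1.0 - baseline_acc   # error rate of the majority baseline
    system_err = 1.0 - system_acc       # error rate of the classifier
    return (baseline_err - system_err) / baseline_err


# Figures taken from the abstract: 65% majority baseline, 79.4% accuracy.
reduction = relative_error_reduction(0.65, 0.794)
print(f"{reduction:.1%}")
```

With these inputs the baseline error is 35 points and the classifier error is 20.6 points, so the classifier removes 14.4 of the 35 points, which matches the reported 41.1%.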