Sorting out the most confusing English phrasal verbs

Authors:
Yuancheng Tu;Dan Roth
Affiliations:
University of Illinois;University of Illinois
Venue:
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation
Year:
2012


Abstract

In this paper, we investigate a full-fledged supervised machine learning framework for identifying English phrasal verbs in a given context. We concentrate on those that we define as the most confusing phrasal verbs, in the sense that they are the most commonly used ones whose occurrence may correspond either to a true phrasal verb or to an alignment of a simple verb with a preposition. We construct a benchmark dataset of 1,348 sentences from the BNC, annotated via an Internet crowdsourcing platform. This dataset is further split into two groups: a more idiomatic group, consisting of phrasal verbs that tend to be used as true phrasal verbs, and a more compositional group, consisting of those that tend to be used either way. We build a discriminative classifier with easily available lexical and syntactic features and test it over the datasets. Overall, the classifier achieves 79.4% accuracy, a 41.1% error reduction compared to the corpus majority baseline of 65%. More interestingly, we find that the classifier learns more from the more compositional examples than from the idiomatic ones.
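
The 41.1% figure in the abstract can be checked from the two accuracies it reports. The sketch below assumes the figure is the usual relative reduction in error rate, (baseline error - classifier error) / baseline error; the function name `relative_error_reduction` is illustrative and does not come from the paper.

```python
def relative_error_reduction(baseline_accuracy: float, model_accuracy: float) -> float:
    """Return the fraction of the baseline's errors that the model removes.

    Assumes both accuracies are fractions in [0, 1] and the baseline
    accuracy is below 1, so the baseline error is non-zero.
    """
    baseline_error = 1.0 - baseline_accuracy
    model_error = 1.0 - model_accuracy
    return (baseline_error - model_error) / baseline_error


# Figures from the abstract: 65% majority baseline, 79.4% classifier accuracy.
reduction = relative_error_reduction(0.65, 0.794)
print(f"{reduction * 100:.1f}%")
```

Under that assumption, the baseline error is 35% and the classifier error is 20.6%, so the classifier removes 14.4 of the 35 percentage points of error.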