Learning English light verb constructions: contextual or statistical

Authors:
Yuancheng Tu;Dan Roth
Affiliations:
University of Illinois;University of Illinois
Venue:
MWE '11 Proceedings of the Workshop on Multiword Expressions: from Parsing and Generation to the Real World
Year:
2011

Abstract

In this paper, we investigate a supervised machine learning framework for automatically learning English Light Verb Constructions (LVCs). Our system achieves 86.3% accuracy, against a chance baseline of 52.2%, when trained with groups of either contextual or statistical features. In addition, we present an in-depth analysis of these contextual and statistical features and show that systems trained on these two superficially different feature types reach similar performance empirically. However, when the surface structures of candidate LVCs are identical, the system trained with contextual features, which contain information on surrounding words, performs 16.7% better. In this study, we also construct a balanced benchmark dataset of 2,162 sentences from the British National Corpus (BNC) for English LVCs. This dataset is publicly available and is also a useful computational resource for research on multiword expressions (MWEs) in general.
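
To make the distinction between the two feature groups concrete, here is a minimal sketch in Python. It is not the authors' feature set: the abstract does not list the individual features, so the window-based surrounding-word features, the use of pointwise mutual information (PMI) as the statistical feature, the function names, and the toy counts are all assumptions chosen for illustration.

```python
import math


def context_features(tokens, verb_idx, noun_idx, window=2):
    """Contextual features (assumed form): surrounding words within
    `window` tokens to the left of the verb and right of the noun."""
    lo = max(0, verb_idx - window)
    hi = min(len(tokens), noun_idx + window + 1)
    feats = {}
    for i in range(lo, hi):
        if verb_idx <= i <= noun_idx:
            continue  # skip the candidate verb-noun span itself
        side = "left" if i < verb_idx else "right"
        feats[f"{side}={tokens[i].lower()}"] = 1
    return feats


def pmi(pair_count, verb_count, noun_count, total):
    """Statistical feature (assumed choice): pointwise mutual information,
    log2(P(verb, noun) / (P(verb) * P(noun))), from corpus counts."""
    return math.log2((pair_count * total) / (verb_count * noun_count))


# Toy sentence with the candidate LVC "take a walk" (verb at 3, noun at 5).
tokens = "She decided to take a walk in the park".split()
feats = context_features(tokens, verb_idx=3, noun_idx=5)

# Hypothetical corpus counts, invented for this example only.
score = pmi(pair_count=8, verb_count=40, noun_count=20, total=1000)
```

The contextual features depend on the sentence in which the candidate occurs, while the PMI score is the same for every occurrence of a given verb-noun pair. That is one way to see why contextual features can still separate candidates whose surface structures are identical.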