Learning to detect English and Hungarian light verb constructions

Authors:
Veronika Vincze; István Nagy T.; János Zsibrita
Affiliations:
Hungarian Academy of Sciences, Hungary; University of Szeged, Hungary; University of Szeged, Hungary
Venue:
ACM Transactions on Speech and Language Processing (TSLP) - Special issue on multiword expressions: From theory to practice and use, part 1
Year:
2013


Abstract

Light verb constructions (LVCs) consist of a verbal and a nominal component, where the noun preserves its original meaning while the verb has (to some degree) lost its own. They are syntactically flexible, and their meaning can only partially be computed from the meanings of their parts; thus they require special treatment in natural language processing, the first step of which is to identify them. In this study, we present FXTagger, our tool based on conditional random fields (CRFs) for identifying light verb constructions. We demonstrate the flexibility of the tool on two typologically different languages, English and Hungarian. As earlier studies labeled different linguistic phenomena as light verb constructions, we first present a linguistics-based classification of light verb constructions and then show that FXTagger is able to identify different classes of light verb constructions in both languages. Different types of texts may contain different types of light verb constructions; moreover, the frequency of light verb constructions may differ from domain to domain. Hence we focus on the portability of models trained on different corpora, and we also investigate the effect of simple domain adaptation techniques in reducing the gap between domains. Our results show that, in spite of domain specificities, out-of-domain data can also contribute to successful LVC detection in all domains.
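To make the framing concrete, the sketch below shows one way LVC identification can be cast as CRF-style sequence labeling, plus one simple domain adaptation trick. It is a minimal illustration under stated assumptions, not FXTagger's implementation: the label scheme (`B-LVC`/`I-LVC`/`O`, with interrupting words left as `O`), the feature names, and the `LIGHT_VERBS` list are all hypothetical. The adaptation step is feature augmentation (Daumé III, 2007), a well-known simple technique chosen here for illustration; the abstract does not say which techniques the authors used.

```python
from typing import Dict, List

# Hypothetical, illustrative list of English light verb lemmas (not FXTagger's).
LIGHT_VERBS = {"make", "take", "have", "give", "do", "pay"}


def spans_to_bio(n_tokens: int, lvcs: List[List[int]]) -> List[str]:
    """Turn annotated LVCs (lists of token indices, possibly discontinuous)
    into one label per token: B-LVC for the first member, I-LVC for the rest,
    O for everything else, including words that interrupt an LVC."""
    labels = ["O"] * n_tokens
    for members in lvcs:
        ordered = sorted(members)
        labels[ordered[0]] = "B-LVC"
        for idx in ordered[1:]:
            labels[idx] = "I-LVC"
    return labels


def token_features(tokens: List[str], lemmas: List[str], pos: List[str], i: int) -> Dict[str, str]:
    """Sparse, string-valued features for token i, in the dict-per-token form
    that common CRF toolkits consume (hypothetical feature set)."""
    feats = {
        "lemma": lemmas[i],
        "pos": pos[i],
        "suffix3": tokens[i][-3:].lower(),
        "is_light_verb": str(lemmas[i] in LIGHT_VERBS),
    }
    # One-token context window on each side; "<PAD>" beyond sentence edges.
    for off in (-1, 1):
        j = i + off
        inside = 0 <= j < len(tokens)
        feats[f"lemma[{off:+d}]"] = lemmas[j] if inside else "<PAD>"
        feats[f"pos[{off:+d}]"] = pos[j] if inside else "<PAD>"
    return feats


def augment(feats: Dict[str, str], domain: str) -> Dict[str, str]:
    """Feature augmentation (Daumé III, 2007): keep a shared copy of every
    feature plus a copy specific to the token's domain, so the learner can
    weight shared and domain-specific evidence separately."""
    out: Dict[str, str] = {}
    for name, value in feats.items():
        out[f"general:{name}"] = value
        out[f"{domain}:{name}"] = value
    return out


if __name__ == "__main__":
    tokens = ["She", "made", "a", "quick", "decision", "."]
    lemmas = ["she", "make", "a", "quick", "decision", "."]
    pos = ["PRP", "VBD", "DT", "JJ", "NN", "."]
    y = spans_to_bio(len(tokens), [[1, 4]])  # LVC "made ... decision"
    X = [augment(token_features(tokens, lemmas, pos, i), "news") for i in range(len(tokens))]
    print(y)  # ['O', 'B-LVC', 'O', 'O', 'I-LVC', 'O']
    print(X[1]["general:is_light_verb"], X[1]["news:lemma[+1]"])  # True a
```

Per sentence, `X` (a list of feature dicts) and `y` (a list of labels) have the shape that CRF toolkits such as sklearn-crfsuite accept in `CRF.fit(X, y)` when given one list per sentence. Because each feature also gets a domain-tagged copy, in-domain and out-of-domain sentences can be pooled into one training set while the model still learns domain-specific weights.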