Towards automatic fine-grained semantic classification of verb-noun collocations

Authors:
Leo Wanner
Affiliations:
Computer Science Department, University of Stuttgart, Universitätsstr. 38, 70569 Stuttgart, Germany; e-mail: wanner@informatik.uni-stuttgart.de
Venue:
Natural Language Engineering
Year:
2004

Abstract

Plain lists of collocations, as provided to date by most approaches to the automatic acquisition of collocations from corpora, are useful as a resource for dictionary construction. However, their use in NLP applications such as Text Generation, Machine Translation and Text Summarization is rather limited unless they are enriched with information on the grammatical function of the collocation elements and on the semantics of the collocations as multiword units. In this article, we describe an approach to a fine-grained classification of verb-noun bigrams according to a semantically motivated typology of collocations, and illustrate it with Spanish material. The typology that underlies our classification is based on the verb-noun Lexical Functions (LFs) of Explanatory Combinatorial Lexicology. In the first stage of the approach, the program learns the semantic features of each LF from training data. In the second stage, it examines the semantic features of candidate verb-noun bigrams and compares them with the features of all the LFs taken into account. A candidate whose features are sufficiently similar to those of a specific LF is considered to be an instance of this LF. The semantic features of both the training material and the candidate bigrams are derived from the hyperonymy hierarchies provided by EuroWordNet. In the experiments carried out to validate the approach, we achieved an average F-score of about 70%.
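
The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the article's implementation, and everything concrete in it is an assumption made for illustration: the toy `HYPERONYMS` table stands in for the EuroWordNet hierarchies, the training bigrams and their LF labels are invented examples, cosine similarity over hyperonym counts is only one possible way to compare features, and the `threshold` value of 0.5 is arbitrary. The abstract specifies neither the similarity measure nor the cut-off.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy hyperonym chains (word -> hyperonyms); a stand-in for
# the hierarchies that the article takes from EuroWordNet.
HYPERONYMS = {
    "dar": ["transfer", "act"],
    "hacer": ["perform", "act"],
    "realizar": ["perform", "act"],
    "provocar": ["cause", "change"],
    "causar": ["cause", "change"],
    "paseo": ["movement", "activity"],
    "visita": ["meeting", "activity"],
    "viaje": ["movement", "activity"],
    "crisis": ["situation", "state"],
    "problema": ["situation", "state"],
    "conflicto": ["situation", "state"],
}


def features(verb, noun):
    """Semantic features of a bigram: each word plus its hyperonyms,
    tagged by position (V for the verb, N for the noun)."""
    feats = Counter()
    for role, word in (("V", verb), ("N", noun)):
        for item in [word] + HYPERONYMS.get(word, []):
            feats[f"{role}:{item}"] += 1
    return feats


def cosine(a, b):
    """Cosine similarity between two feature-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(examples):
    """Stage 1: build one feature profile per LF by summing the
    features of its training bigrams."""
    profiles = {}
    for lf, bigrams in examples.items():
        profile = Counter()
        for verb, noun in bigrams:
            profile.update(features(verb, noun))
        profiles[lf] = profile
    return profiles


def classify(verb, noun, profiles, threshold=0.5):
    """Stage 2: compare a candidate with every LF profile and return the
    best-matching LF, or None if no profile is similar enough."""
    candidate = features(verb, noun)
    best_lf, best_score = max(
        ((lf, cosine(candidate, profile)) for lf, profile in profiles.items()),
        key=lambda pair: pair[1],
    )
    return best_lf if best_score >= threshold else None


# Invented training data, for illustration only.
TRAINING = {
    "Oper1": [("dar", "paseo"), ("hacer", "visita")],
    "CausFunc0": [("provocar", "crisis"), ("causar", "problema")],
}
profiles = train(TRAINING)

print(classify("realizar", "viaje", profiles))
print(classify("causar", "conflicto", profiles))
print(classify("comer", "manzana", profiles))
```

In this sketch a candidate can match an LF without sharing any word with the training bigrams, because the comparison runs over shared hyperonyms. A bigram whose words are missing from the hierarchy shares no features with any profile and is left unclassified.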