Improving the use of pseudo-words for evaluating selectional preferences

Authors:
Nathanael Chambers; Dan Jurafsky
Affiliations:
Stanford University; Stanford University
Venue:
ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Year:
2010

Abstract

This paper improves the use of pseudo-words as an evaluation framework for selectional preferences. While pseudo-words were originally used to evaluate word sense disambiguation, they are now commonly used to evaluate selectional preferences. A selectional preference model ranks a set of possible arguments for a verb by their semantic fit to the verb. Pseudo-words serve as a proxy evaluation for these decisions. The evaluation takes an argument of a verb like drive (e.g., car), pairs it with an alternative word (e.g., car/rock), and asks a model to identify the original. This paper studies two main aspects of pseudo-word creation that affect performance results. (1) Pseudo-word evaluations often evaluate only a subset of the words. We show that selectional preferences should instead be evaluated on the data in its entirety. (2) Different approaches to selecting partner words can produce overly optimistic evaluations. We offer suggestions to address these factors and present a simple baseline that outperforms the state of the art by 13% absolute on a newspaper domain.
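
The pseudo-word evaluation described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy training pairs, the test triples, the relative-frequency scorer, and the half-credit rule for ties are hypothetical and are not taken from the paper, whose own baseline and data are not specified in this abstract.

```python
from collections import Counter

# Hypothetical toy training data: (verb, argument) pairs.
TRAIN = [
    ("drive", "car"), ("drive", "car"), ("drive", "truck"),
    ("throw", "rock"), ("throw", "ball"), ("eat", "apple"),
]

pair_counts = Counter(TRAIN)
verb_counts = Counter(verb for verb, _ in TRAIN)


def score(verb, arg):
    """Relative frequency of arg among the verb's seen arguments (0.0 if verb unseen)."""
    if verb_counts[verb] == 0:
        return 0.0
    return pair_counts[(verb, arg)] / verb_counts[verb]


def choose(verb, candidates):
    """Return the candidate with the higher score, or None when the scores tie."""
    first, second = candidates
    s_first, s_second = score(verb, first), score(verb, second)
    if s_first == s_second:
        return None
    return first if s_first > s_second else second


def evaluate(test_items):
    """Accuracy over (verb, original, confounder) triples.

    Ties earn half credit, i.e. the expected value of a random guess.
    This tie rule is an assumption of the sketch.
    """
    credit = 0.0
    for verb, original, confounder in test_items:
        picked = choose(verb, (original, confounder))
        if picked is None:
            credit += 0.5
        elif picked == original:
            credit += 1.0
    return credit / len(test_items)


# Hypothetical test triples: the original argument paired with a confounder.
TEST = [
    ("drive", "car", "rock"),   # seen pair: the scorer can decide
    ("throw", "ball", "car"),   # seen pair: the scorer can decide
    ("eat", "pear", "rock"),    # unseen pair: both scores are 0, so a tie
]

if __name__ == "__main__":
    print(f"accuracy = {evaluate(TEST):.3f}")
```

The third test item shows why the choice of test subset matters: if unseen pairs such as eat/pear were filtered out of the evaluation, this scorer would appear perfect, whereas scoring every item exposes the cases it cannot decide.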