Co-occurrence cluster features for lexical substitutions in context

Authors: Chris Biemann
Affiliations: Powerset (a Microsoft company), San Francisco, CA
Venue: TextGraphs-5 Proceedings of the 2010 Workshop on Graph-based Methods for Natural Language Processing
Year: 2010

References (9)

Not So Naive Bayes: Aggregating One-Dependence Estimators. Machine Learning.
A graph model for unsupervised lexical acquisition. COLING '02 Proceedings of the 19th international conference on Computational linguistics - Volume 1.
Word Sense Disambiguation: Algorithms and Applications (Text, Speech and Language Technology).
OntoNotes: the 90% solution. NAACL-Short '06 Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers.
SemEval-2007 task 17: English lexical sample, SRL and all words. SemEval '07 Proceedings of the 4th International Workshop on Semantic Evaluations.
On the use of automatically acquired examples for all-nouns word sense disambiguation. Journal of Artificial Intelligence Research.
Chinese whispers: an efficient graph clustering algorithm and its application to natural language processing problems. TextGraphs-1 Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing.
Evaluating and optimizing the parameters of an unsupervised graph-based WSD algorithm. TextGraphs-1 Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing.
The WEKA data mining software: an update. ACM SIGKDD Explorations Newsletter.

Cited by (1)

Creating a system for lexical substitutions from scratch using crowdsourcing. Language Resources and Evaluation.

Abstract

This paper examines the influence of features based on clusters of co-occurrences for supervised Word Sense Disambiguation (WSD) and Lexical Substitution. Co-occurrence cluster features are derived in a completely unsupervised fashion by clustering the local neighborhood of a target word in a corpus-based co-occurrence graph. Clusters can be assigned in context and are used as features in a supervised WSD system. Experiments that add these features to a strong baseline system are conducted on two datasets and show improvements. Co-occurrence cluster features are a simple way to mimic Topic Signatures (Martínez et al., 2008) without needing to construct resources manually. Further, a system is described that produces lexical substitutions in context with very high precision.
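
The pipeline outlined in the abstract (build a co-occurrence graph, cluster the target word's neighborhood, assign a cluster in context) can be illustrated with a minimal Python sketch. The abstract does not specify the edge weighting, the clustering algorithm, or the assignment rule, so everything here is an assumption: raw sentence-level co-occurrence counts, a simplified deterministic label propagation loosely in the spirit of Chinese Whispers, and assignment by word overlap. All function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_graph(sentences):
    """Count sentence-level co-occurrences: {word: Counter({neighbor: weight})}."""
    graph = defaultdict(Counter)
    for sent in sentences:
        for a, b in combinations(sorted(set(sent)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def neighborhood(graph, target):
    """Subgraph induced by the target's neighbors; the target itself is left out."""
    nodes = set(graph[target])
    return {n: {m: w for m, w in graph[n].items() if m in nodes} for n in nodes}


def label_propagation(sub, iterations=10):
    """Assumed clustering step: each node adopts the heaviest label among its
    neighbors. Fixed node order and alphabetical tie-breaking keep it deterministic."""
    labels = {n: n for n in sub}
    for _ in range(iterations):
        changed = False
        for n in sorted(sub):
            votes = Counter()
            for m, w in sub[n].items():
                votes[labels[m]] += w
            if not votes:
                continue  # isolated node keeps its own label
            best = min(votes, key=lambda lab: (-votes[lab], lab))
            if best != labels[n]:
                labels[n] = best
                changed = True
        if not changed:
            break
    clusters = defaultdict(set)
    for n, lab in labels.items():
        clusters[lab].add(n)
    return sorted(clusters.values(), key=sorted)


def assign_cluster(clusters, context):
    """Index of the cluster sharing the most words with the context, else None."""
    words = set(context)
    overlaps = [len(c & words) for c in clusters]
    best = max(range(len(clusters)), key=lambda i: overlaps[i], default=None)
    return best if best is not None and overlaps[best] > 0 else None


# Toy corpus (hypothetical) with two usages of "bank".
sentences = [
    ["bank", "money", "loan"],
    ["bank", "loan", "interest"],
    ["bank", "money", "interest"],
    ["bank", "river", "water"],
    ["bank", "water", "shore"],
    ["bank", "river", "shore"],
]
graph = cooccurrence_graph(sentences)
clusters = label_propagation(neighborhood(graph, "bank"))
finance_ctx = ["the", "bank", "raised", "interest", "on", "the", "loan"]
river_ctx = ["we", "sat", "on", "the", "river", "bank"]
feature_a = assign_cluster(clusters, finance_ctx)  # index of the money/loan cluster
feature_b = assign_cluster(clusters, river_ctx)    # index of the river/water cluster
```

In a supervised setting, the returned cluster index (or `None` when no cluster overlaps the context) would simply be appended to the classifier's existing feature vector for that instance. A realistic setup would likely replace raw counts with a significance measure and use a larger corpus.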