Exploring extensive linguistic feature sets in near-synonym lexical choice

Authors:
Mari-Sanna Paukkeri;Jaakko Väyrynen;Antti Arppe
Affiliations:
Aalto University School of Science, Aalto, Finland;Aalto University School of Science, Aalto, Finland;University of Helsinki, Finland
Venue:
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Year:
2012

Abstract

In the near-synonym lexical choice task, the best alternative out of a set of near-synonyms is selected to fill a lexical gap in a text. We experiment with an extensive set of over 650 linguistic features to represent the context of a word, together with a range of machine learning approaches to the lexical choice task. We extend previous work by experimenting with unsupervised and semi-supervised methods, and we use automatic feature selection to cope with the problems arising from the rich feature set. It is natural to think that linguistic analysis of the word context would yield almost perfect performance in the task, but we show that too many features, even linguistic ones, introduce noise and make the task difficult for unsupervised and semi-supervised methods. We also show that purely syntactic features play the biggest role in performance, but that certain semantic and morphological features are also needed.