Feature vector quality and distributional similarity

Authors:
Maayan Geffet; Ido Dagan
Affiliations:
Hebrew University, Jerusalem, Israel; Bar-Ilan University, Ramat-Gan, Israel
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Abstract

We suggest a new goal and evaluation criterion for word similarity measures. The new criterion, meaning-entailing substitutability, fits the needs of semantic-oriented NLP applications and can be evaluated directly (independent of an application) at a good level of human agreement. Motivated by this semantic criterion, we analyze the empirical quality of distributional word feature vectors and its impact on word similarity results, and we propose an objective measure for evaluating feature vector quality. Finally, we present a novel feature weighting and selection function, which yields superior feature vectors and better word similarity performance.