Estimating semantic distance using soft semantic constraints in knowledge-source-corpus hybrid models

Authors:
Yuval Marton;Saif Mohammad;Philip Resnik
Affiliations:
University of Maryland, College Park, MD;University of Maryland, College Park, MD;University of Maryland, College Park, MD
Venue:
EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
Year:
2009

Citing 14
Cited 4

Similarity-Based Models of Word Cooccurrence Probabilities

Machine Learning - Special issue on natural language learning
Contextual correlates of synonymy

Communications of the ACM
Placing search in context: the concept revisited

ACM Transactions on Information Systems (TOIS)
An Information-Theoretic Definition of Similarity

ICML '98 Proceedings of the Fifteenth International Conference on Machine Learning
Accurate methods for the statistics of surprise and coincidence

Computational Linguistics - Special issue on using large corpora: I
Correcting real-word spelling errors by restoring lexical cohesion

Natural Language Engineering
Meaningful clustering of senses helps boost word sense disambiguation performance

ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics
Characterising measures of lexical distributional similarity

COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Exploiting semantic role labeling, WordNet and Wikipedia for coreference resolution

HLT-NAACL '06 Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
Non-classical lexical semantic relations

CLS '04 Proceedings of the HLT-NAACL Workshop on Computational Lexical Semantics
Distributional measures of concept-distance: a task-oriented evaluation

EMNLP '06 Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
A structured vector space model for word meaning in context

EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Extended gloss overlaps as a measure of semantic relatedness

IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence
Measuring semantic distance using distributional profiles of concepts

Measuring semantic distance using distributional profiles of concepts

Improved statistical machine translation using monolingually-derived paraphrases

EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1 - Volume 1
A topological embedding of the lexicon for semantic distance computation

Natural Language Engineering
Visual information in semantic representation

HLT '10 Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Distributional phrasal paraphrase generation for statistical machine translation

ACM Transactions on Intelligent Systems and Technology (TIST) - Special Sections on Paraphrasing; Intelligent Systems for Socially Aware Computing; Social Computing, Behavioral-Cultural Modeling, and Prediction

Quantified Score

Hi-index	0.00

Visualization

Abstract

Strictly corpus-based measures of semantic distance conflate co-occurrence information pertaining to the many possible senses of target words. We propose a corpus-thesaurus hybrid method that uses soft constraints to generate word-senseaware distributional profiles (DPs) from coarser "concept DPs" (derived from a Roget-like thesaurus) and sense-unaware traditional word DPs (derived from raw text). Although it uses a knowledge source, the method is not vocabulary-limited: if the target word is not in the thesaurus, the method falls back gracefully on the word's co-occurrence information. This allows the method to access valuable information encoded in a lexical resource, such as a thesaurus, while still being able to effectively handle domain-specific terms and named entities. Experiments on word-pair ranking by semantic distance show the new hybrid method to be superior to others.