Estimating semantic distance using soft semantic constraints in knowledge-source-corpus hybrid models

  • Authors:
  • Yuval Marton;Saif Mohammad;Philip Resnik

  • Affiliations:
  • University of Maryland, College Park, MD;University of Maryland, College Park, MD;University of Maryland, College Park, MD

  • Venue:
  • EMNLP '09 Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2 - Volume 2
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

Strictly corpus-based measures of semantic distance conflate co-occurrence information pertaining to the many possible senses of target words. We propose a corpus-thesaurus hybrid method that uses soft constraints to generate word-senseaware distributional profiles (DPs) from coarser "concept DPs" (derived from a Roget-like thesaurus) and sense-unaware traditional word DPs (derived from raw text). Although it uses a knowledge source, the method is not vocabulary-limited: if the target word is not in the thesaurus, the method falls back gracefully on the word's co-occurrence information. This allows the method to access valuable information encoded in a lexical resource, such as a thesaurus, while still being able to effectively handle domain-specific terms and named entities. Experiments on word-pair ranking by semantic distance show the new hybrid method to be superior to others.