Building a lexical knowledge-base of near-synonym differences

  • Authors:
  • Graeme Hirst;Diana Inkpen

  • Affiliations:
  • -;-

  • Venue:
  • Building a lexical knowledge-base of near-synonym differences
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Current natural language generation or machine translation systems cannot distinguish among near-synonyms—words that share the same core meaning but vary in their lexical nuances. This is due to a lack of knowledge about differences between near-synonyms in existing computational lexical resources. The goal of this thesis is to automatically acquire a lexical knowledge-base of near-synonym differences (LKB of NS) from multiple sources, and to show how it can be used in a practical natural language processing system. I designed a method to automatically acquire knowledge from dictionaries of near-synonym discrimination written for human readers. An unsupervised decision-list algorithm learns patterns and words for classes of distinctions. The patterns are learned automatically, followed by a manual validation step. The extraction of distinctions between near-synonyms is entirely automatic. The main types of distinctions are: stylistic (for example, inebriated is more formal than drunk), attitudinal (for example, skinny is more pejorative than slim), and denotational (for example, blunder implies accident and ignorance, while error does not). I enriched the initial LKB of NS with information extracted from other sources. First, information about the senses of the near-synonym was added (WordNet senses). Second, knowledge about the collocational behaviour of the near-synonyms was acquired from free text. Collocations between a word and the near-synonyms in a dictionary entry were classified into: preferred collocations, less-preferred collocations, and anti-collocations. Third, knowledge about distinctions between near-synonyms was acquired from machine-readable dictionaries (the General Inquirer and the Macquarie Dictionary). These distinctions were merged with the initial LKB of NS, and inconsistencies were resolved. The generic LKB of NS needs to be customized in order to be used in a natural language processing system. The parts that need customization are the core denotations and the strings that describe peripheral concepts in the denotational distinctions. To show how the LKB of NS can be used in practice, I present Xenon, a natural language generation system that chooses the near-synonym that best matches a set of input preferences. I implemented Xenon by adding a near-synonym choice module and a near-synonym collocation module to an existing general-purpose surface realizer.