Mining the Web to Discover the Meanings of an Ambiguous Word

  • Authors:
  • Raz Tamir; Reinhard Rapp

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

In information retrieval and text mining, information on word senses is usually taken from dictionaries or lexical databases that have been prepared by lexicographers. In this paper we propose an automatic method for word sense induction, i.e. for the discovery of a set of sense descriptors to a given ambiguous word. The approach is based on the statistics of word co-occurrence as derived from web pages. The underlying assumption is that the senses of an ambiguous word are best described by terms that, although bearing a strong association to this word, are mutually exclusive, i.e. whose association strength within the retrieved web pages is as weak as possible. Measuring association strength is based upon a novel Confidence Gain approach that relates the observed co-occurrence frequency for two sense descriptor candidates to an average co-occurrence frequency for pairs of arbitrary words. The proposed approach is fully unsupervised and takes into account the contemporary meanings of words, as reflected in texts from the internet. Our results are evaluated using a list of ambiguous words commonly referred to in the literature.
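
The selection idea described in the abstract can be illustrated with a small sketch. This is not the authors' Confidence Gain formula, which the abstract does not spell out. It is a simplified stand-in that assumes association is measured as a pair's page-level co-occurrence count divided by the mean count over all observed pairs. The toy pages, the function names (cooccurrence_counts, gain, pick_descriptors), and the scoring rule are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy stand-in for web pages retrieved for the ambiguous word "palm" (invented data).
pages = [
    "palm tree beach coconut",
    "palm tree coconut leaves",
    "palm hand finger",
    "palm hand finger wrist",
    "palm tree beach",
    "palm hand wrist",
]

def cooccurrence_counts(pages):
    """For every unordered word pair, count the pages that contain both words."""
    counts = Counter()
    for page in pages:
        words = sorted(set(page.lower().split()))
        counts.update(combinations(words, 2))
    return counts

def gain(counts, mean, a, b):
    """Observed pair count relative to the mean pair count (simplified assumption)."""
    return counts[tuple(sorted((a, b)))] / mean

def pick_descriptors(pages, target):
    """Return the candidate pair that is strongly tied to the target but weakly tied to each other."""
    counts = cooccurrence_counts(pages)
    mean = sum(counts.values()) / len(counts)
    vocab = sorted({w for pair in counts for w in pair} - {target})

    def score(pair):
        a, b = pair
        # Reward association with the target, penalize association between the two candidates.
        return gain(counts, mean, target, a) * gain(counts, mean, target, b) / (
            1.0 + gain(counts, mean, a, b)
        )

    return max(combinations(vocab, 2), key=score)

print(pick_descriptors(pages, "palm"))  # ('hand', 'tree')
```

On this toy data, "hand" and "tree" each appear on three pages with "palm" but never on the same page, so they come out as the two sense descriptors. A real system would need counts from actual retrieved pages, stop-word filtering, and a principled baseline for the average pair frequency.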