Mining the Web to Discover the Meanings of an Ambiguous Word

Authors:
Raz Tamir; Reinhard Rapp
Venue:
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
Year:
2003

Abstract

In information retrieval and text mining, information on word senses is usually taken from dictionaries or lexical databases that have been prepared by lexicographers. In this paper we propose an automatic method for word sense induction, i.e. for the discovery of a set of sense descriptors to a given ambiguous word. The approach is based on the statistics of word co-occurrence as derived from web pages. The underlying assumption is that the senses of an ambiguous word are best described by terms that, although bearing a strong association to this word, are mutually exclusive, i.e. whose association strength within the retrieved web pages is as weak as possible. Measuring association strength is based upon a novel Confidence Gain approach that relates the observed co-occurrence frequency for two sense descriptor candidates to an average co-occurrence frequency for pairs of arbitrary words. The proposed approach is fully unsupervised and takes into account the contemporary meanings of words, as reflected in texts from the internet. Our results are evaluated using a list of ambiguous words commonly referred to in the literature.
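
The selection idea described in the abstract can be illustrated with a small sketch. The abstract does not give the Confidence Gain formula, so the code below substitutes a plain ratio of observed co-occurrence to the average co-occurrence over all word pairs. That ratio, the function names, the tie-breaking rule, and the toy "palm" documents are all assumptions made for illustration, not the authors' method or data.

```python
from itertools import combinations


def cooccurrence(docs, a, b):
    """Count the documents that contain both words a and b."""
    return sum(1 for d in docs if a in d and b in d)


def average_cooccurrence(docs):
    """Average co-occurrence count over all pairs of distinct words seen in docs."""
    vocabulary = set().union(*docs)
    n_pairs = len(vocabulary) * (len(vocabulary) - 1) // 2
    # A document with k distinct words adds 1 to each of its k*(k-1)/2 pairs.
    total = sum(len(d) * (len(d) - 1) // 2 for d in docs)
    return total / n_pairs


def association_ratio(docs, a, b):
    """Observed co-occurrence of (a, b) relative to the average pair.

    Assumed stand-in for the paper's Confidence Gain measure.
    """
    return cooccurrence(docs, a, b) / average_cooccurrence(docs)


def pick_sense_descriptors(docs, candidates):
    """Pick the candidate pair with the weakest mutual association.

    Ties are broken in favour of words that occur in more of the retrieved
    documents, a crude proxy for a strong association with the target word.
    """
    def doc_freq(word):
        return sum(1 for d in docs if word in d)

    def key(pair):
        a, b = pair
        return (association_ratio(docs, a, b), -(doc_freq(a) + doc_freq(b)))

    return min(combinations(sorted(candidates), 2), key=key)


# Toy stand-in for pages retrieved for the ambiguous word "palm",
# each page reduced to its set of distinct words.
docs = [
    {"palm", "tree", "beach", "coconut"},
    {"palm", "tree", "coconut"},
    {"palm", "hand", "finger"},
    {"palm", "hand", "finger", "wrist"},
    {"palm", "tree", "beach"},
    {"palm", "hand", "wrist"},
]
candidates = ["tree", "coconut", "hand", "finger"]

best_pair = pick_sense_descriptors(docs, candidates)
```

On this toy data the words "hand" and "tree" never share a page, so their ratio is zero and they are selected, while "coconut" and "tree" co-occur more often than the average pair and would be rejected as descriptors of the same sense.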