In information retrieval and text mining, information on word senses is usually taken from dictionaries or lexical databases that have been prepared by lexicographers. In this paper we propose an automatic method for word sense induction, i.e. for the discovery of a set of sense descriptors for a given ambiguous word. The approach is based on word co-occurrence statistics derived from web pages. The underlying assumption is that the senses of an ambiguous word are best described by terms that, although strongly associated with this word, are mutually exclusive, i.e. whose association strength with one another within the retrieved web pages is as weak as possible. Association strength is measured with a novel Confidence Gain approach that relates the observed co-occurrence frequency of two sense descriptor candidates to the average co-occurrence frequency of pairs of arbitrary words. The proposed approach is fully unsupervised and takes into account the contemporary meanings of words, as reflected in texts from the internet. Our results are evaluated using a list of ambiguous words commonly referred to in the literature.
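The abstract does not give the Confidence Gain formula, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes that each retrieved web page is reduced to a set of words. As a stand-in for Confidence Gain, it uses a simple ratio: the observed page co-occurrence count of two candidates divided by the average count over all word pairs. It also assumes a greedy selection heuristic that favours mutually exclusive descriptors. The function names `pair_count`, `average_pair_count`, `gain` and `select_descriptors` are hypothetical.

```python
from itertools import combinations


def pair_count(a, b, pages):
    """Number of pages (word sets) in which words a and b both occur."""
    return sum(1 for page in pages if a in page and b in page)


def average_pair_count(vocab, pages):
    """Mean co-occurrence count over all unordered pairs of distinct words."""
    pairs = list(combinations(sorted(vocab), 2))
    if not pairs:
        return 0.0
    return sum(pair_count(a, b, pages) for a, b in pairs) / len(pairs)


def gain(a, b, pages, baseline):
    """Hypothetical ratio score: observed co-occurrence of a and b relative
    to the average co-occurrence of arbitrary word pairs (baseline).
    This is an assumed stand-in, NOT the paper's Confidence Gain formula."""
    if baseline == 0:
        return 0.0
    return pair_count(a, b, pages) / baseline


def select_descriptors(candidates, pages, k):
    """Greedy, hypothetical selection of up to k sense descriptors.

    candidates are assumed to be pre-sorted by association with the
    ambiguous target word. The most associated candidate is taken first;
    then the candidate whose strongest link to any already chosen
    descriptor is weakest (the most mutually exclusive one) is added,
    with ties going to the earlier, more associated candidate."""
    if not candidates or k <= 0:
        return []
    vocab = {w for page in pages for w in page}
    baseline = average_pair_count(vocab, pages)
    chosen = [candidates[0]]
    remaining = list(candidates[1:])
    while remaining and len(chosen) < k:
        # min() returns the first minimal element, so ties keep candidate order
        best = min(
            remaining,
            key=lambda c: max(gain(c, s, pages, baseline) for s in chosen),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

Consider a toy example with five pages for *bank*: three are about finance ({bank, money, loan, account}, {bank, money, account}, {bank, money, loan}) and two are about rivers ({bank, river, water, shore}, {bank, river, water}). With the candidates `["money", "river", "loan", "water"]` and `k=2`, the sketch returns `["money", "river"]`, because those two words never share a page. The greedy max-of-pairwise rule is only one simple way to push the chosen descriptors apart.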