A practical solution to the problem of automatic word sense induction

Authors: Reinhard Rapp
Affiliations: University of Mainz, Germersheim, Germany
Venue: ACLdemo '04 Proceedings of the ACL 2004 on Interactive poster and demonstration sessions
Year: 2004

Abstract

Recent studies in word sense induction are based on clustering global co-occurrence vectors, i.e., vectors that reflect the overall behavior of a word in a corpus. If a word is semantically ambiguous, its global vector is a mixture of all its senses. Inducing the word's senses therefore involves the difficult problem of recovering the sense vectors from the mixture. In this paper we argue that the demixing problem can be avoided, since the contextual behavior of the senses is directly observable in the form of the local contexts of a word. From human disambiguation performance we know that the context of a word is usually sufficient to determine its sense. Based on this observation, we describe an algorithm that discovers the different senses of an ambiguous word by clustering its contexts. The main difficulty with this approach, namely data sparseness, was minimized by looking at only the three main dimensions of the context matrices.
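
The idea of clustering local contexts after reducing them to three main dimensions can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration rather than the paper's implementation: the toy contexts for the word "palm", the binary context-by-word matrix, the use of a truncated SVD to obtain the three dimensions, and the small 2-means routine (the helper names context_matrix, reduce_dims, and two_means are hypothetical). The abstract does not state how the matrices are built, weighted, or clustered.

```python
import numpy as np

# Hypothetical toy contexts of the ambiguous word "palm" (not data from the paper).
CONTEXTS = [
    ["coconut", "tree", "beach"],
    ["coconut", "tree", "tropical"],
    ["tree", "beach", "tropical", "leaves"],
    ["hand", "finger"],
    ["hand", "wrist"],
    ["finger", "wrist"],
]


def context_matrix(contexts):
    """Build a binary context-by-word matrix (one row per local context)."""
    vocab = sorted({w for ctx in contexts for w in ctx})
    index = {w: j for j, w in enumerate(vocab)}
    m = np.zeros((len(contexts), len(vocab)))
    for i, ctx in enumerate(contexts):
        for w in ctx:
            m[i, index[w]] = 1.0
    return m, vocab


def reduce_dims(m, k=3):
    """Project each context onto the k main dimensions (truncated SVD, an assumption)."""
    u, s, _ = np.linalg.svd(m, full_matrices=False)
    return u[:, :k] * s[:k]


def two_means(points, iterations=10):
    """Tiny deterministic 2-means: seed with row 0 and the row farthest from it."""
    far = np.argmax(np.linalg.norm(points - points[0], axis=1))
    centers = np.stack([points[0], points[far]])
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iterations):
        # Distance of every context to both centers, then nearest-center assignment.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in (0, 1):
            if np.any(labels == c):  # keep the old center if a cluster is empty
                centers[c] = points[labels == c].mean(axis=0)
    return labels


matrix, vocab = context_matrix(CONTEXTS)
reduced = reduce_dims(matrix, k=3)
labels = two_means(reduced)
print(labels.tolist())
```

On this toy input the three "tree" contexts receive one label and the three "hand" contexts the other. Real contexts share vocabulary across senses and are far sparser, so the reduction step matters much more there than in this clean example.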