The effect of different context representations on word sense discrimination in biomedical texts

Authors:
Ted Pedersen
Affiliations:
University of Minnesota, Duluth, Duluth, MN, USA
Venue:
Proceedings of the 1st ACM International Health Informatics Symposium
Year:
2010

Abstract

Unsupervised word sense discrimination relies on the idea that words that occur in similar contexts will have similar meanings. These techniques cluster the contexts in which an ambiguous word occurs, and the number of clusters discovered indicates the number of senses in which the ambiguous word is used. One important distinction among these methods is the underlying means of representing the contexts to be clustered. This paper compares the efficacy of first-order methods, which directly represent the features that occur in a context, with several second-order methods that use a more indirect representation. The experiments show that second-order methods that use word-by-word co-occurrence matrices result in the highest accuracy and the most robust word sense discrimination. These experiments were conducted on MedLine abstracts containing pseudo-words, new ambiguous words created by conflating pairs of MeSH preferred terms. The experiments were carried out with SenseClusters, a freely available open source software package.
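
The distinction between first-order and second-order context representations, and the construction of a pseudo-word, can be illustrated with a minimal Python sketch. This is not the SenseClusters implementation. The toy corpus, the function names, the whitespace tokenizer, and the choice to average co-occurrence rows are all assumptions made for illustration, and the pseudo-word helper only handles single-token terms, whereas MeSH preferred terms may span several words.

```python
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def make_pseudoword(text, term_a, term_b):
    """Conflate two single-token terms into one artificial ambiguous token."""
    pseudo = f"{term_a}_{term_b}"
    return " ".join(pseudo if tok in (term_a, term_b) else tok
                    for tok in tokenize(text))


def first_order_vector(context, vocab):
    """First-order: count the features that occur directly in the context."""
    counts = Counter(tokenize(context))
    return [counts[word] for word in vocab]


def cooccurrence_matrix(corpus, vocab):
    """Symmetric word-by-word matrix of sentence-level co-occurrence counts."""
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for sentence in corpus:
        present = sorted(set(tokenize(sentence)) & set(vocab))
        for a, b in combinations(present, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix


def second_order_vector(context, vocab, matrix):
    """Second-order: average the co-occurrence rows of the context's words."""
    index = {word: i for i, word in enumerate(vocab)}
    rows = [matrix[index[tok]] for tok in tokenize(context) if tok in index]
    if not rows:
        return [0.0] * len(vocab)
    return [sum(column) / len(rows) for column in zip(*rows)]


def dot(u, v):
    """Plain dot product, used here as an unnormalized similarity."""
    return sum(x * y for x, y in zip(u, v))


# Hypothetical toy corpus; the experiments described above used MedLine abstracts.
corpus = [
    "kidney failure requires dialysis",
    "renal dialysis treats kidney disease",
    "heart attack damages cardiac muscle",
    "cardiac arrest stops the heart",
]
vocab = sorted({tok for sentence in corpus for tok in tokenize(sentence)})
matrix = cooccurrence_matrix(corpus, vocab)

# Two contexts about the same topic that share no surface words.
context_a = "kidney failure"
context_b = "renal disease"

first_order_similarity = dot(first_order_vector(context_a, vocab),
                             first_order_vector(context_b, vocab))
second_order_similarity = dot(second_order_vector(context_a, vocab, matrix),
                              second_order_vector(context_b, vocab, matrix))

print("first-order similarity:", first_order_similarity)
print("second-order similarity:", second_order_similarity)
print(make_pseudoword("kidney damage and heart damage", "kidney", "heart"))
```

In this toy setting the two contexts have no words in common, so their first-order vectors do not overlap, while their second-order vectors do overlap because the words in both contexts co-occur with shared neighbors such as "dialysis". This is only meant to convey why an indirect representation can relate contexts that a direct one cannot; it does not reproduce the experimental results reported in the paper.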