Determining the difficulty of Word Sense Disambiguation

Authors:
Bridget T. McInnes; Mark Stevenson
Venue:
Journal of Biomedical Informatics
Year:
2014

Abstract

Automatic processing of biomedical documents is made difficult by the fact that many of the terms they contain are ambiguous. Word Sense Disambiguation (WSD) systems attempt to resolve these ambiguities and identify the correct meaning. However, the published literature on WSD systems for biomedical documents reports considerable differences in performance across terms. Developing WSD systems is often expensive because of the cost of acquiring the necessary training data, so it would be useful to predict in advance which terms WSD systems are likely to handle well or badly. This paper explores methods for estimating the performance of WSD systems on a wide range of ambiguous biomedical terms (including ambiguous words/phrases and abbreviations). The methods include both supervised and unsupervised approaches: the supervised approaches use information from labeled training data, while the unsupervised ones rely on the UMLS Metathesaurus. The approaches are evaluated by comparing their predictions of how difficult each ambiguous term will be to disambiguate against the output of two WSD systems. We find that the supervised methods are the best predictors of WSD difficulty but are limited by their dependence on labeled training data. The unsupervised methods all perform well in some situations and can be applied more widely.
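
As a rough illustration of the evaluation setup described in the abstract, the sketch below assigns each ambiguous term a difficulty estimate and checks how well that ranking agrees with observed WSD accuracy. The abstract does not name the specific measures, so everything concrete here is an assumption: sense-distribution entropy stands in for a supervised difficulty signal, Spearman rank correlation stands in for the comparison step, and the terms, sense labels, and accuracy figures are invented for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def sense_entropy(labels):
    """Shannon entropy (bits) of the sense distribution in labeled examples."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical labeled examples per ambiguous term (invented for illustration).
labeled = {
    "cold": ["temperature"] * 9 + ["common_cold"] * 1,
    "BSA": ["bovine_serum_albumin"] * 7 + ["body_surface_area"] * 3,
    "culture": ["lab_culture"] * 5 + ["anthropological"] * 5,
}
# Hypothetical accuracy of a WSD system on the same terms (invented).
wsd_accuracy = {"cold": 0.95, "BSA": 0.88, "culture": 0.71}

terms = sorted(labeled)
difficulty = [sense_entropy(labeled[t]) for t in terms]
accuracy = [wsd_accuracy[t] for t in terms]
rho = spearman(difficulty, accuracy)
print(f"Spearman correlation between entropy and accuracy: {rho:.2f}")
```

A strongly negative correlation would indicate that terms with more evenly spread senses are the ones the WSD system handles worst, which is the kind of agreement a difficulty predictor is meant to show. Because this signal needs sense-labeled examples, it shares the limitation the abstract attributes to the supervised methods.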