Disambiguation in the biomedical domain: The role of ambiguity type

Authors:
Mark Stevenson;Yikun Guo
Affiliations:
Natural Language Processing Group, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom;Natural Language Processing Group, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello, Sheffield S1 4DP, United Kingdom
Venue:
Journal of Biomedical Informatics
Year:
2010

Abstract

Word Sense Disambiguation (WSD), the automatic identification of the meanings of ambiguous terms in a document, is an important stage in text processing. We describe a WSD system that has been developed specifically for the types of ambiguity found in biomedical documents. The system uses a range of knowledge sources: linguistic features, such as local collocations, and features derived from two domain-specific resources, the Unified Medical Language System (UMLS) and Medical Subject Headings (MeSH). It is applied to three types of ambiguity found in Medline abstracts: ambiguous terms, abbreviations with multiple expansions, and names that are ambiguous between genes. Applied to the standard NLM-WSD data set, which consists of ambiguous terms from Medline abstracts, the system performs well in comparison with previously reported results. The system's performance and the contribution of each knowledge source depend on the type of lexical ambiguity: 87.9% of the ambiguous terms are correctly disambiguated using a combination of linguistic features and MeSH terms, 99% of abbreviations are correctly disambiguated by combining all knowledge sources, and 97.2% of ambiguous gene names are correctly disambiguated using the MeSH terms alone. Analysis reveals that these differences are caused by the nature of each ambiguity type. These results should be taken into account when deciding which information to use for WSD and what level of performance can be expected.
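
To make the idea of combining knowledge sources concrete, the sketch below merges local collocation features with the MeSH terms of an abstract into one feature bag and trains a small classifier on it. This is not the authors' system: the abstract does not name the learner or the feature encoding, so a Naive Bayes classifier is swapped in for simplicity, and `extract_features`, `NaiveBayesWSD`, the sense labels, and the four training sentences are all hypothetical and invented for illustration.

```python
import math
from collections import Counter, defaultdict


def extract_features(tokens, target_index, mesh_terms, window=2):
    """Build a feature bag for one occurrence of an ambiguous term.

    Collocation features are the words within `window` positions of the
    target, tagged with their relative offset. MeSH features are the
    headings attached to the abstract. Both encodings are assumptions.
    """
    features = []
    for offset in range(-window, window + 1):
        position = target_index + offset
        if offset != 0 and 0 <= position < len(tokens):
            features.append(f"colloc[{offset:+d}]={tokens[position].lower()}")
    features.extend(f"mesh={term}" for term in mesh_terms)
    return features


class NaiveBayesWSD:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing
        self.sense_counts = Counter()
        self.feature_counts = defaultdict(Counter)
        self.vocabulary = set()

    def train(self, examples):
        # Each example is a (feature list, sense label) pair.
        for features, sense in examples:
            self.sense_counts[sense] += 1
            self.feature_counts[sense].update(features)
            self.vocabulary.update(features)

    def predict(self, features):
        total = sum(self.sense_counts.values())
        vocab_size = len(self.vocabulary)
        best_sense, best_score = None, -math.inf
        for sense, count in self.sense_counts.items():
            score = math.log(count / total)
            denominator = (sum(self.feature_counts[sense].values())
                           + self.smoothing * vocab_size)
            for feature in features:
                if feature in self.vocabulary:  # ignore unseen features
                    numerator = self.feature_counts[sense][feature] + self.smoothing
                    score += math.log(numerator / denominator)
            if score > best_score:
                best_sense, best_score = sense, score
        return best_sense


# Toy training data for the ambiguous term "cold" (invented, not from NLM-WSD).
training = [
    (extract_features("patients with a cold reported nasal congestion".split(),
                      3, ["Rhinovirus"]), "common_cold"),
    (extract_features("the cold was caused by viral infection".split(),
                      1, ["Rhinovirus"]), "common_cold"),
    (extract_features("samples were stored in cold storage overnight".split(),
                      4, ["Cryopreservation"]), "cold_temperature"),
    (extract_features("exposure to cold temperature reduced cell viability".split(),
                      2, ["Cryopreservation"]), "cold_temperature"),
]

model = NaiveBayesWSD()
model.train(training)

query = extract_features("children with a cold had fever".split(), 3, ["Rhinovirus"])
print(model.predict(query))
```

Dropping the `mesh=` features from `extract_features`, or keeping only them, gives a simple way to compare knowledge sources on a given ambiguity type, which is the kind of comparison the abstract reports.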