Scaling up WSD with automatically generated examples

Authors:
Weiwei Cheng; Judita Preiss; Mark Stevenson
Affiliations:
Sheffield University, Regent Court, Portobello, Sheffield, United Kingdom; Sheffield University, Regent Court, Portobello, Sheffield, United Kingdom; Sheffield University, Regent Court, Portobello, Sheffield, United Kingdom
Venue:
BioNLP '12 Proceedings of the 2012 Workshop on Biomedical Natural Language Processing
Year:
2012

Abstract

The most accurate approaches to Word Sense Disambiguation (WSD) for biomedical documents are based on supervised learning. However, these approaches require manually labeled training examples, which are expensive to create, and consequently supervised WSD systems are normally limited to disambiguating a small set of ambiguous terms. An alternative is to create labeled training examples automatically and use them as a substitute for manually labeled ones. This paper describes a large-scale WSD system based on automatically labeled examples generated using information from the UMLS Metathesaurus. The examples are generated without any use of labeled training data, so the approach is completely unsupervised (unlike some previous approaches). The system is evaluated on two widely used data sets and is found to outperform a state-of-the-art unsupervised approach that also uses information from the UMLS Metathesaurus.