Self-training and co-training in biomedical word sense disambiguation

Authors:
Antonio Jimeno-Yepes; Alan R. Aronson
Affiliations:
National Library of Medicine, Rockville Pike, Bethesda, MD; National Library of Medicine, Rockville Pike, Bethesda, MD
Venue:
BioNLP '11 Proceedings of BioNLP 2011 Workshop
Year:
2011

Abstract

Word sense disambiguation (WSD) is an intermediate task within information retrieval and information extraction that attempts to select the proper sense of ambiguous words. Because training data are scarce, semi-supervised learning, which profits from seed annotated examples and a large set of unlabeled data, is worth researching. We present preliminary results of two semi-supervised learning algorithms on biomedical word sense disambiguation. Both methods add relevant unlabeled examples to the training set, and optimal parameters are similar for each ambiguous word.
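
The abstract does not say which classifier, features, or parameters the authors used, so the following is only a minimal sketch of generic self-training, not the paper's method. It assumes a small naive Bayes classifier over bag-of-words contexts, a hypothetical confidence threshold of 0.75, and invented toy contexts for the ambiguous word "cold"; the names `train`, `predict`, and `self_train` are illustrative.

```python
import math
from collections import Counter


def train(examples):
    """Fit a tiny multinomial naive Bayes model from (tokens, sense) pairs."""
    sense_counts = Counter()
    word_counts = {}
    vocab = set()
    for tokens, sense in examples:
        sense_counts[sense] += 1
        word_counts.setdefault(sense, Counter()).update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def predict(model, tokens):
    """Return (most likely sense, its posterior probability) for one context."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    log_scores = {}
    for sense, count in sense_counts.items():
        # Laplace (add-one) smoothing; words outside the vocabulary are ignored.
        denom = sum(word_counts[sense].values()) + len(vocab)
        score = math.log(count / total)
        for tok in tokens:
            if tok in vocab:
                score += math.log((word_counts[sense][tok] + 1) / denom)
        log_scores[sense] = score
    # Turn log scores into normalized probabilities.
    top = max(log_scores.values())
    probs = {s: math.exp(v - top) for s, v in log_scores.items()}
    norm = sum(probs.values())
    best = max(probs, key=probs.get)
    return best, probs[best] / norm


def self_train(labeled, unlabeled, threshold=0.75, max_rounds=5):
    """Repeatedly move confidently predicted unlabeled contexts into the training set."""
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        model = train(labeled)
        confident, remaining = [], []
        for tokens in pool:
            sense, prob = predict(model, tokens)
            if prob >= threshold:
                confident.append((tokens, sense))
            else:
                remaining.append(tokens)
        if not confident:
            break  # nothing new to learn from, so stop early
        labeled.extend(confident)
        pool = remaining
    return labeled, pool


# Toy data (invented): two seed annotations and three unlabeled contexts of "cold".
seeds = [
    (["patient", "virus", "cough"], "illness"),
    (["temperature", "water", "ice"], "temperature"),
]
pool = [
    ["virus", "cough", "fever"],
    ["ice", "water", "storage"],
    ["fever", "cough"],
]

labeled, remaining = self_train(seeds, pool)
for tokens, sense in labeled:
    print(sense, tokens)
print("still unlabeled:", remaining)
```

In this toy run the context `["fever", "cough"]` is not confident enough in the first round and is only added after the first batch of automatically labeled examples has enlarged the training set. Co-training, as usually described, differs in that two classifiers trained on different feature views each supply confident labels for the other.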