Word sense disambiguation for untagged corpus: application to Romanian language

Authors:
Gabriela Şerban; Doina Tătar
Affiliations:
Department of Computer Science, University "Babeş-Bolyai", Cluj-Napoca, Romania; Department of Computer Science, University "Babeş-Bolyai", Cluj-Napoca, Romania
Venue:
CICLing'03 Proceedings of the 4th international conference on Computational linguistics and intelligent text processing
Year:
2003

References:

Combining labeled and unlabeled data with co-training. COLT '98: Proceedings of the eleventh annual conference on Computational learning theory
Foundations of statistical natural language processing
An Iterative Approach to Word Sense Disambiguation. Proceedings of the Thirteenth International Florida Artificial Intelligence Research Society Conference
Distinguishing systems and distinguishing senses: new evaluation methods for Word Sense Disambiguation. Natural Language Engineering
Unsupervised word sense disambiguation rivaling supervised methods. ACL '95: Proceedings of the 33rd annual meeting on Association for Computational Linguistics
A method for word sense disambiguation of unrestricted text. ACL '99: Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics

Abstract

The task of word sense disambiguation (WSD) is to determine which sense of an ambiguous word is invoked in a particular use of that word [5,8]. Statistical methods are known to produce highly accurate results on semantically tagged corpora [2], and WordNet is a good source of information for WSD [3,4]. Since neither such a corpus nor a resource comparable to WordNet exists for Romanian, we propose a WSD algorithm that requires only information that can be extracted from an untagged corpus. Our algorithm retains the advantages of Yarowsky's principles [9,7,10] and adds the known high performance of a Naive Bayes classifier (NBC). It learns to make predictions from local context using only a few labeled contexts and many unlabeled ones.
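
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it describes: a Yarowsky-style bootstrapping loop in which a Naive Bayes classifier is trained on a few labeled contexts and then repeatedly labels the unlabeled contexts it is most confident about. It should not be read as the authors' method. The function names (train_nb, posterior, bootstrap), the add-alpha smoothing, the confidence threshold, and the English toy data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(labeled, vocab, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha smoothing.

    labeled: list of (context_words, sense) pairs (hypothetical format).
    Returns {sense: (log_prior, {word: log_likelihood})}.
    """
    sense_counts = Counter(sense for _, sense in labeled)
    word_counts = defaultdict(Counter)
    for words, sense in labeled:
        word_counts[sense].update(words)
    total = sum(sense_counts.values())
    model = {}
    for sense, n in sense_counts.items():
        denom = sum(word_counts[sense].values()) + alpha * len(vocab)
        log_like = {w: math.log((word_counts[sense][w] + alpha) / denom)
                    for w in vocab}
        model[sense] = (math.log(n / total), log_like)
    return model


def posterior(model, words):
    """Return P(sense | context words), normalised via log-space scores."""
    scores = {s: lp + sum(ll[w] for w in words if w in ll)
              for s, (lp, ll) in model.items()}
    top = max(scores.values())
    exp = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(exp.values())
    return {s: v / z for s, v in exp.items()}


def bootstrap(seeds, unlabeled, threshold=0.8, max_rounds=10):
    """Self-training: grow the labeled set from confident predictions."""
    labeled = list(seeds)
    pool = list(unlabeled)
    vocab = {w for ws, _ in labeled for w in ws} | {w for ws in pool for w in ws}
    for _ in range(max_rounds):
        model = train_nb(labeled, vocab)
        remaining, added = [], 0
        for words in pool:
            post = posterior(model, words)
            sense, p = max(post.items(), key=lambda kv: kv[1])
            if p >= threshold:          # confident enough: adopt the label
                labeled.append((words, sense))
                added += 1
            else:                       # keep for a later round
                remaining.append(words)
        pool = remaining
        if added == 0 or not pool:      # nothing new learned, or all labeled
            break
    return train_nb(labeled, vocab), labeled


# Toy data (assumed): two senses of "bank", one seed context per sense.
seeds = [(["river", "water"], "shore"), (["money", "loan"], "finance")]
unlabeled = [["river", "fish"], ["loan", "interest"],
             ["fish", "boat"], ["interest", "rate"]]

model, labeled = bootstrap(seeds, unlabeled, threshold=0.6)
for words, sense in labeled:
    print(sense, words)
```

In this toy run, contexts such as ["fish", "boat"] share no word with either seed, so they can only be labeled after an earlier round has attached "fish" to a sense. That is the sense in which a few labeled contexts can be extended using many unlabeled ones.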