Unsupervised word sense disambiguation with N-gram features

Authors:
Daniel Preotiuc-Pietro; Florentina Hristea
Affiliations:
Department of Computer Science, University of Sheffield, Sheffield, UK S1 4DP; Department of Computer Science, University of Bucharest, Bucharest, Romania 010014
Venue:
Artificial Intelligence Review
Year:
2014

Abstract

This paper addresses feature selection for unsupervised word sense disambiguation (WSD) performed with an underlying Naïve Bayes model. It introduces web N-gram features which, to our knowledge, are used here for the first time in unsupervised WSD. By creating features from unlabeled data, we "help" a simple knowledge-lean disambiguation algorithm to significantly increase its accuracy by supplying it with easily obtainable knowledge. The performance of this method is compared with that of other methods that rely on completely different feature sets. Test results for nouns, adjectives and verbs show that web N-gram feature selection is a reliable alternative to previously existing approaches, provided that a "quality list" of features, adapted to the part of speech, is used.
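
The abstract does not spell out how the N-gram features are selected, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes a small hypothetical table of bigram counts (`NGRAM_COUNTS`) standing in for a web-scale N-gram corpus, a hypothetical stopword list, and two hypothetical helpers, `select_features` and `featurize`. Words that co-occur most often with the target word are kept as the feature vocabulary, and each context is then encoded as a binary vector over that vocabulary, which is the kind of input a Naïve Bayes model could consume.

```python
from collections import Counter

# Hypothetical toy bigram counts; a stand-in for a web-scale N-gram corpus.
NGRAM_COUNTS = {
    ("river", "bank"): 120,
    ("bank", "account"): 300,
    ("bank", "loan"): 180,
    ("the", "bank"): 900,
    ("bank", "erosion"): 40,
    ("green", "tree"): 500,
}

# Hypothetical stopword list used to discard uninformative neighbours.
STOPWORDS = {"the", "a", "of"}


def select_features(target, ngram_counts, top_k):
    """Return the top_k words co-occurring with `target`, ranked by summed N-gram count."""
    scores = Counter()
    for ngram, count in ngram_counts.items():
        if target in ngram:
            for word in ngram:
                if word != target and word not in STOPWORDS:
                    scores[word] += count
    return [word for word, _ in scores.most_common(top_k)]


def featurize(context_tokens, features):
    """Encode a context as a binary vector: 1 if the feature word occurs in it, else 0."""
    present = set(context_tokens)
    return [1 if feature in present else 0 for feature in features]


features = select_features("bank", NGRAM_COUNTS, top_k=3)
vector = featurize("she opened an account at the bank".split(), features)
```

In this toy setting the size of the feature list (`top_k`) and the stopword filter are the knobs that would be tuned per part of speech; the actual criteria behind the paper's "quality list" are not given in the abstract.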