Cross-lingual word sense disambiguation for languages with scarce resources

Authors:
Bahareh Sarrafzadeh;Nikolay Yakovets;Nick Cercone;Aijun An
Affiliations:
Department of Computer Science and Engineering, York University, Canada;Department of Computer Science and Engineering, York University, Canada;Department of Computer Science and Engineering, York University, Canada;Department of Computer Science and Engineering, York University, Canada
Venue:
Canadian AI'11 Proceedings of the 24th Canadian conference on Advances in artificial intelligence
Year:
2011

Abstract

Word Sense Disambiguation (WSD), the task of computationally identifying the meaning of a word in context, has long been a central problem in computational linguistics. Statistical and supervised approaches require large amounts of labeled data for training. Unlike English, Persian has neither a semantically tagged corpus to support machine learning approaches nor suitable parallel corpora. However, because the number of Persian pages in Wikipedia keeps growing, Wikipedia can serve as a comparable corpus for English-Persian texts. In this paper, we propose a cross-lingual approach to tagging word senses in Persian texts. The approach uses English sense disambiguators, Wikipedia articles in both English and Persian, and a newly developed lexical ontology, FarsNet, and thereby overcomes the lack of knowledge resources and NLP tools for Persian. We demonstrate its effectiveness by comparing it to a direct sense disambiguation approach for Persian. The evaluation results indicate performance comparable to that of the English sense tagger used.
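
The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of cross-lingual sense transfer, not the authors' implementation. It assumes a toy English sense inventory (`EN_SENSES`), a simplified Lesk-style overlap tagger in place of a real English sense disambiguator, and a hypothetical table (`SENSE_TO_PERSIAN`) standing in for the links a lexical ontology such as FarsNet might provide. All names and data here are illustrative assumptions.

```python
# Toy English sense inventory: word -> {sense id -> gloss words}.
# Hypothetical data; a real system would use an English sense tagger.
EN_SENSES = {
    "bank": {
        "bank#finance": {"money", "deposit", "loan", "account"},
        "bank#river": {"river", "water", "shore", "slope"},
    },
}

# Hypothetical links from English senses to Persian lemmas, standing in
# for the inter-lingual mappings of a lexical ontology.
SENSE_TO_PERSIAN = {
    "bank#finance": {"بانک"},
    "bank#river": {"ساحل", "کناره"},
}


def tag_english(tokens):
    """Simplified Lesk: pick the sense whose gloss overlaps the context most."""
    context = set(tokens)
    tagged = {}
    for tok in tokens:
        senses = EN_SENSES.get(tok)
        if senses:
            # sorted() makes tie-breaking deterministic.
            tagged[tok] = max(sorted(senses), key=lambda s: len(senses[s] & context))
    return tagged


def transfer_senses(english_tokens, persian_tokens):
    """Tag Persian tokens with senses found in a comparable English text."""
    persian_vocab = set(persian_tokens)
    persian_tags = {}
    for sense in tag_english(english_tokens).values():
        for lemma in SENSE_TO_PERSIAN.get(sense, ()):
            if lemma in persian_vocab:
                persian_tags[lemma] = sense
    return persian_tags


if __name__ == "__main__":
    english = "the bank approved the loan and opened an account".split()
    persian = ["بانک", "وام", "را", "تایید", "کرد"]
    print(transfer_senses(english, persian))
```

In this sketch the English text is disambiguated first, and each resulting sense is projected onto any Persian token that the assumed mapping links to it; the comparable English and Persian token lists play the role that paired Wikipedia articles play in the abstract.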