Parallel corpora for wordnet construction: machine translation vs. automatic sense tagging

Authors:
Antoni Oliver;Salvador Climent
Affiliations:
Universitat Oberta de Catalunya, Barcelona, Spain;Universitat Oberta de Catalunya, Barcelona, Spain
Venue:
CICLing'12 Proceedings of the 13th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part II
Year:
2012

Abstract

In this paper we present a methodology for WordNet construction that exploits parallel corpora in which the English source text is semantically annotated. We are using this methodology to enlarge the Spanish and Catalan versions of WordNet 3.0, but it can also be applied to other languages. Because large parallel corpora with semantic annotation are rarely available, we explore two strategies to overcome this problem: on the one hand, using monolingual sense-tagged corpora together with machine translation; on the other, using parallel corpora together with automatic sense tagging of the source text. With these resources, the problem of acquiring a WordNet from parallel corpora can be treated as a word alignment task. Fortunately, this task is well studied, and some alignment algorithms are freely available.
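
To illustrate why a sense-tagged source text reduces the problem to word alignment, the following is a minimal sketch and not the authors' implementation. It assumes that alignment links have already been produced by an external word aligner, that each source token carries a synset identifier (or None if untagged), and that target tokens are lemmatized. The function name `extract_variants`, the `min_count` parameter, and the synset labels such as `dog.n.01` are hypothetical placeholders chosen for the example.

```python
from collections import Counter, defaultdict


def extract_variants(aligned_sentences, min_count=1):
    """Pick, for each source synset, the target lemma it is aligned to most often.

    aligned_sentences: iterable of (source_synsets, target_lemmas, links), where
      source_synsets is a list of synset ids (None for untagged tokens),
      target_lemmas is a list of target-language lemmas,
      links is a list of (source_index, target_index) alignment pairs.
    """
    counts = defaultdict(Counter)
    for source_synsets, target_lemmas, links in aligned_sentences:
        for src_idx, tgt_idx in links:
            synset = source_synsets[src_idx]
            if synset is not None:  # skip tokens without a sense tag
                counts[synset][target_lemmas[tgt_idx]] += 1

    variants = {}
    for synset, lemma_counts in counts.items():
        lemma, freq = lemma_counts.most_common(1)[0]
        if freq >= min_count:  # drop candidates seen too rarely
            variants[synset] = lemma
    return variants


# Toy English-Catalan data; synset labels and alignments are illustrative only.
corpus = [
    ([None, "dog.n.01", "eat.v.01"], ["el", "gos", "menjar"], [(0, 0), (1, 1), (2, 2)]),
    ([None, "dog.n.01", "sleep.v.01"], ["el", "gos", "dormir"], [(0, 0), (1, 1), (2, 2)]),
    ([None, "dog.n.01"], ["un", "ca"], [(0, 0), (1, 1)]),
]

variants = extract_variants(corpus)
for synset in sorted(variants):
    print(synset, variants[synset])
```

In this sketch the most frequent aligned lemma is taken as the candidate variant for each synset; raising `min_count` trades coverage for precision, which is one simple way to filter alignment noise.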