Low-cost enrichment of Spanish WordNet with automatically translated glosses: combining general and specialized models

Authors:
Jesús Giménez; Lluís Màrquez
Affiliations:
Universitat Politècnica de Catalunya, Barcelona; Universitat Politècnica de Catalunya, Barcelona
Venue:
COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
Year:
2006

Abstract

This paper studies the enrichment of Spanish WordNet with synset glosses obtained automatically from the English WordNet glosses using a phrase-based Statistical Machine Translation system. We build the English-Spanish translation system from a parallel corpus of European Parliament proceedings and study how to adapt the statistical models to the domain of dictionary definitions. We build specialized language and translation models from a small set of parallel definitions and experiment with robust ways of combining them with the general models. A statistically significant increase in performance is obtained. The best system is then used to generate a definition for every Spanish synset; these definitions are now ready for manual revision. As a complementary issue, we analyze how much in-domain data is needed to improve a system trained entirely on out-of-domain data.