Low-cost enrichment of Spanish WordNet with automatically translated glosses: combining general and specialized models

  • Authors:
  • Jesús Giménez;Lluís Màrquez

  • Affiliations:
  • Universitat Politècnica de Catalunya, Barcelona;Universitat Politècnica de Catalunya, Barcelona

  • Venue:
  • COLING-ACL '06 Proceedings of the COLING/ACL on Main conference poster sessions
  • Year:
  • 2006

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper studies the enrichment of Spanish WordNet with synset glosses automatically obtained from the English Word-Net glosses using a phrase-based Statistical Machine Translation system. We construct the English-Spanish translation system from a parallel corpus of proceedings of the European Parliament, and study how to adapt statistical models to the domain of dictionary definitions. We build specialized language and translation models from a small set of parallel definitions and experiment with robust manners to combine them. A statistically significant increase in performance is obtained. The best system is finally used to generate a definition for all Spanish synsets, which are currently ready for a manual revision. As a complementary issue, we analyze the impact of the amount of in-domain data needed to improve a system trained entirely on out-of-domain data.