Automatic generation of bilingual dictionaries using intermediary languages and comparable corpora

  • Authors:
  • Pablo Gamallo Otero; José Ramom Pichel Campos

  • Affiliations:
  • Departamento de Língua Espanhola, Universidade de Santiago de Compostela, Galiza, Spain; Departamento de Tecnologia Linguística da Imaxin|Software, Santiago de Compostela, Galiza

  • Venue:
  • CICLing'10: Proceedings of the 11th International Conference on Computational Linguistics and Intelligent Text Processing
  • Year:
  • 2010

Abstract

This paper outlines a strategy for building new bilingual dictionaries from existing resources. The method consists of two main tasks. First, a new set of bilingual correspondences is generated from two available bilingual dictionaries. Second, the generated correspondences are validated against a bilingual lexicon automatically extracted from non-parallel but comparable corpora. The quality of the entries in the derived dictionary is very high, similar to that of hand-crafted dictionaries. We report a case study in which a new, non-noisy English-Galician dictionary with about 12,000 correct bilingual correspondences was automatically generated.
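
The two tasks described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that the two source dictionaries share an intermediary (pivot) language, that candidates are produced by joining on the pivot word, and that validation is a plain membership check against the corpus-derived lexicon. The function names `compose_via_pivot` and `validate` are hypothetical, and the word entries are invented toy data, not taken from the paper's resources.

```python
from collections import defaultdict


def compose_via_pivot(src_to_pivot, pivot_to_tgt):
    """Task 1 (assumed form): join two dictionaries on the shared pivot word.

    Every target translation of every pivot translation becomes a candidate,
    so polysemous pivot words introduce spurious pairs.
    """
    candidates = defaultdict(set)
    for src, pivots in src_to_pivot.items():
        for pivot in pivots:
            for tgt in pivot_to_tgt.get(pivot, ()):
                candidates[src].add(tgt)
    return dict(candidates)


def validate(candidates, corpus_lexicon):
    """Task 2 (assumed form): keep only pairs confirmed by the corpus lexicon."""
    validated = {}
    for src, tgts in candidates.items():
        confirmed = tgts & corpus_lexicon.get(src, set())
        if confirmed:
            validated[src] = confirmed
    return validated


# Toy data (invented for illustration): English -> pivot -> target language.
en_to_pivot = {"spring": {"primavera", "muelle"}}
pivot_to_tgt = {"primavera": {"primavera"}, "muelle": {"peirao", "resorte"}}

# Stand-in for a lexicon extracted from comparable corpora.
corpus_lexicon = {"spring": {"primavera", "resorte"}}

candidates = compose_via_pivot(en_to_pivot, pivot_to_tgt)
validated = validate(candidates, corpus_lexicon)
print(validated)
```

In this toy run the pivot word "muelle" is ambiguous, so the join proposes "peirao" as a translation of "spring"; the validation step drops it because the corpus-derived lexicon does not support that pair. Filtering by intersection favors precision over coverage, which is consistent with the abstract's emphasis on a non-noisy result.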