Extraction of bilingual cognates from Wikipedia

  • Authors:
  • Pablo Gamallo;Marcos Garcia

  • Affiliations:
  • Centro de Investigação em Tecnologias da Informação (CITIUS), Universidade de Santiago de Compostela, Galiza, Spain;Centro de Investigação em Tecnologias da Informação (CITIUS), Universidade de Santiago de Compostela, Galiza, Spain

  • Venue:
  • PROPOR'12 Proceedings of the 10th international conference on Computational Processing of the Portuguese Language
  • Year:
  • 2012

Abstract

In this article, we propose a method to extract translation equivalents with similar spelling from comparable corpora. The method was applied to Wikipedia to extract a large number of Portuguese-Spanish bilingual terminological pairs that were not found in existing dictionaries. The resulting bilingual lexicon consists of more than 27,000 new pairs of lemmas and multiwords, with about 92% accuracy.
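
The abstract does not say how spelling similarity is measured, so the following is only a minimal sketch of the general idea of pairing terms with similar spelling, not the authors' method. It assumes a character-based similarity ratio from Python's standard difflib module, an arbitrary threshold of 0.8, and hypothetical helper names (spelling_similarity, cognate_candidates); the known_pairs argument stands in for an existing dictionary whose entries should be skipped.

```python
from difflib import SequenceMatcher


def spelling_similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cognate_candidates(pt_terms, es_terms, known_pairs=frozenset(), threshold=0.8):
    """Pair Portuguese and Spanish terms whose spelling is similar.

    Hypothetical helper: the threshold and the similarity measure are
    assumptions made for illustration, not values taken from the paper.
    Pairs already listed in known_pairs (an existing dictionary) are skipped.
    """
    pairs = []
    for pt in pt_terms:
        for es in es_terms:
            if (pt, es) in known_pairs:
                continue  # already covered by the existing dictionary
            score = spelling_similarity(pt, es)
            if score >= threshold:
                pairs.append((pt, es, score))
    # Most similar pairs first
    return sorted(pairs, key=lambda pair: -pair[2])


if __name__ == "__main__":
    for pt, es, score in cognate_candidates(["computador"], ["computadora", "mesa"]):
        print(pt, es, round(score, 3))
```

Comparing every Portuguese term with every Spanish term is quadratic in the vocabulary size, so a sketch like this is only practical on small term lists; a real system working on Wikipedia-scale data would need to restrict the candidate pairs first.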