Extraction of bilingual cognates from Wikipedia

Authors:
Pablo Gamallo;Marcos Garcia
Affiliations:
Centro de Investigação em Tecnologias da Informação (CITIUS), Universidade de Santiago de Compostela, Galiza, Spain;Centro de Investigação em Tecnologias da Informação (CITIUS), Universidade de Santiago de Compostela, Galiza, Spain
Venue:
PROPOR'12 Proceedings of the 10th international conference on Computational Processing of the Portuguese Language
Year:
2012



Abstract

In this article, we propose a method to extract translation equivalents with similar spelling from comparable corpora. The method was applied to Wikipedia to extract a large number of Portuguese-Spanish bilingual terminological pairs that were not found in existing dictionaries. The resulting bilingual lexicon consists of more than 27,000 new pairs of lemmas and multiwords, with about 92% accuracy.
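
The abstract does not say how spelling similarity is measured, so the following is only a minimal sketch of the general idea of pairing terms with similar spelling, not the authors' method. It assumes a normalized edit-distance score and a cutoff of 0.7; the function names, the threshold, and the sample terms are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def spelling_similarity(a: str, b: str) -> float:
    """Score in [0, 1]; 1.0 means identical spelling (case-insensitive)."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))


def cognate_candidates(pt_terms: List[str], es_terms: List[str],
                       threshold: float = 0.7) -> List[Tuple[str, str, float]]:
    """For each Portuguese term, keep the most similar Spanish term
    if its score reaches the (assumed) threshold."""
    pairs = []
    if not es_terms:
        return pairs
    for pt in pt_terms:
        best = max(es_terms, key=lambda es: spelling_similarity(pt, es))
        score = spelling_similarity(pt, best)
        if score >= threshold:
            pairs.append((pt, best, score))
    return pairs


if __name__ == "__main__":
    # Hypothetical sample terms, not taken from the paper's data.
    for pt, es, score in cognate_candidates(
            ["computador", "janela"], ["computadora", "ventana", "mesa"]):
        print(f"{pt} ~ {es} ({score:.2f})")
```

A purely orthographic filter like this one would also accept false friends, which is why an approach based on comparable corpora would normally combine it with other evidence; how the paper does so is not described in the abstract.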