Building Bilingual Parallel Corpora Based on Wikipedia

  • Authors:
  • Mehdi Mohammadi; Nasser GhasemAghaee

  • Venue:
  • ICCEA '10 Proceedings of the 2010 Second International Conference on Computer Engineering and Applications - Volume 02
  • Year:
  • 2010

Abstract

Aligned parallel corpora are an important resource for a wide range of multilingual research, particularly corpus-based machine translation. In this paper we present a Persian-English sentence-aligned parallel corpus built by mining Wikipedia. We propose a method for extracting sentence-level alignments using an extended link-based bilingual lexicon. Experimental results show that our method increases precision while reducing the total number of generated candidate pairs.
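
To make the general idea of link-based alignment concrete, the following is a minimal sketch and not the authors' implementation. It assumes the bilingual lexicon is a plain mapping of Persian article titles to the English titles they are linked to, and that a sentence pair is scored by the share of lexicon terms in the Persian sentence whose English counterpart appears in the English sentence. The function names, the substring matching, and the 0.6 threshold are all hypothetical choices made for illustration; the "extended" lexicon described in the abstract is not modeled here.

```python
def build_link_lexicon(title_pairs):
    """Build a lexicon from (source_title, target_title) pairs.

    Assumption: each pair comes from an interlanguage link between two
    Wikipedia articles. How the pairs are obtained is out of scope here.
    """
    return {src.lower(): tgt.lower() for src, tgt in title_pairs}


def score_pair(src_sentence, tgt_sentence, lexicon):
    """Share of lexicon terms in the source sentence that are also
    matched by their linked translation in the target sentence."""
    src_lower = src_sentence.lower()
    tgt_lower = tgt_sentence.lower()
    # Crude substring matching; a real system would tokenize first.
    found = [term for term in lexicon if term in src_lower]
    if not found:
        return 0.0
    matched = sum(1 for term in found if lexicon[term] in tgt_lower)
    return matched / len(found)


def candidate_pairs(src_sentences, tgt_sentences, lexicon, threshold=0.6):
    """Return (source, target, score) triples scoring at or above the
    threshold. The threshold value is an illustrative assumption."""
    pairs = []
    for src in src_sentences:
        for tgt in tgt_sentences:
            score = score_pair(src, tgt, lexicon)
            if score >= threshold:
                pairs.append((src, tgt, score))
    return pairs
```

Raising the threshold in a sketch like this discards sentence pairs that share only one linked term, which illustrates how a stricter lexicon-based filter can trade candidate volume for precision.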