Building Bilingual Parallel Corpora Based on Wikipedia

Authors:
Mehdi Mohammadi, Nasser GhasemAghaee
Venue:
ICCEA '10 Proceedings of the 2010 Second International Conference on Computer Engineering and Applications - Volume 02
Year:
2010

Cited by (2):

TEP: Tehran English-Persian parallel corpus
CICLing'11 Proceedings of the 12th international conference on Computational linguistics and intelligent text processing - Volume Part II

Towards building a multilingual semantic network: identifying interlingual links in Wikipedia
SemEval '12 Proceedings of the First Joint Conference on Lexical and Computational Semantics - Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation


Abstract

Aligned parallel corpora are an important resource for a wide range of multilingual research, particularly corpus-based machine translation. In this paper we present a Persian-English sentence-aligned parallel corpus built by mining Wikipedia. We propose a sentence-level alignment method based on an extended link-based bilingual lexicon. Experimental results show that our method increases precision while reducing the total number of generated candidate pairs.
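The abstract does not specify how the lexicon is extended or how candidate pairs are scored. Purely as an illustration, the sketch below assumes that the bilingual lexicon comes from Wikipedia interlanguage links between English and Persian article titles, that a candidate sentence pair is scored by the fraction of English lexicon entries whose Persian equivalent appears in the Persian sentence, and that a word-count ratio filter prunes candidate pairs before scoring. The function names (`build_link_lexicon`, `score_pair`, `align`), the input format, and the thresholds are all hypothetical and are not the authors' method.

```python
import re


def build_link_lexicon(langlinks):
    """Build an English -> Persian lexicon from (en_title, fa_title) pairs.

    Assumption: the pairs come from Wikipedia interlanguage links between
    article titles; this input format is hypothetical.
    """
    lexicon = {}
    for en_title, fa_title in langlinks:
        lexicon.setdefault(en_title.lower(), set()).add(fa_title)
    return lexicon


def lexicon_hits(en_sentence, lexicon, max_ngram=3):
    """Return the English n-grams (1..max_ngram words) that are lexicon entries."""
    words = re.findall(r"\w+", en_sentence.lower())
    hits = []
    for n in range(1, max_ngram + 1):
        for i in range(len(words) - n + 1):
            phrase = " ".join(words[i:i + n])
            if phrase in lexicon:
                hits.append(phrase)
    return hits


def score_pair(en_sentence, fa_sentence, lexicon):
    """Fraction of English lexicon hits whose Persian title occurs in fa_sentence."""
    hits = lexicon_hits(en_sentence, lexicon)
    if not hits:
        return 0.0
    matched = sum(1 for h in hits if any(fa in fa_sentence for fa in lexicon[h]))
    return matched / len(hits)


def align(en_sentences, fa_sentences, lexicon, threshold=0.5, max_len_ratio=2.0):
    """Keep, for each English sentence, the best-scoring Persian sentence.

    A hypothetical word-count ratio filter skips implausible candidates
    before scoring; pairs scoring below `threshold` are discarded.
    """
    pairs = []
    for en in en_sentences:
        best, best_score = None, -1.0
        for fa in fa_sentences:
            ratio = len(en.split()) / max(len(fa.split()), 1)
            if not (1 / max_len_ratio <= ratio <= max_len_ratio):
                continue  # prune this candidate pair without scoring it
            s = score_pair(en, fa, lexicon)
            if s > best_score:
                best, best_score = fa, s
        if best is not None and best_score >= threshold:
            pairs.append((en, best, best_score))
    return pairs
```

For example, with a lexicon built from the title pairs `("Tehran", "تهران")` and `("Iran", "ایران")`, the sentence "Tehran is the capital of Iran." is paired with "تهران پایتخت ایران است." (score 1.0) rather than with an unrelated Persian sentence that contains neither title. Pruning by length ratio before scoring is one simple way to cut the number of candidate pairs; the paper's own pruning criterion may differ.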