Chinese-Uyghur sentence alignment: an approach based on anchor sentences

  • Authors:
  • Samat Mamitimin; Min Hou

  • Affiliations:
  • Xinjiang University, Urumqi, China and Communication University of China, Beijing, China; Communication University of China, Beijing, China

  • Venue:
  • BUCC '09 Proceedings of the 2nd Workshop on Building and Using Comparable Corpora: from Parallel to Non-parallel Corpora
  • Year:
  • 2009

Abstract

Building on previous studies of sentence alignment, this paper introduces a sentence alignment method that uses some sentences as "anchors" and applies a two-step procedure. In the first step, lexical information (such as proper names, technical terms, numbers, and punctuation marks), location information, and length information are used to select anchor sentences that satisfy certain conditions. In the second step, the texts are divided into segments using the anchor sentences as boundaries, and the sentences within each segment are aligned with a length-based approach. By segmenting the texts in this way, the method avoids complex computation and error propagation. Experimental results show an average precision of 94.6% for Chinese-Uyghur sentence alignment on multi-domain texts.
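
The two-step procedure can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the anchor conditions or the length model, so everything here is an assumption made for illustration. The hypothetical `find_anchors` uses only shared numbers as the lexical cue, the hypothetical `align_segment` uses a simple character-length difference with made-up bead penalties (`BEADS`) in a dynamic program, and `ratio` is an assumed target-to-source length ratio.

```python
import re
from typing import List, Tuple

Bead = Tuple[List[int], List[int]]  # (source sentence indices, target sentence indices)

# Assumed bead shapes (source count, target count) with illustrative penalties.
BEADS = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0, (2, 1): 2.0, (1, 2): 2.0}


def number_tokens(sentence: str) -> frozenset:
    """Numbers in a sentence; a stand-in for richer lexical cues."""
    return frozenset(re.findall(r"\d+(?:\.\d+)?", sentence))


def find_anchors(src: List[str], tgt: List[str]) -> List[Tuple[int, int]]:
    """Step 1 (simplified): pair sentences whose number sets are non-empty,
    identical, and unique on both sides; keep only monotonic pairs."""
    src_keys = [number_tokens(s) for s in src]
    tgt_keys = [number_tokens(t) for t in tgt]
    anchors, last_j = [], -1
    for i, key in enumerate(src_keys):
        if not key or src_keys.count(key) != 1 or tgt_keys.count(key) != 1:
            continue
        j = tgt_keys.index(key)
        if j > last_j:
            anchors.append((i, j))
            last_j = j
    return anchors


def align_segment(src_lens: List[int], tgt_lens: List[int], ratio: float = 1.0) -> List[Bead]:
    """Step 2 (simplified): length-based dynamic programming inside one segment."""
    n, m = len(src_lens), len(tgt_lens)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for (di, dj), penalty in BEADS.items():
                if i + di > n or j + dj > m:
                    continue
                ls, lt = sum(src_lens[i:i + di]), sum(tgt_lens[j:j + dj])
                c = cost[i][j] + abs(ls * ratio - lt) + penalty
                if c < cost[i + di][j + dj]:
                    cost[i + di][j + dj] = c
                    back[i + di][j + dj] = (di, dj)
    beads, i, j = [], n, m
    while (i, j) != (0, 0):  # walk the cheapest path backwards
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    return beads[::-1]


def align_with_anchors(src: List[str], tgt: List[str], ratio: float = 1.0) -> List[Bead]:
    """Split both texts at the anchors and align each segment independently."""
    result: List[Bead] = []
    start_i = start_j = 0

    def add_segment(end_i: int, end_j: int) -> None:
        seg = align_segment([len(s) for s in src[start_i:end_i]],
                            [len(t) for t in tgt[start_j:end_j]], ratio)
        result.extend(([start_i + a for a in sa], [start_j + b for b in tb])
                      for sa, tb in seg)

    for ai, aj in find_anchors(src, tgt):
        add_segment(ai, aj)
        result.append(([ai], [aj]))  # the anchor itself is a 1-1 bead
        start_i, start_j = ai + 1, aj + 1
    add_segment(len(src), len(tgt))
    return result
```

Because each dynamic program runs only over the sentences between two consecutive anchors, a wrong decision in one segment cannot shift the alignment in the next, and the search space is much smaller than aligning the whole text at once.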